I am trying to compare two data frames, which I read from two CSV files using pandas:

import pandas as pd
import numpy as np

dfYelp = pd.read_csv('Desktop/yelp.csv')
dfYelp
review_date address_zipcode review_text review_rating   price   NYC_reviewer
0   2015-10-13  11432.0 Classic Urban Deli.. they have the "hungry man...   5.0 star rating $   0.0
1   2015-02-24  10033.0 Meh it's a deli, they have soda and milk and y...   2.0 star rating $   0.0
2   2014-05-13  10033.0 This is the newest deli on the block, and I th...   4.0 star rating $   0.0
3   2015-06-13  11234.0 Absolutely the best shrimp with broccoli in ga...   5.0 star rating UNCLAIMED RESTAURANT    1.0
4   2015-05-12  11234.0 This is a rare gem in the area. Not too often ...   4.0 star rating UNCLAIMED RESTAURANT    0.0
... ... ... ... ... ... ...
18650   2009-03-06  10031.0 I was there last night march 5th, Thursday.  S...   1.0 star rating $$$ 1.0
18651   2009-02-14  10031.0 someone please tell me exactly where i can dan...   3.0 star rating $$$ 1.0
18652   2009-02-16  10031.0 great club n great restaurant! my club review ...   5.0 star rating $$$ 1.0
18653   2009-02-22  10031.0 It's small and crowded. Too homogenous in term...   2.0 star rating $$$ 1.0
18654   2009-09-18  10031.0 I've never been, but judging from the reviews ...   1.0 star rating $$$ 1.0
18655 rows × 6 columns

dfPanel = pd.read_csv('Desktop/panel.csv')
dfPanel

zipcode year    airbnb
0   10026.0 2009    0
1   10026.0 2010    0
2   10026.0 2011    0
3   10026.0 2012    1
4   10026.0 2013    1
... ... ... ...
65  11432.0 2011    0
66  11432.0 2012    0
67  11432.0 2013    0
68  11432.0 2014    0
69  11432.0 2015    0
70 rows × 3 columns

Using these two data frames, I need to add a column to "dfPanel" that counts the number of reviews in "dfYelp" for each year and zipcode. To do this, I first reduce "review_date" in "dfYelp" to just the year:

# Reduce review_date to its year (first four characters) to match dfPanel['year']
YelpYear = pd.read_csv('Desktop/yelp.csv', dtype={'review_date': str})

YelpYear = YelpYear['review_date'].str.slice(0,4)

dfYelp['review_date'] = YelpYear
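One thing worth checking: slicing the string leaves the years as text ('2015'), while dfPanel['year'] prints as plain integers (2009, 2010, ...), so the two would not compare equal even once the rows line up. A minimal sketch, assuming review_date always holds dates in YYYY-MM-DD form, that extracts an integer year instead; the small frame below is made-up sample data, and the new "year" column name is only for illustration:

```python
import pandas as pd

# Hypothetical sample mirroring a few dfYelp rows (not the real CSV)
dfYelp = pd.DataFrame({
    "review_date": ["2015-10-13", "2015-02-24", "2014-05-13"],
    "address_zipcode": [11432.0, 10033.0, 10033.0],
})

# Parse the dates and keep only the year, as an integer column,
# so its type matches dfPanel['year']
dfYelp["year"] = pd.to_datetime(dfYelp["review_date"]).dt.year

print(dfYelp["year"].tolist())  # [2015, 2015, 2014]
```

This also avoids reading the CSV a second time just to get the dates as strings.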

Now I want to create a new column in "dfPanel" that, for each row, counts the reviews in "dfYelp" with the same year and zipcode as that row.
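A common pandas pattern for "count rows per key, then attach the count to another table" is groupby plus size, followed by a left merge. A minimal sketch on made-up data, assuming the zipcode columns share the same float type in both frames and that years are integers; the column name "yelp_count" is hypothetical:

```python
import pandas as pd

# Hypothetical sample data standing in for dfYelp and dfPanel
dfYelp = pd.DataFrame({
    "review_date": ["2015-10-13", "2015-02-24", "2014-05-13", "2015-06-13"],
    "address_zipcode": [11432.0, 10033.0, 10033.0, 11432.0],
})
dfPanel = pd.DataFrame({
    "zipcode": [10033.0, 10033.0, 11432.0, 11432.0],
    "year": [2014, 2015, 2014, 2015],
})

# Integer year so it matches dfPanel['year']
dfYelp["year"] = pd.to_datetime(dfYelp["review_date"]).dt.year

# Count reviews for every (zipcode, year) pair that appears in dfYelp
counts = (
    dfYelp.groupby(["address_zipcode", "year"])
    .size()
    .reset_index(name="yelp_count")
    .rename(columns={"address_zipcode": "zipcode"})
)

# Left-join onto the panel so every panel row is kept;
# pairs with no reviews come back as NaN and are set to 0
dfPanel = dfPanel.merge(counts, on=["zipcode", "year"], how="left")
dfPanel["yelp_count"] = dfPanel["yelp_count"].fillna(0).astype(int)

print(dfPanel)
```

With this sample, "yelp_count" comes out as 1, 1, 0, 2: zipcode 11432.0 has no reviews in 2014, so the merge leaves NaN there and fillna turns it into 0.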

I tried:

yelp_count = 0

if YelpYear == dfPanel['year'] and dfYelp['address_zipcode']==dfPanel['zipcode']:
    yelp_count += 1

I'm getting this error: ValueError: Can only compare identically-labeled Series objects
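The error comes from `==` between two Series: pandas compares them element by element, matched on the index labels, and raises this ValueError when the two indexes differ (here 18655 rows versus 70). Even with identical indexes, `if` on a boolean Series would still fail, because the truth value of a whole Series is ambiguous. A minimal sketch of the row-by-row idea the loop was aiming for: for each panel row, compare the dfYelp columns against that row's scalar year and zipcode, then sum the resulting boolean mask. The data and the helper name count_reviews are hypothetical, and dfYelp is assumed to already have an integer "year" column:

```python
import pandas as pd

# Hypothetical sample data (not the real CSVs)
dfYelp = pd.DataFrame({
    "year": [2015, 2015, 2014, 2015],
    "address_zipcode": [11432.0, 10033.0, 10033.0, 11432.0],
})
dfPanel = pd.DataFrame({
    "zipcode": [10033.0, 10033.0, 11432.0, 11432.0, 10026.0],
    "year": [2014, 2015, 2014, 2015, 2015],
})

# Comparing two Series with different indexes reproduces the error
try:
    dfYelp["year"] == dfPanel["year"]
except ValueError as err:
    print(err)  # Can only compare identically-labeled Series objects

def count_reviews(row):
    # Compare whole dfYelp columns with single values from this panel row;
    # & combines the two conditions element by element
    mask = (dfYelp["year"] == row["year"]) & (dfYelp["address_zipcode"] == row["zipcode"])
    return int(mask.sum())  # True counts as 1

dfPanel["yelp_count"] = dfPanel.apply(count_reviews, axis=1)
print(dfPanel)
```

This scans all of dfYelp once per panel row, which is fine for 70 panel rows; a groupby-and-merge approach does the same counting in a single pass.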

yuripao71
Comment: I believe you are trying to compare Series of different lengths. See: https://stackoverflow.com/a/65215976/11506959 – Sena Yevenyo Apr 16 '22 at 22:39
