I am trying to compare two data frames, so I am reading two CSV files with pandas:
import pandas as pd
import numpy as np
dfYelp = pd.read_csv('Desktop/yelp.csv')
dfYelp
review_date address_zipcode review_text review_rating price NYC_reviewer
0 2015-10-13 11432.0 Classic Urban Deli.. they have the "hungry man... 5.0 star rating $ 0.0
1 2015-02-24 10033.0 Meh it's a deli, they have soda and milk and y... 2.0 star rating $ 0.0
2 2014-05-13 10033.0 This is the newest deli on the block, and I th... 4.0 star rating $ 0.0
3 2015-06-13 11234.0 Absolutely the best shrimp with broccoli in ga... 5.0 star rating UNCLAIMED RESTAURANT 1.0
4 2015-05-12 11234.0 This is a rare gem in the area. Not too often ... 4.0 star rating UNCLAIMED RESTAURANT 0.0
... ... ... ... ... ... ...
18650 2009-03-06 10031.0 I was there last night march 5th, Thursday. S... 1.0 star rating $$$ 1.0
18651 2009-02-14 10031.0 someone please tell me exactly where i can dan... 3.0 star rating $$$ 1.0
18652 2009-02-16 10031.0 great club n great restaurant! my club review ... 5.0 star rating $$$ 1.0
18653 2009-02-22 10031.0 It's small and crowded. Too homogenous in term... 2.0 star rating $$$ 1.0
18654 2009-09-18 10031.0 I've never been, but judging from the reviews ... 1.0 star rating $$$ 1.0
18655 rows × 6 columns
dfPanel = pd.read_csv('Desktop/panel.csv')
dfPanel
zipcode year airbnb
0 10026.0 2009 0
1 10026.0 2010 0
2 10026.0 2011 0
3 10026.0 2012 1
4 10026.0 2013 1
... ... ... ...
65 11432.0 2011 0
66 11432.0 2012 0
67 11432.0 2013 0
68 11432.0 2014 0
69 11432.0 2015 0
70 rows × 3 columns
From these two data frames, I need to add a column to "dfPanel" that counts the number of reviews in "dfYelp" for each year and zipcode. To do this, I first reduce "review_date" in "dfYelp" to just the year:
# Reduce review_date to just the year (its first four characters)
YelpYear = pd.read_csv('Desktop/yelp.csv', dtype={'review_date': str})
YelpYear = YelpYear['review_date'].str.slice(0,4)
dfYelp['review_date'] = YelpYear
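As an aside, the year could also be extracted by parsing the dates instead of slicing strings. The following is only a minimal sketch on a made-up sample: the frame `sample_yelp` and the column `review_year` are hypothetical names, not part of the data above. Parsing with `pd.to_datetime` yields an integer year, whereas the string slice yields text such as '2015', which would not compare equal to an integer "year" column.

```python
import pandas as pd

# Hypothetical sample standing in for dfYelp (values copied from the preview above)
sample_yelp = pd.DataFrame({
    "review_date": ["2015-10-13", "2015-02-24", "2009-03-06"],
    "address_zipcode": [11432.0, 10033.0, 10031.0],
})

# Parse the date strings, then keep only the year as an integer
sample_yelp["review_year"] = pd.to_datetime(sample_yelp["review_date"]).dt.year

print(sample_yelp)
```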
Now I want to create a new column in "dfPanel" that, for each row, counts the number of reviews in "dfYelp" with the same year and zipcode.
I tried:
yelp_count = 0
if YelpYear == dfPanel['year'] and dfYelp['address_zipcode']==dfPanel['zipcode']:
    yelp_count += 1
I'm getting this error: ValueError: Can only compare identically-labeled Series objects
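The error arises because `==` is comparing two Series with different indexes (18655 rows against 70), and an `if` on a whole Series would be ambiguous in any case. One possible approach, sketched here on made-up data rather than the real files, is to count reviews per (zipcode, year) with `groupby` and then left-merge the counts onto the panel. The frames `yelp` and `panel` and the column names `review_year` and `yelp_count` are assumptions for illustration, and the year is assumed to be an integer in both frames.

```python
import pandas as pd

# Hypothetical stand-ins for dfYelp (with an integer year) and dfPanel
yelp = pd.DataFrame({
    "review_year": [2015, 2015, 2015, 2014, 2009],
    "address_zipcode": [11432.0, 11432.0, 10033.0, 10033.0, 10031.0],
})
panel = pd.DataFrame({
    "zipcode": [11432.0, 11432.0, 10033.0, 10026.0],
    "year": [2015, 2014, 2014, 2009],
})

# Count reviews for every (zipcode, year) pair, renaming the keys to match the panel
counts = (
    yelp.groupby(["address_zipcode", "review_year"])
        .size()
        .reset_index(name="yelp_count")
        .rename(columns={"address_zipcode": "zipcode", "review_year": "year"})
)

# A left merge keeps every panel row; pairs with no reviews become NaN, then 0
panel = panel.merge(counts, on=["zipcode", "year"], how="left")
panel["yelp_count"] = panel["yelp_count"].fillna(0).astype(int)

print(panel)
```

The left merge is chosen so that panel rows without any matching review are kept and receive a count of 0 instead of being dropped.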