I'm pretty new to machine learning. I have two dataframes that contain movie ratings. Some movie titles appear in both dataframes but with different numeric ratings, while other titles appear in only one of them. How can I combine the two dataframes and average the ratings for any movie that appears in both? Thanks for the help!
Removed the `machine-learning` and `numpy` tags, as they have nothing to do with the question. Please don't post images of data frames; transcribing images is tedious. Instead, add the output of `df.to_dict()` to the question, which makes reproducing your data locally very easy. Please go through [How to make good pandas reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Ch3steR Jul 22 '20 at 15:19
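As a rough illustration of the `df.to_dict()` suggestion in the comment above, the sketch below shows what gets pasted into a question and how a reader could rebuild the frame from it. The sample frame and the names `as_dict` and `rebuilt` are made up for the example.

```python
import pandas as pd

# Hypothetical sample frame standing in for the asker's data
df = pd.DataFrame({'Movie': ['IR', 'R'], 'rating': [95, 90]})

# This dictionary is what would be pasted into the question as text
as_dict = df.to_dict()
print(as_dict)

# Anyone reading the question can rebuild the frame from the pasted dictionary
rebuilt = pd.DataFrame(as_dict)
print(rebuilt)
```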
1 Answer
You can use `pd.concat` with `GroupBy.agg`:

    import pandas as pd

    # Sample data
    df = pd.DataFrame({'Movie':['IR', 'R'], 'rating':[95, 90], 'director':['SB', 'RC']})
    df1 = pd.DataFrame({'Movie':['IR', 'BH'], 'rating':[93, 88], 'director':['SB', 'RC']})

    # Stack both frames, then average the rating per movie
    (pd.concat([df, df1]).groupby('Movie', as_index=False)
       .agg({'rating':'mean', 'director':'first'}))

      Movie  rating director
    0    BH    88.0       RC
    1    IR    94.0       SB
    2     R    90.0       RC
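Or `df.append` (deprecated in pandas 1.4 and removed in pandas 2.0, so this only works on older versions):

    df.append(df1).groupby('Movie', as_index=False).agg({'rating':'mean', 'director':'first'})

      Movie  rating director
    0    BH    88.0       RC
    1    IR    94.0       SB
    2     R    90.0       RC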
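Since `DataFrame.append` is not available in pandas 2.0 and later, a self-contained version based on `pd.concat` may be more portable. This is a minimal sketch, assuming both frames use the same column names (`Movie`, `rating`, `director`). The `ignore_index=True` argument and the names `combined` and `result` are additions for the example, not part of the answer.

```python
import pandas as pd

# Sample data, assuming both frames share the same column names
df = pd.DataFrame({'Movie': ['IR', 'R'], 'rating': [95, 90], 'director': ['SB', 'RC']})
df1 = pd.DataFrame({'Movie': ['IR', 'BH'], 'rating': [93, 88], 'director': ['SB', 'RC']})

# Stack the rows of both frames; ignore_index gives the result a fresh 0..n-1 index
combined = pd.concat([df, df1], ignore_index=True)

# Average the ratings per movie and keep the first director seen for each movie
result = (combined.groupby('Movie', as_index=False)
                  .agg({'rating': 'mean', 'director': 'first'}))
print(result)
```

Movies that appear in only one frame keep their single rating, because the mean of one value is that value.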
- If you want the `Movie` column as the index, remove `as_index=False` from `groupby`. The `as_index` parameter of `df.groupby` defaults to `True`, which makes `Movie` the index.
- If you want to maintain the original order, set the `sort` parameter to `False` in `groupby`:

        (pd.concat([df, df1]).groupby('Movie', as_index=False, sort=False)
           .agg({'rating':'mean', 'director':'first'}))

          Movie  rating director
        0    IR    94.0       SB
        1     R    90.0       RC
        2    BH    88.0       RC
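The two `groupby` options above can be compared side by side. This is a small sketch reusing the answer's sample data; the names `combined`, `by_index`, and `in_order` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'Movie': ['IR', 'R'], 'rating': [95, 90], 'director': ['SB', 'RC']})
df1 = pd.DataFrame({'Movie': ['IR', 'BH'], 'rating': [93, 88], 'director': ['SB', 'RC']})
combined = pd.concat([df, df1])

aggregations = {'rating': 'mean', 'director': 'first'}

# Default as_index=True: the group key 'Movie' becomes the index, sorted alphabetically
by_index = combined.groupby('Movie').agg(aggregations)

# sort=False: groups are kept in order of first appearance in the combined frame
in_order = combined.groupby('Movie', as_index=False, sort=False).agg(aggregations)

print(by_index)
print(in_order)
```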
Ch3steR