I am a bit confused about why my code is not working (see the bottom). I have a dataset from the World Bank with malaria cases worldwide. This is a simplified version of what the data looks like:
Location Period Value
1 Philippines 2000 400
2 Philippines 2001 380
3 Philippines 2002 300
...
I need to do a diff() for each location (i.e. country). I cannot call df.diff() on the whole dataset because all countries share the same column, so the differences would be computed across country boundaries. So I need to temporarily separate each country into its own dataframe, then do philippines_df.diff(), then save the results into a new column in the original dataframe at the right positions.
This would be a desired result:
Location Period Value Difference
1 Philippines 2000 400 np.NaN
2 Philippines 2001 380 20
3 Philippines 2002 300 80
...
This is the code I tried:
african_countries = malaria_africa.Location.unique().tolist()
for african_country in african_countries:
    malaria_africa[malaria_africa.Location == african_country]['Difference'] = malaria_africa[malaria_africa.Location == african_country].Value.diff()
But these are the results I get (all rows are NaN):
Location Period Value Difference
1 Philippines 2000 400 np.NaN
2 Philippines 2001 380 np.NaN
3 Philippines 2002 300 np.NaN
...
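For reference, here is a minimal sketch of the per-location diff described above, using groupby instead of a loop. The likely reason the loop has no effect is that malaria_africa[mask]['Difference'] = ... is chained indexing: the assignment lands on a temporary copy rather than on the original dataframe. The sample frame below is an assumption built from the simplified table; the second country and its values are made up purely to show that the groups do not mix.

```python
import pandas as pd

# Hypothetical sample data: the Philippines rows mirror the simplified table,
# the Kenya rows are invented only to illustrate a second group.
malaria_africa = pd.DataFrame({
    "Location": ["Philippines"] * 3 + ["Kenya"] * 3,
    "Period": [2000, 2001, 2002] * 2,
    "Value": [400, 380, 300, 900, 950, 700],
})

# Make sure rows are in chronological order within each location.
malaria_africa = malaria_africa.sort_values(["Location", "Period"])

# diff() is applied separately inside each Location group; the result keeps
# the original index, so it aligns with the dataframe on assignment.
malaria_africa["Difference"] = malaria_africa.groupby("Location")["Value"].diff()

print(malaria_africa)
```

Note that diff() returns the current value minus the previous one, so the Philippines rows come out as NaN, -20 and -80. If the positive figures from the desired table are what is wanted, the column could be negated afterwards.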