How to diff() time-series data with mixed values

Question

I am a bit confused about why my code is not working (see the bottom). I have a dataset from the World Bank with malaria cases worldwide. This is a simplified version of what the data looks like:

    Location    Period  Value
1   Philippines 2000    400
2   Philippines 2001    380
3   Philippines 2002    300
...

I need to do a diff() for each location (i.e. country). I cannot apply df.diff() to the whole dataset because all countries share the same column, so the differences would be computed across country boundaries and mix up the results. So I need to temporarily separate each country into its own dataframe, do philippines_df.diff(), and then save the results into a new column in the original dataframe at the right positions.

This would be a desired result:

    Location    Period  Value  Difference
1   Philippines 2000    400    np.NaN
2   Philippines 2001    380    20
3   Philippines 2002    300    80
...

This is the code I tried:

african_countries = malaria_africa.Location.unique().tolist()

for african_country in african_countries:
    malaria_africa[malaria_africa.Location == african_country]['Difference'] = malaria_africa[malaria_africa.Location == african_country].Value.diff()

But these are the results I get (all rows are NaN):

    Location    Period  Value  Difference
1   Philippines 2000    400    np.NaN
2   Philippines 2001    380    np.NaN
3   Philippines 2002    300    np.NaN
...

Use `groupby` + `diff` like [here](https://stackoverflow.com/a/63288916/2901002) — jezrael, Sep 22 '21 at 09:53
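
The `groupby` + `diff` suggestion can be sketched roughly as follows. This is a minimal example, not a tested solution for the full World Bank dataset: the sample frame is made up, and the second country and its values are hypothetical, added only to show that differences do not leak from one country into the next. It assumes the column names `Location`, `Period` and `Value` from the question.

```python
import pandas as pd

# Made-up sample data. The "Kenya" rows are hypothetical and exist only to
# demonstrate that each country is differenced independently.
malaria_africa = pd.DataFrame({
    "Location": ["Philippines"] * 3 + ["Kenya"] * 3,
    "Period": [2000, 2001, 2002, 2000, 2001, 2002],
    "Value": [400, 380, 300, 900, 950, 700],
})

# Put each country's rows in chronological order before differencing.
malaria_africa = malaria_africa.sort_values(["Location", "Period"])

# diff() is applied within each Location group. The result keeps the original
# index, so it can be assigned straight back as a new column.
malaria_africa["Difference"] = malaria_africa.groupby("Location")["Value"].diff()

print(malaria_africa)
```

Note that `diff()` computes the current value minus the previous one, so the Philippines rows come out as NaN, -20 and -80; negate the column if the positive sign shown in the desired output is what you want. The loop in the question most likely fails because `malaria_africa[mask]['Difference'] = ...` is chained indexing: the boolean selection returns a temporary copy, and the assignment is made on that copy rather than on `malaria_africa` itself.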
