
I know there are a lot of similar questions already, but none seem to apply to my problem (or I don't know how to apply them).

So I have a Pandas DataFrame with duplicated entries in the first column, but different values in the rest of the columns.

Like so:

     location      year     GDP     ...
0    AUT           1998     14...
1    AUT           2018     98...
2    AUS           1998     24...
3    AUS           2018     83...
...

I would like to get only unique entries in 'location' but keep all the other columns, with their values collected into lists:

     location      year             GDP               ...
0    AUT           (1998, 2018)     (14..., 98...)
1    AUS           (1998, 2018)     (24..., 83...)
...

I tried:

grouped = df_gdp_pop_years.groupby("Location")['Year'].apply(list).reset_index(name = "Year")

and I tried to do the same with a lambda function, but I always end up with the same problem: I only keep the years.

How can I get the results I want?

2 Answers


You can try something like

df_gdp_pop_years.groupby("Location").agg({"Year": list, "GDP": list})

If your other columns may change or more may be added later, you can accomplish this with a generalized .agg() across all of them:

df_gdp_pop_years.groupby('location').agg(lambda x: x.tolist())
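As a rough sketch of why this generalizes (again with made-up data; the extra population column here is purely hypothetical), the lambda is applied to every non-grouping column without having to name any of them:

import pandas as pd

df = pd.DataFrame({
    "location": ["AUT", "AUT", "AUS", "AUS"],
    "year": [1998, 2018, 1998, 2018],
    "GDP": [14, 98, 24, 83],
    "population": [7.9, 8.8, 18.7, 24.9],  # extra column, handled automatically
})

# Every non-grouping column is collapsed to a list per location
out = df.groupby("location").agg(lambda x: x.tolist()).reset_index()
print(out)
#   location          year       GDP    population
# 0      AUS  [1998, 2018]  [24, 83]  [18.7, 24.9]
# 1      AUT  [1998, 2018]  [14, 98]    [7.9, 8.8]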

I found this by searching for 'opposite of pandas explode', which led me to a different SO question: How to implode (reverse of pandas explode) based on a column
