0

I have a dataset

id  year      sales
1    2000      10
2    2000      10
2    2001      20
2    2002      30

I want to create a balanced panel so that I get the following:

id  year      sales
1    2000      10
1    2001      NA
1    2002      NA
2    2000      10
2    2001      20
2    2002      30

I tried the the following code:

df_balanced = (df.set_index('year',append=True).reindex(pd.MultiIndex.from_product([df.index.unique(),range(df.year.min(),df.year.max()+1)],names['id','year'])).reset_index(level=1))

But I am not getting the desired output.

Jui Sen
  • 365
  • 3
  • 10
  • Please supply the expected [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) (MRE). We should be able to copy and paste a contiguous block of your code, execute that file, and reproduce your problem along with tracing output for the problem points. This lets us test our suggestions against your test data and desired output. Show where the intermediate results differ from what you expected. – Prune May 24 '21 at 21:50
  • Please [include a minimal data frame](https://stackoverflow.com/questions/52413246/how-to-provide-a-reproducible-copy-of-your-dataframe-with-to-clipboard) as part of your MRE. – Prune May 24 '21 at 21:50

1 Answers1

1
idx = pd.MultiIndex.from_product(
    [df.id.unique(), df.year.unique()], names=["id", "year"]
)
df = df.set_index(["id", "year"]).reindex(idx).reset_index()
print(df)

Prints:

   id  year  sales
0   1  2000   10.0
1   1  2001    NaN
2   1  2002    NaN
3   2  2000   10.0
4   2  2001   20.0
5   2  2002   30.0
Andrej Kesely
  • 118,151
  • 13
  • 38
  • 75