I have a list of target columns:

cols = ["col1", "col2", "col4"]

Then I have several pandas DataFrames with different numbers of columns. I need to select the columns listed in cols from each of them. If a column from cols does not exist in a DataFrame, it should be created and filled with NaN values.

df1 =
col1  col3
1     x1
2     x2
3     x3

df2 =
col1  col2  col4
1     f1    car3
3     f2    car2
4     f5    car1

For example, df2[cols] works well, but df1[cols] obviously fails because col2 and col4 are missing. I need the following output for df1:

df1 =
col1  col2  col4
1     NaN   NaN
2     NaN   NaN
3     NaN   NaN
Sociopath
Tatik
    Possible duplicate of [How to add an empty column to a dataframe?](https://stackoverflow.com/questions/16327055/how-to-add-an-empty-column-to-a-dataframe) – Georgy Apr 05 '19 at 12:50

1 Answer

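Use DataFrame.reindex with the list of columns. Any column in the list that does not exist in the DataFrame is added and filled with NaN, and any column not in the list (such as col3) is dropped: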

df1 = df1.reindex(cols, axis=1)
print (df1)
   col1  col2  col4
0     1   NaN   NaN
1     2   NaN   NaN
2     3   NaN   NaN

For df2, which already contains every column in cols, the same columns are returned unchanged:

df2 = df2.reindex(cols, axis=1)
print (df2)
   col1 col2  col4
0     1   f1  car3
1     3   f2  car2
2     4   f5  car1
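
Since the question mentions several DataFrames, here is a minimal self-contained sketch that rebuilds the sample data and applies the same reindex call to each frame in a loop. The `frames` and `selected` dictionaries are names made up for illustration, and `reindex(columns=cols)` is used as an equivalent spelling of `reindex(cols, axis=1)`:

```python
import pandas as pd

cols = ["col1", "col2", "col4"]

# Sample data reconstructed from the question
df1 = pd.DataFrame({"col1": [1, 2, 3], "col3": ["x1", "x2", "x3"]})
df2 = pd.DataFrame({
    "col1": [1, 3, 4],
    "col2": ["f1", "f2", "f5"],
    "col4": ["car3", "car2", "car1"],
})

# Hypothetical container for "several DataFrames"
frames = {"df1": df1, "df2": df2}

# reindex keeps the columns in cols, adds missing ones as NaN,
# and drops columns that are not listed (col3 here)
selected = {name: df.reindex(columns=cols) for name, df in frames.items()}

for name, df in selected.items():
    print(name)
    print(df)
```

Note that reindex returns a new DataFrame, so the originals in `frames` are left untouched.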
jezrael