101

How do I add multiple empty columns to a DataFrame from a list?

I can do:

    df["B"] = None
    df["C"] = None
    df["D"] = None

But I can't do:

    df[["B", "C", "D"]] = None

KeyError: "['B' 'C' 'D'] not in index"

vsminkov
  • 10,252
  • 2
  • 36
  • 49
P A N
  • 5,010
  • 13
  • 46
  • 89
  • `None` is different to 0, but some answers are assuming it's equivalent. Also, assigning `None` will give a dtype of object, but assigning 0 will give a dtype of int. – smci Apr 19 '20 at 11:11
  • Also you can't do `df[['B','C','D']] = None, None, None` or `[None, None, None]` or `pd.DataFrame([None, None, None])` – smci Apr 19 '20 at 11:13
  • Related : the more general [How to add multiple columns to pandas dataframe in one assignment?](https://stackoverflow.com/questions/39050539/how-to-add-multiple-columns-to-pandas-dataframe-in-one-assignment) – smci Apr 19 '20 at 11:16

8 Answers8

104

You could use df.reindex to add new columns:

In [18]: df = pd.DataFrame(np.random.randint(10, size=(5,1)), columns=['A'])

In [19]: df
Out[19]: 
   A
0  4
1  7
2  0
3  7
4  6

In [20]: df.reindex(columns=list('ABCD'))
Out[20]: 
   A   B   C   D
0  4 NaN NaN NaN
1  7 NaN NaN NaN
2  0 NaN NaN NaN
3  7 NaN NaN NaN
4  6 NaN NaN NaN

reindex will return a new DataFrame, with columns appearing in the order they are listed:

In [31]: df.reindex(columns=list('DCBA'))
Out[31]: 
    D   C   B  A
0 NaN NaN NaN  4
1 NaN NaN NaN  7
2 NaN NaN NaN  0
3 NaN NaN NaN  7
4 NaN NaN NaN  6

The reindex method as a fill_value parameter as well:

In [22]: df.reindex(columns=list('ABCD'), fill_value=0)
Out[22]: 
   A  B  C  D
0  4  0  0  0
1  7  0  0  0
2  0  0  0  0
3  7  0  0  0
4  6  0  0  0
Dror
  • 11,068
  • 19
  • 81
  • 145
unutbu
  • 777,569
  • 165
  • 1,697
  • 1,613
  • 6
    After experimenting with a moderately large Data Frame (~2.5k rows for 80k columns), and this solution appears to be orders of magnitude faster than the accepted one.BTW is there a reason why this specific command does not accept an "inplace=True" parameter? df = df.reindex(...) appears to use up quite a bit of RAM. – Marco Spinaci Sep 14 '17 at 15:05
  • 6
    @MarcoSpinaci: I recommend never using `inplace=True`. It doesn't do what most people think it does. Under the hood, an entirely new DataFrame is always created, and then the data from the new DataFrame is copied into the original DataFrame. That doesn't save any memory. So `inplace=True` is window-dressing without substance, and moreover, is misleadingly named. I haven't checked the code, but I expect `df = df.reindex(...)` requires at least 2x the memory required for `df`, and of course more when `reindex` is used to expand the number of rows. – unutbu Sep 14 '17 at 16:06
  • @unutbu, nevertheless, it is useful when you are iterating containers, e.g. a list or a dictionary, it would avoid the use of indexes that makes the code a bit more dirty... – toto_tico Mar 27 '18 at 12:36
  • @unutbu it is indeed a lot faster when i profiled my ~200 columns creation code, could you briefly explain why doing reindex is much faster than concat or simply setting multiple columns to a numpy array? – Sam Oct 16 '20 at 19:13
85

I'd concat using a DataFrame:

In [23]:
df = pd.DataFrame(columns=['A'])
df

Out[23]:
Empty DataFrame
Columns: [A]
Index: []

In [24]:    
pd.concat([df,pd.DataFrame(columns=list('BCD'))])

Out[24]:
Empty DataFrame
Columns: [A, B, C, D]
Index: []

So by passing a list containing your original df, and a new one with the columns you wish to add, this will return a new df with the additional columns.


Caveat: See the discussion of performance in the other answers and/or the comment discussions. reindex may be preferable where performance is critical.

m_floer
  • 2,110
  • 4
  • 28
  • 48
EdChum
  • 339,461
  • 188
  • 752
  • 538
  • Thanks, it's possible that I'm missing something, but I added `pd.concat([df,pd.DataFrame(columns=list('BCD'))])` – it does nothing afaik. Could it be due to that I use `df = pd.read_csv` and not `df = pd.DataFrame`? – P A N Jun 18 '15 at 22:33
  • 2
    You need to assign the result of the concat so `df=pd.concat([df,pd.DataFrame(columns=list('BCD'))])` – EdChum Jun 18 '15 at 22:34
  • Thanks, that worked. Can I append the columns to the last column? The new columns are added to the beginning. It seems like concat is doing automatic reordering because my original columns are moved around as well. – P A N Jun 18 '15 at 23:02
  • 1
    That shouldn't happen, you can change the column order either using fancy indexing: `df.ix[:, col_list]` or by just selecting them and assigning them back to the original df: `df = df[col_list]` – EdChum Jun 19 '15 at 08:13
  • 1
    This is not working anymore (using pandas 0.19.1). The concatenation results in a `TypeError: data type not understood`. – thenaturalist Jan 13 '17 at 14:07
  • 1
    @thenaturalist sorry this still works for me in pandas `0.19.1` you'll need to post full code that I can run – EdChum Jan 13 '17 at 15:04
43

If you don't want to rewrite the name of the old columns, then you can use reindex:

df.reindex(columns=[*df.columns.tolist(), 'new_column1', 'new_column2'], fill_value=0)

Full example:

In [1]: df = pd.DataFrame(np.random.randint(10, size=(3,1)), columns=['A'])

In [1]: df
Out[1]: 
   A
0  4
1  7
2  0

In [2]: df.reindex(columns=[*df.columns.tolist(), 'col1', 'col2'], fill_value=0)
Out[2]: 

   A  col1  col2
0  1     0     0
1  2     0     0

And, if you already have a list with the column names, :

In [3]: my_cols_list=['col1','col2']

In [4]: df.reindex(columns=[*df.columns.tolist(), *my_cols_list], fill_value=0)
Out[4]: 
   A  col1  col2
0  1     0     0
1  2     0     0
toto_tico
  • 16,063
  • 8
  • 90
  • 101
10

Summary of alternative solutions:

columns_add = ['a', 'b', 'c']
  1. for loop:

    for newcol in columns_add:
        df[newcol]= None
    
  2. dict method:

    df.assign(**dict([(_,None) for _ in columns_add]))
    
  3. tuple assignment:

    df['a'], df['b'], df['c'] = None, None, None
    
yosemite_k
  • 2,786
  • 1
  • 14
  • 26
8

Why not just use loop:

for newcol in ['B','C','D']:
    df[newcol]=np.nan
alexprice
  • 364
  • 4
  • 11
  • 0 is not the same value as None. Also, it'll force the dtype to integer, whereas None won't. – smci Apr 19 '20 at 11:03
3

You can make use of Pandas broadcasting:

df = pd.DataFrame({'A': [1, 1, 1]})

df[['B', 'C']] = 2, 3
# df[['B', 'C']] = [2, 3]

Result:

   A  B  C
0  1  2  3
1  1  2  3
2  1  2  3

To add empty columns:

df[['B', 'C', 'D']] = 3 * [np.nan]

Result:

   A   B   C   D
0  1 NaN NaN NaN
1  1 NaN NaN NaN
2  1 NaN NaN NaN
Mykola Zotko
  • 12,250
  • 2
  • 39
  • 53
2

I'd use

df["B"], df["C"], df["D"] = None, None, None

or

df["B"], df["C"], df["D"] = ["None" for a in range(3)]
jizhihaoSAMA
  • 11,804
  • 9
  • 23
  • 43
1

Just to add to the list of funny ways:

columns_add = ['a', 'b', 'c']
df = df.assign(**dict(zip(columns_add, [0] * len(columns_add)))
Oleg O
  • 955
  • 5
  • 10