Add multiple empty columns to pandas DataFrame

Question

How do I add multiple empty columns to a DataFrame from a list?

I can do:

    df["B"] = None
    df["C"] = None
    df["D"] = None

But I can't do:

    df[["B", "C", "D"]] = None

KeyError: "['B' 'C' 'D'] not in index"

`None` is different to 0, but some answers are assuming it's equivalent. Also, assigning `None` will give a dtype of object, but assigning 0 will give a dtype of int. — smci, Apr 19 '20 at 11:11
Also you can't do `df[['B','C','D']] = None, None, None` or `[None, None, None]` or `pd.DataFrame([None, None, None])` — smci, Apr 19 '20 at 11:13
Related: the more general [How to add multiple columns to pandas dataframe in one assignment?](https://stackoverflow.com/questions/39050539/how-to-add-multiple-columns-to-pandas-dataframe-in-one-assignment) — smci, Apr 19 '20 at 11:16
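The dtype point in the first comment can be checked directly. Here is a minimal sketch, using a made-up frame and made-up column names, that assigns `None`, `np.nan` and `0` to three new columns and prints the resulting dtypes:

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with one existing integer column
df = pd.DataFrame({"A": [4, 7, 0]})

df["B"] = None     # Python None -> object column holding None (missing)
df["C"] = np.nan   # NaN -> float64 column (missing)
df["D"] = 0        # 0 -> int64 column: a real value, not "empty"

print(df.dtypes)
print(df.isna().sum())  # B and C count as missing, D does not
```

Only `None` and `np.nan` leave the new columns empty in the sense of missing values. `0` fills them with real data.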

score 104 · Answer 1 · edited Sep 12 '17 at 13:48

You could use df.reindex to add new columns:

In [18]: df = pd.DataFrame(np.random.randint(10, size=(5,1)), columns=['A'])

In [19]: df
Out[19]: 
   A
0  4
1  7
2  0
3  7
4  6

In [20]: df.reindex(columns=list('ABCD'))
Out[20]: 
   A   B   C   D
0  4 NaN NaN NaN
1  7 NaN NaN NaN
2  0 NaN NaN NaN
3  7 NaN NaN NaN
4  6 NaN NaN NaN

reindex will return a new DataFrame, with columns appearing in the order they are listed:

In [31]: df.reindex(columns=list('DCBA'))
Out[31]: 
    D   C   B  A
0 NaN NaN NaN  4
1 NaN NaN NaN  7
2 NaN NaN NaN  0
3 NaN NaN NaN  7
4 NaN NaN NaN  6

The reindex method has a fill_value parameter as well:

In [22]: df.reindex(columns=list('ABCD'), fill_value=0)
Out[22]: 
   A  B  C  D
0  4  0  0  0
1  7  0  0  0
2  0  0  0  0
3  7  0  0  0
4  6  0  0  0
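The calls above can be wrapped so that the existing columns always come first and only the new names have to be listed. Below is a sketch that uses a hypothetical helper name, `add_columns_via_reindex`. Note that `reindex` returns a new frame, so you have to assign the result back:

```python
import numpy as np
import pandas as pd

def add_columns_via_reindex(df, new_cols, fill_value=np.nan):
    """Hypothetical helper: return a new DataFrame with `new_cols` appended
    after the existing columns. `df` itself is not modified, so the caller
    must rebind the result (df = add_columns_via_reindex(df, ...))."""
    return df.reindex(columns=[*df.columns, *new_cols], fill_value=fill_value)

df = pd.DataFrame({"A": [4, 7, 0]})

with_nan = add_columns_via_reindex(df, ["B", "C", "D"])                 # float64 NaN columns
with_zero = add_columns_via_reindex(df, ["B", "C", "D"], fill_value=0)  # integer zeros
```

With the default NaN fill the new columns are float64. With `fill_value=0` they hold integers, which, as the comments on the question point out, is not the same as empty.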

edited Sep 12 '17 at 13:48

Dror

answered Jun 19 '15 at 17:00

unutbu

After experimenting with a moderately large DataFrame (~2.5k rows by 80k columns), this solution appears to be orders of magnitude faster than the accepted one. BTW, is there a reason why this specific command does not accept an "inplace=True" parameter? df = df.reindex(...) appears to use up quite a bit of RAM. – Marco Spinaci Sep 14 '17 at 15:05
@MarcoSpinaci: I recommend never using `inplace=True`. It doesn't do what most people think it does. Under the hood, an entirely new DataFrame is always created, and then the data from the new DataFrame is copied into the original DataFrame. That doesn't save any memory. So `inplace=True` is window-dressing without substance, and moreover, is misleadingly named. I haven't checked the code, but I expect `df = df.reindex(...)` requires at least 2x the memory required for `df`, and of course more when `reindex` is used to expand the number of rows. – unutbu Sep 14 '17 at 16:06
@unutbu, nevertheless, it is useful when you are iterating over containers, e.g. a list or a dictionary: it avoids the use of indexes, which make the code a bit messier... – toto_tico Mar 27 '18 at 12:36
@unutbu it is indeed a lot faster when I profiled my code that creates ~200 columns. Could you briefly explain why reindex is much faster than concat or simply setting multiple columns to a NumPy array? – Sam Oct 16 '20 at 19:13
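To compare the approaches on your own data, here is a rough benchmark sketch. It uses synthetic sizes that are far smaller than the ones mentioned above, times `reindex` against a per-column loop, and first checks that both produce the same frame. Timings depend on the machine and the pandas version, so no numbers are claimed here:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; the sizes are arbitrary
df = pd.DataFrame(np.random.randint(10, size=(1_000, 5)),
                  columns=[f"x{i}" for i in range(5)])
new_cols = [f"new{i}" for i in range(200)]

def via_reindex():
    # A single call builds the result with all new columns at once
    return df.reindex(columns=[*df.columns, *new_cols])

def via_loop():
    # One column insertion per new name, on a copy (mirrors the loop answers)
    out = df.copy()
    for col in new_cols:
        out[col] = np.nan
    return out

# Both approaches should produce the same frame
pd.testing.assert_frame_equal(via_reindex(), via_loop())

for fn in (via_reindex, via_loop):
    print(fn.__name__, timeit.timeit(fn, number=5))
```

On recent pandas versions the loop may also emit a `PerformanceWarning` saying the frame is highly fragmented, because each new column is inserted separately. That is plausibly part of the answer to the question above: `reindex` builds all the new columns in one call.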

score 85 · Accepted Answer · edited Dec 19 '18 at 23:08

I'd concat using a DataFrame:

In [23]:
df = pd.DataFrame(columns=['A'])
df

Out[23]:
Empty DataFrame
Columns: [A]
Index: []

In [24]:    
pd.concat([df,pd.DataFrame(columns=list('BCD'))])

Out[24]:
Empty DataFrame
Columns: [A, B, C, D]
Index: []

So passing pd.concat a list containing your original df and a new DataFrame with the columns you wish to add returns a new df with the additional columns.
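The answer concatenates along rows (the default `axis=0`). One variant is to concatenate along columns instead, with an all-NaN frame built on the same index, so the rows line up and no rows are added. This is a sketch, not the answer's method, and the frame and column names are made up. As with the original, the result has to be assigned back:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [4, 7, 0]})
new_cols = list("BCD")

# Swapped-in variant: side-by-side concat (axis=1) with an all-NaN frame
# that shares df's index, so every existing row gets NaN in the new columns.
empty = pd.DataFrame(np.nan, index=df.index, columns=new_cols)
df = pd.concat([df, empty], axis=1)   # concat returns a new frame; rebind it
```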

Caveat: See the discussion of performance in the other answers and/or the comment discussions. reindex may be preferable where performance is critical.

edited Dec 19 '18 at 23:08

m_floer

answered Jun 18 '15 at 22:13

EdChum

Thanks, it's possible that I'm missing something, but I added `pd.concat([df,pd.DataFrame(columns=list('BCD'))])` – it does nothing afaik. Could it be due to that I use `df = pd.read_csv` and not `df = pd.DataFrame`? – P A N Jun 18 '15 at 22:33
You need to assign the result of the concat so `df=pd.concat([df,pd.DataFrame(columns=list('BCD'))])` – EdChum Jun 18 '15 at 22:34
Thanks, that worked. Can I add the columns after the last column? The new columns are added to the beginning. It seems like concat is reordering automatically, because my original columns are moved around as well. – P A N Jun 18 '15 at 23:02
That shouldn't happen. You can change the column order either using fancy indexing: `df.loc[:, col_list]`, or by just selecting them and assigning them back to the original df: `df = df[col_list]` – EdChum Jun 19 '15 at 08:13
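Here is the reordering from the comment above, sketched on a made-up frame with `.loc`. The older `.ix` indexer has been removed from pandas.

```python
import pandas as pd

# Hypothetical frame whose columns came out in an unwanted order
df = pd.DataFrame({"B": [None], "A": [1], "C": [None]})
col_list = ["A", "B", "C"]

df = df.loc[:, col_list]   # label-based selection, in the order given
# equivalently: df = df[col_list]
```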
This is not working anymore (using pandas 0.19.1). The concatenation results in a `TypeError: data type not understood`. – thenaturalist Jan 13 '17 at 14:07
@thenaturalist sorry this still works for me in pandas `0.19.1` you'll need to post full code that I can run – EdChum Jan 13 '17 at 15:04

score 43 · Answer 3 · edited Dec 05 '17 at 09:30

If you don't want to retype the names of the existing columns, you can use reindex:

df.reindex(columns=[*df.columns.tolist(), 'new_column1', 'new_column2'], fill_value=0)

Full example:

In [1]: df = pd.DataFrame(np.random.randint(10, size=(3,1)), columns=['A'])

In [2]: df
Out[2]: 
   A
0  4
1  7
2  0

In [3]: df.reindex(columns=[*df.columns.tolist(), 'col1', 'col2'], fill_value=0)
Out[3]: 
   A  col1  col2
0  4     0     0
1  7     0     0
2  0     0     0

And, if you already have a list with the column names:

In [4]: my_cols_list=['col1','col2']

In [5]: df.reindex(columns=[*df.columns.tolist(), *my_cols_list], fill_value=0)
Out[5]: 
   A  col1  col2
0  4     0     0
1  7     0     0
2  0     0     0
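If the list may contain names that already exist in `df`, one option is to keep only the missing names before calling `reindex`. The sketch below uses a hypothetical helper, `add_missing_columns`. As a later comment notes, `df.columns` can be unpacked directly without `.tolist()`:

```python
import pandas as pd

def add_missing_columns(df, cols, fill_value=0):
    """Hypothetical helper: append only the names in `cols` that df lacks,
    so a name that already exists is not repeated in the target list.
    dict.fromkeys also drops duplicates within `cols`, keeping their order."""
    missing = [c for c in dict.fromkeys(cols) if c not in df.columns]
    return df.reindex(columns=[*df.columns, *missing], fill_value=fill_value)

df = pd.DataFrame({"A": [4, 7, 0]})
out = add_missing_columns(df, ["A", "col1", "col2", "col1"])
```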

edited Dec 05 '17 at 09:30

answered Jul 06 '17 at 14:11

toto_tico

Thanks. Could you tell me what the `*` does in the `reindex` input please? – Bowen Liu Oct 26 '18 at 20:19
It unpacks the list's elements in place (here, into the new list being built); it is [a Python operator](https://stackoverflow.com/questions/2921847/what-does-the-star-operator-mean#2921893) – toto_tico Oct 26 '18 at 20:57
Nice solution. BTW, the call to `tolist()` is not necessary. – BrunoF Jun 25 '21 at 21:23

score 10 · Answer 4 · answered Jul 15 '20 at 15:33

Summary of alternative solutions:

columns_add = ['a', 'b', 'c']

for loop:

for newcol in columns_add:
    df[newcol]= None

dict method:

df = df.assign(**dict([(_, None) for _ in columns_add]))

tuple assignment:

df['a'], df['b'], df['c'] = None, None, None
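The three alternatives should give the same result. The sketch below builds each one on a copy of a made-up frame and compares them. For the `assign` version it uses `dict.fromkeys`, as suggested in the comment below. Unlike the loop and the tuple assignment, `assign` leaves the original frame untouched and returns a new one:

```python
import pandas as pd

base = pd.DataFrame({"A": [1, 2]})
columns_add = ["a", "b", "c"]

# 1. for loop (mutates its own copy in place)
loop_df = base.copy()
for newcol in columns_add:
    loop_df[newcol] = None

# 2. assign() returns a new frame; dict.fromkeys builds {name: None}
assign_df = base.assign(**dict.fromkeys(columns_add, None))

# 3. tuple assignment (column names written out by hand)
tuple_df = base.copy()
tuple_df["a"], tuple_df["b"], tuple_df["c"] = None, None, None

pd.testing.assert_frame_equal(loop_df, assign_df)
pd.testing.assert_frame_equal(loop_df, tuple_df)
```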

answered Jul 15 '20 at 15:33

yosemite_k

`df.assign(**dict.fromkeys(columns_add, None))` should also work – Joe Ferndz Dec 28 '20 at 09:52

score 8 · Answer 5 · edited Jun 06 '20 at 15:50

Why not just use a loop?

import numpy as np

for newcol in ['B', 'C', 'D']:
    df[newcol] = np.nan

edited Jun 06 '20 at 15:50

answered May 04 '19 at 17:04

alexprice

0 is not the same value as None. Also, it'll force the dtype to integer, whereas None won't. – smci Apr 19 '20 at 11:03

score 3 · Answer 6 · edited Sep 09 '21 at 07:40

You can make use of Pandas broadcasting:

df = pd.DataFrame({'A': [1, 1, 1]})

df[['B', 'C']] = 2, 3
# df[['B', 'C']] = [2, 3]

Result:

   A  B  C
0  1  2  3
1  1  2  3
2  1  2  3

To add empty columns:

df[['B', 'C', 'D']] = 3 * [np.nan]

Result:

   A   B   C   D
0  1 NaN NaN NaN
1  1 NaN NaN NaN
2  1 NaN NaN NaN
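The same broadcasting works for a list of names of any length. Here is a short sketch, assuming a pandas version recent enough to accept new labels in a list on the left-hand side (as the answer itself does):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1]})
new_cols = ["B", "C", "D"]

# One value per new column; pandas broadcasts each value down its column.
# Assumption: a pandas version that accepts new labels in the key list.
df[new_cols] = [np.nan] * len(new_cols)
```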

edited Sep 09 '21 at 07:40

answered Sep 09 '21 at 07:12

Mykola Zotko

score 2 · Answer 7 · edited Jun 22 '20 at 03:14

I'd use

df["B"], df["C"], df["D"] = None, None, None

or

df["B"], df["C"], df["D"] = [None for _ in range(3)]

edited Jun 22 '20 at 03:14

jizhihaoSAMA

answered Jun 22 '20 at 02:38

lumiere_profues

score 1 · Answer 8 · answered Nov 20 '19 at 12:26

Just to add to the list of funny ways:

columns_add = ['a', 'b', 'c']
df = df.assign(**dict(zip(columns_add, [0] * len(columns_add))))

answered Nov 20 '19 at 12:26

Oleg O

0 is not the same value as None. Also, it'll force the dtype to integer, whereas None won't. – smci Apr 19 '20 at 11:03
Anyway you're missing a trailing fourth close-parenthesis. – smci Apr 19 '20 at 11:04