303

I have a pandas DataFrame with 4 columns and I want to create a new DataFrame that has only three of those columns. This question is similar to "Extracting specific columns from a data frame", but for pandas rather than R. The following code does not work (it raises an error) and is certainly not the pandasnic way to do it.

import pandas as pd
old = pd.DataFrame({'A' : [4,5], 'B' : [10,20], 'C' : [100,50], 'D' : [-30,-50]})
new = pd.DataFrame(zip(old.A, old.C, old.D)) # raises TypeError: data argument can't be an iterator 

What is the pandasnic way to do it?

cs95
SpeedCoder5

9 Answers

585

There is a way of doing this, and it actually looks similar to R:

new = old[['A', 'C', 'D']].copy()

Here you are just selecting the columns you want from the original data frame and creating a variable for those. If you want to modify the new dataframe at all you'll probably want to use .copy() to avoid a SettingWithCopyWarning.
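As a minimal sketch of why the .copy() matters, the snippet below changes a value in the new frame and then checks that the original is untouched. It reuses the `old` frame from the question; the value 999 is only an illustrative placeholder.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

# Select three columns and take an explicit, independent copy.
new = old[['A', 'C', 'D']].copy()

# Modify the copy only.
new.loc[0, 'A'] = 999

print(new.loc[0, 'A'])  # 999
print(old.loc[0, 'A'])  # 4
```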

An alternative method is to use filter, which will create a copy by default:

new = old.filter(['A','C','D'], axis=1)

Finally, depending on the number of columns in your original dataframe, it might be more succinct to express this using a drop (this will also create a copy by default):

new = old.drop('B', axis=1)
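To compare the three approaches side by side, here is a small sketch that builds the A, C, D frame from the question with each of them and checks that the results match. The variable names `by_selection`, `by_filter` and `by_drop` are made up for this illustration.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

by_selection = old[['A', 'C', 'D']].copy()       # list of column labels
by_filter = old.filter(['A', 'C', 'D'], axis=1)  # keep the listed columns
by_drop = old.drop('B', axis=1)                  # remove the unwanted column

print(by_selection.equals(by_filter))  # True
print(by_selection.equals(by_drop))    # True
```

Which one reads best probably depends on whether the list of columns to keep or the list of columns to discard is shorter.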
maxymoo
johnchase
  • 44
    A caution if just copying one column: In `old[['A']].copy()`, the double square brackets are required to create a new data frame. Note that `old['A'].copy()` will only create a Series. – intotecho Feb 01 '19 at 02:18
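The difference between the single and double brackets mentioned in this comment can be checked directly. A short sketch, again using the `old` frame from the question:

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

frame = old[['A']].copy()   # list of labels -> one-column DataFrame
series = old['A'].copy()    # single label   -> Series

print(type(frame).__name__)   # DataFrame
print(type(series).__name__)  # Series
print(frame.shape)            # (2, 1)
print(series.shape)           # (2,)
```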
46

The easiest way is:

new = old[['A','C','D']]

stidmatt
  • 10
    This isn't making a copy unless you explicitly call .copy() – Sylvain Oct 30 '19 at 02:23
  • this copies by default. – Nguai al Feb 05 '20 at 06:49
  • 6
    @Nguaial the behaviour of simple indexing is not specified. You will not know if you get a copy or a view. See documentation for more details: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy – Ole Fass May 05 '20 at 09:38
  • 3
    As mentioned in the comment above, this will create a view and not a copy. – le_llama Jun 22 '21 at 08:57
17

Another simpler way seems to be:

new = pd.DataFrame([old.A, old.B, old.C]).transpose()

where old.column_name will give you a series. Make a list of all the column-series you want to retain and pass it to the DataFrame constructor. We need to do a transpose to adjust the shape.

In [14]: pd.DataFrame([old.A, old.B, old.C]).transpose()
Out[14]: 
   A   B    C
0  4  10  100
1  5  20   50
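One caveat worth checking before relying on the transpose approach: when the selected columns have different types, building the frame row-wise and transposing it appears to leave every column with the generic object dtype, whereas plain selection keeps the original dtypes. The sketch below uses a hypothetical frame `mixed` (not from the question) to show this.

```python
import pandas as pd

# Hypothetical frame with an integer column and a string column.
mixed = pd.DataFrame({'A': [4, 5], 'B': ['x', 'y'], 'C': [1.5, 2.5]})

via_transpose = pd.DataFrame([mixed.A, mixed.B]).transpose()
via_selection = mixed[['A', 'B']].copy()

# Compare the dtype of column A under both approaches.
print(via_transpose['A'].dtype)
print(via_selection['A'].dtype)
```

With the all-integer frame from the question the two approaches give the same result, so this only matters for mixed data.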
MarredCheese
Hit
14

Select columns by index:

# positional indices of columns A, C, D in the question's frame: 0, 2, 3
new = old.iloc[:, [0, 2, 3]].copy()
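If the positions are not known in advance, they can be looked up from the labels. The following is a rough sketch for the four-column frame from the question, where A, C and D sit at positions 0, 2 and 3; it uses `Index.get_indexer` to do the lookup, and the name `positions` is just for illustration.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

# Translate column labels into integer positions.
positions = old.columns.get_indexer(['A', 'C', 'D'])
print(positions.tolist())  # [0, 2, 3]

# Select by position and copy.
new = old.iloc[:, positions].copy()
print(list(new.columns))   # ['A', 'C', 'D']
```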
sailfish009
7

As far as I can tell, you don't necessarily need to specify the axis when using the filter function.

new = old.filter(['A','B','D'])

returns the same dataframe as

new = old.filter(['A','B','D'], axis=1)
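This can be verified with a quick check. The sketch below compares both calls on the `old` frame from the question and, for contrast, also filters on row labels with axis=0.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

without_axis = old.filter(['A', 'B', 'D'])
with_axis = old.filter(['A', 'B', 'D'], axis=1)
print(without_axis.equals(with_axis))  # True

# With axis=0 the items are matched against the row labels instead.
first_row = old.filter([0], axis=0)
print(first_row.shape)  # (1, 4)
```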
Ellen
6

Generic functional form

def select_columns(data_frame, column_names):
    new_frame = data_frame.loc[:, column_names]
    return new_frame

Specific for your problem above

selected_columns = ['A', 'C', 'D']
new = select_columns(old, selected_columns)
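A runnable version of the above, as a sketch. It also shows what seems to happen when a requested column does not exist: in recent pandas versions, .loc with a list of labels raises a KeyError if any label is missing. The column name 'Z' and the flag `missing_raised` are made up for the illustration.

```python
import pandas as pd

def select_columns(data_frame, column_names):
    # .loc with a list of labels returns a new frame with those columns
    return data_frame.loc[:, column_names]

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

new = select_columns(old, ['A', 'C', 'D'])
print(list(new.columns))  # ['A', 'C', 'D']

# 'Z' is a hypothetical column that is not in the frame.
try:
    select_columns(old, ['A', 'Z'])
    missing_raised = False
except KeyError:
    missing_raised = True
print(missing_raised)
```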
Jeril
Deslin Naidoo
1

As an alternative:

new = pd.DataFrame().assign(A=old['A'], C=old['C'], D=old['D'])
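Because assign takes keyword arguments, this form can also rename columns while selecting them. A small sketch with the `old` frame from the question; the lowercase names `a` and `c` are hypothetical.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

# Same column names as in the original frame.
new = pd.DataFrame().assign(A=old['A'], C=old['C'], D=old['D'])
print(list(new.columns))  # ['A', 'C', 'D']

# Keyword names become the new column names, so this selects and renames.
renamed = pd.DataFrame().assign(a=old['A'], c=old['C'])
print(list(renamed.columns))  # ['a', 'c']
```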
0

If you want to have a new data frame then:

import pandas as pd
old = pd.DataFrame({'A' : [4,5], 'B' : [10,20], 'C' : [100,50], 'D' : [-30,-50]})
new = old[['A', 'C', 'D']]
Ali.E
0

You can drop the unwanted labels from the column index and select with what remains:

df = pd.DataFrame({'A': [1, 1], 'B': [2, 2], 'C': [3, 3], 'D': [4, 4]})

df[df.columns.drop(['B', 'C'])]

or

df.loc[:, df.columns.drop(['B', 'C'])]

Output:

   A  D
0  1  4
1  1  4
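Applied to the frame from the question, the same idea might look like the sketch below. It also uses the errors='ignore' option of Index.drop, which skips labels that are not present instead of raising; the label 'Z' is a hypothetical missing column.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

# Drop the unwanted label from the column index, then select and copy.
keep = old.columns.drop(['B'])
new = old[keep].copy()
print(list(new.columns))   # ['A', 'C', 'D']

# 'Z' does not exist; errors='ignore' skips it rather than raising KeyError.
keep_lenient = old.columns.drop(['B', 'Z'], errors='ignore')
print(list(keep_lenient))  # ['A', 'C', 'D']
```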
Mykola Zotko