4

I want to delete columns that start the particular "TYPE" word and do not contain _1?

df =

TYPE_1    TYPE_2    TYPE_3    COL1
aaa       asb       bbb       123

The result should be:

df =

    TYPE_1    COL1
    aaa       123

Currently I am deleting these columns manually, however this approach is not very efficient if the number of columns is big:

df = df.drop(["TYPE_2","TYPE_3"], axis=1)
ayhan
  • 64,199
  • 17
  • 170
  • 189
Dinosaurius
  • 7,494
  • 16
  • 55
  • 103

4 Answers4

8

A list comprehension can be used. Note: axis=1 denotes that we are referring to the column and inplace=True can also be used as per pandas.DataFrame.drop docs.

droplist = [i for i in df.columns if i.startswith('TYPE') and '_1' not in i]
df1.drop(droplist,axis=1,inplace=True)
nipy
  • 4,326
  • 4
  • 22
  • 53
  • For some reason it does not work for columns that are named as, e.g. `TYPE_MOD`. So, the columns containing `_` are ignored. Why? – Dinosaurius Dec 23 '16 at 20:41
  • I just tested this and my column named 'TYPE_MOD' was dropped. Is there whitespace in the 'TYPE_MOD' string maybe? What do you get when entering `df.columns[df.columns == 'TYPE_MOD']` – nipy Dec 23 '16 at 20:54
7

This is the fifth answer but I wanted to showcase the power of the filter dataframe method which filters by column names with regex. This searches for columns that don't start with TYPE or have _1 somewhere in them.

df.filter(regex='^(?!TYPE)|_1')
Ted Petrou
  • 52,577
  • 18
  • 119
  • 125
  • Can you break down this regex? Thanks. – julieth Dec 24 '16 at 03:39
  • 1
    The regex has two parts separated by an or `|`. First half is `^(?!TYPE)` which is a [negative look-ahead](http://stackoverflow.com/questions/899422/regular-expression-for-a-string-that-does-not-start-with-a-sequence) used to find expressions that do not start with TYPE. The second part is simply looking for any matches with `_1` – Ted Petrou Dec 24 '16 at 04:58
2

Easy:

unwanted = [column for column in df.columns 
            if column.startswith("TYPE") and "_1" not in column]
df = df.drop(unwanted)
DYZ
  • 51,549
  • 10
  • 60
  • 87
1
t_cols = [c for c in df.columns.values if c.startswith('TYPE_') and not c == 'TYPE_1']
df.drop(t_cols)

Should do the job

Steve Barnes
  • 26,342
  • 6
  • 60
  • 70