How to delete columns based on condition

Question

I want to delete columns that start the particular "TYPE" word and do not contain _1?

df =

TYPE_1    TYPE_2    TYPE_3    COL1
aaa       asb       bbb       123

The result should be:

df =

    TYPE_1    COL1
    aaa       123

Currently I am deleting these columns manually, however this approach is not very efficient if the number of columns is big:

df = df.drop(["TYPE_2","TYPE_3"], axis=1)

Possible duplicate of [Deleting multiple columns in Pandas](https://stackoverflow.com/q/28538536/1278112) — Shihe Zhang, Nov 06 '17 at 05:23

score 8 · Accepted Answer · answered Dec 23 '16 at 20:24

8

A list comprehension can be used. Note: axis=1 denotes that we are referring to the column and inplace=True can also be used as per pandas.DataFrame.drop docs.

droplist = [i for i in df.columns if i.startswith('TYPE') and '_1' not in i]
df1.drop(droplist,axis=1,inplace=True)

answered Dec 23 '16 at 20:24

nipy

4,326
4
22
53

For some reason it does not work for columns that are named as, e.g. `TYPE_MOD`. So, the columns containing `_` are ignored. Why? – Dinosaurius Dec 23 '16 at 20:41
I just tested this and my column named 'TYPE_MOD' was dropped. Is there whitespace in the 'TYPE_MOD' string maybe? What do you get when entering `df.columns[df.columns == 'TYPE_MOD']` – nipy Dec 23 '16 at 20:54

score 7 · Answer 2 · answered Dec 23 '16 at 23:05

7

This is the fifth answer but I wanted to showcase the power of the filter dataframe method which filters by column names with regex. This searches for columns that don't start with TYPE or have _1 somewhere in them.

df.filter(regex='^(?!TYPE)|_1')

answered Dec 23 '16 at 23:05

Ted Petrou

52,577
18
119
125

Can you break down this regex? Thanks. – julieth Dec 24 '16 at 03:39
1

The regex has two parts separated by an or `|`. First half is `^(?!TYPE)` which is a [negative look-ahead](http://stackoverflow.com/questions/899422/regular-expression-for-a-string-that-does-not-start-with-a-sequence) used to find expressions that do not start with TYPE. The second part is simply looking for any matches with `_1` – Ted Petrou Dec 24 '16 at 04:58

score 2 · Answer 3 · answered Dec 23 '16 at 20:06

2

Easy:

unwanted = [column for column in df.columns 
            if column.startswith("TYPE") and "_1" not in column]
df = df.drop(unwanted)

answered Dec 23 '16 at 20:06

DYZ

51,549
10
60
87

score 1 · Answer 4 · answered Dec 23 '16 at 20:07

1

t_cols = [c for c in df.columns.values if c.startswith('TYPE_') and not c == 'TYPE_1']
df.drop(t_cols)

Should do the job

answered Dec 23 '16 at 20:07

Steve Barnes

26,342
6
60
70

OP says "do not contain _1", not "ends with _1". – DYZ Dec 23 '16 at 20:08

How to delete columns based on condition

4 Answers4