Pandas: extract first 4 digits from column headers

Question

I have a pandas dataframe with +1000 columns. I would like to extract only the first 4 digits from all the headers.

Example of the original datafr:

Q001-XYZ	Q002-XYZ
First	row
Second	row

I would like to get:

Q001	Q002
First	row
Second	row

I tried df.head() = df.head().str[:4] but I get the error SyntaxError: can't assign to function call

Any suggestion?

tdy · Accepted Answer · 2021-07-24T08:36:37.283

3

List comprehension is the fastest option:

df.columns = [col[:4] for col in df.columns]

edited Jul 24 '21 at 08:36

answered Jul 24 '21 at 07:36

tdy

Anurag Dabas · Answer 2 · 2021-07-24T07:17:12.943

2

Try via columns attribute or via rename() method:

df.columns=df.columns.str.split('-').str[0]
#OR
df=df.rename(columns=lambda x:x.split('-')[0])

OR

df.columns=df.columns.str[:4]
#OR
df=df.rename(columns=lambda x:x[:4])

edited Jul 24 '21 at 07:17

answered Jul 24 '21 at 07:11

Anurag Dabas

1

list comprehension is actually fastest: `df.columns = [col[:4] for col in df.columns]` – tdy Jul 24 '21 at 07:18
1

@tdy yes sir you are right due to spliting it takes time list comprehension is also a way but `df.columns.str[:4]` is doing the same thing as `[col[:4] for col in df.columns]` – Anurag Dabas Jul 24 '21 at 07:23
1

i agree they follow the same concept, but the comprehension is faster in practice. i went ahead and [posted it as an answer](https://stackoverflow.com/a/68507963/13138364) with a timing plot. – tdy Jul 24 '21 at 07:42
1

@tdy That's nice..you answer got selected ***;)*** If you have time can you pls check performace for this as well `list(map(lambda x:x[:4],df.columns))` – Anurag Dabas Jul 24 '21 at 07:50
it's very fast but slightly slower than comprehension @AnuragDabas. just updated the plot with this `map` and Pygirl's `.str.extract` – tdy Jul 24 '21 at 08:38

score 1 · Answer 3 · answered Jul 24 '21 at 07:19

1

try using regex:

df.columns = df.columns.str.extract('([\w]+)-', expand=False)

answered Jul 24 '21 at 07:19

Pygirl

score 0 · Answer 4 · answered Jul 24 '21 at 07:23

0

You can try:

df.columns.str[:4].to_list()

answered Jul 24 '21 at 07:23

Samir Hinojosa

4 Answers4