1

I have a pandas dataframe with +1000 columns. I would like to extract only the first 4 digits from all the headers.

Example of the original datafr:

Q001-XYZ Q002-XYZ
First row
Second row

I would like to get:

Q001 Q002
First row
Second row

I tried df.head() = df.head().str[:4] but I get the error SyntaxError: can't assign to function call

Any suggestion?

Stefano
  • 159
  • 5
  • 16

4 Answers4

3

List comprehension is the fastest option:

df.columns = [col[:4] for col in df.columns]

timings

tdy
  • 26,545
  • 9
  • 43
  • 50
2

Try via columns attribute or via rename() method:

df.columns=df.columns.str.split('-').str[0]
#OR
df=df.rename(columns=lambda x:x.split('-')[0])

OR

df.columns=df.columns.str[:4]
#OR
df=df.rename(columns=lambda x:x[:4])
Anurag Dabas
  • 23,002
  • 8
  • 19
  • 34
  • 1
    list comprehension is actually fastest: `df.columns = [col[:4] for col in df.columns]` – tdy Jul 24 '21 at 07:18
  • 1
    @tdy yes sir you are right due to spliting it takes time list comprehension is also a way but `df.columns.str[:4]` is doing the same thing as `[col[:4] for col in df.columns]` – Anurag Dabas Jul 24 '21 at 07:23
  • 1
    i agree they follow the same concept, but the comprehension is faster in practice. i went ahead and [posted it as an answer](https://stackoverflow.com/a/68507963/13138364) with a timing plot. – tdy Jul 24 '21 at 07:42
  • 1
    @tdy That's nice..you answer got selected ***;)*** If you have time can you pls check performace for this as well `list(map(lambda x:x[:4],df.columns))` – Anurag Dabas Jul 24 '21 at 07:50
  • it's very fast but slightly slower than comprehension @AnuragDabas. just updated the plot with this `map` and Pygirl's `.str.extract` – tdy Jul 24 '21 at 08:38
1

try using regex:

df.columns = df.columns.str.extract('([\w]+)-', expand=False)
Pygirl
  • 11,977
  • 4
  • 24
  • 38
0

You can try:

df.columns.str[:4].to_list()
Samir Hinojosa
  • 725
  • 4
  • 19