I have a pandas DataFrame that looks like this:

import pandas as pd

# The first field of each row becomes the index; the trailing ';' produces an
# empty last column, which .iloc[:, 0:5] drops.
pods_infos = pd.read_csv("data.txt", delimiter=";",
        index_col=0,
        header=None,
        names=["Position", "Capacity", "Capacity reserved", "Storage tag",
               "Ready for refill", ""]).iloc[:, 0:5]
df = pods_infos.iloc[:, 1:5]
df.head(5)

             Capacity           Capacity reserved   Storage tag       Ready for refill
0  Capacity:210.0/350  Capacity reserved:0.00/350  Storage tag:  Ready for refill:True
1  Capacity:210.0/350  Capacity reserved:0.00/350  Storage tag:  Ready for refill:True
2  Capacity:210.0/350  Capacity reserved:0.00/350  Storage tag:  Ready for refill:True
3  Capacity:210.0/350  Capacity reserved:0.00/350  Storage tag:  Ready for refill:True
4  Capacity:210.0/350  Capacity reserved:0.00/350  Storage tag:  Ready for refill:True

Naturally, I would like to remove the repeated header name in each column entry. I have a function that does that well when applied to one column:

def removeColumnHeader(series):
    # Keep the run of non-whitespace characters right after the first ':'
    return series.str.extract(pat=r"((?<=\:)\S*)")

removeColumnHeader(df["Capacity"])

            0
0   210.0/350
1   210.0/350
2   210.0/350
3   210.0/350
4   210.0/350
5   210.0/350
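A quick check of what this function actually returns may help with the error below. The sketch uses two made-up sample values in place of the real column. With the default expand=True, str.extract returns a one-column DataFrame even for a single capture group, which is why the output above has a column labeled 0; passing expand=False returns a plain Series instead.

```python
import pandas as pd

# Made-up sample values standing in for df["Capacity"]
capacity = pd.Series(["Capacity:210.0/350", "Capacity:140.0/350"])

pattern = r"((?<=:)\S*)"  # same idea as the question's pattern

# Default expand=True: one capture group still gives a one-column DataFrame
as_frame = capacity.str.extract(pattern)
print(type(as_frame).__name__, as_frame.shape)

# expand=False with a single group gives a Series instead
as_series = capacity.str.extract(pattern, expand=False)
print(type(as_series).__name__, as_series.tolist())
```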

Coming from R, I would like to apply this function to the columns of the dataframe. My attempt looks like this:

df.apply(func = lambda x: removeColumnHeader(x))

However, that throws an error:

ValueError: If using all scalar values, you must pass an index

I have been googling this, and although I am not entirely clear what it means, it seems to have to do with passing elements as scalars instead of arrays. But I have no clue what that means in this specific case of using .apply(). How do I get the function removeColumnHeader() applied to multiple columns? I have read this explanation, but I am specifically looking for the problem in connection with .apply().
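A minimal sketch of one way to do this, assuming a small hand-made sample in place of data.txt. The likely cause of the error is that, with the default expand=True, each call returns a one-column DataFrame, and apply cannot assemble those 2-D results into columns; making the function return a Series (expand=False) avoids that. remove_column_header is a hypothetical snake_case rename of removeColumnHeader, and the df.replace line is a swapped-in alternative that skips apply entirely.

```python
import pandas as pd

# Hypothetical two-row sample mirroring the question's df
df = pd.DataFrame({
    "Capacity": ["Capacity:210.0/350", "Capacity:140.0/350"],
    "Capacity reserved": ["Capacity reserved:0.00/350", "Capacity reserved:0.00/350"],
    "Storage tag": ["Storage tag:", "Storage tag:"],
    "Ready for refill": ["Ready for refill:True", "Ready for refill:False"],
})

def remove_column_header(series):
    # Hypothetical rename of removeColumnHeader; expand=False returns a Series,
    # so apply can combine the per-column results under the original names
    return series.str.extract(r"((?<=:)\S*)", expand=False)

cleaned = df.apply(remove_column_header)
print(cleaned)

# Alternative without apply: delete everything up to and including the first ':'
cleaned_alt = df.replace(r"^[^:]*:", "", regex=True)
print(cleaned_alt)
```

Both versions should leave the same cell values; the regex replacement is arguably simpler when every cell follows the "name:value" layout.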


To provide an MWE, here is the content of data.txt, which can be imported with the first snippet:

0;7.3500000000000005/0.65;Capacity:210.0/350;Capacity reserved:0.00/350;Storage tag:;Ready for refill:True;
1;8.25/0.65;Capacity:210.0/350;Capacity reserved:0.00/350;Storage tag:;Ready for refill:True;
2;9.15/0.65;Capacity:210.0/350;Capacity reserved:0.00/350;Storage tag:;Ready for refill:True;
3;10.05/0.65;Capacity:210.0/350;Capacity reserved:0.00/350;Storage tag:;Ready for refill:True;
4;10.950000000000001/0.65;Capacity:210.0/350;Capacity reserved:0.00/350;Storage tag:;Ready for refill:True;
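For a fully self-contained reproduction, the same rows can be read from an in-memory string with io.StringIO instead of a file. This is a sketch: the read_csv arguments are copied from the first snippet (with the second column name spelled "Capacity reserved" to match the data), and the StringIO wrapper is the only addition.

```python
import io

import pandas as pd

# The MWE rows above, held in memory instead of data.txt
raw = """0;7.3500000000000005/0.65;Capacity:210.0/350;Capacity reserved:0.00/350;Storage tag:;Ready for refill:True;
1;8.25/0.65;Capacity:210.0/350;Capacity reserved:0.00/350;Storage tag:;Ready for refill:True;
2;9.15/0.65;Capacity:210.0/350;Capacity reserved:0.00/350;Storage tag:;Ready for refill:True;
3;10.05/0.65;Capacity:210.0/350;Capacity reserved:0.00/350;Storage tag:;Ready for refill:True;
4;10.950000000000001/0.65;Capacity:210.0/350;Capacity reserved:0.00/350;Storage tag:;Ready for refill:True;
"""

pods_infos = pd.read_csv(io.StringIO(raw), delimiter=";",
                         index_col=0,
                         header=None,
                         names=["Position", "Capacity", "Capacity reserved",
                                "Storage tag", "Ready for refill", ""]).iloc[:, 0:5]
df = pods_infos.iloc[:, 1:5]
print(df.head(5))
```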

Lukas

1 Answer

The error message says that if you pass only scalar values, you have to pass an index. So you can either avoid scalar values for the columns, e.g. by using lists:

>>> import pandas as pd
>>> x, y = 2, 3  # values implied by the output
>>> df = pd.DataFrame({'A': [x], 'B': [y]})
>>> df
   A  B
0  2  3

or use scalar values and pass an index:

>>> df = pd.DataFrame({'A': x, 'B': y}, index=[0])
>>> df
   A  B
0  2  3
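Both halves of this explanation can be checked side by side in a small sketch; x = 2 and y = 3 are assumed from the printed output. Building a DataFrame from scalars alone raises the ValueError quoted in the question, while either fix produces the same one-row frame.

```python
import pandas as pd

x, y = 2, 3  # assumed values, matching the printed output

# Scalars only, no index: pandas cannot tell how many rows to create
message = None
try:
    pd.DataFrame({"A": x, "B": y})
except ValueError as err:
    message = str(err)
    print("ValueError:", message)

# Fix 1: wrap each scalar in a list
from_lists = pd.DataFrame({"A": [x], "B": [y]})
# Fix 2: keep the scalars but pass an explicit index
from_scalars = pd.DataFrame({"A": x, "B": y}, index=[0])
print(from_lists.equals(from_scalars))
```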
Tasnuva