Accessing every 1st element of Pandas DataFrame column containing lists

Question

I have a Pandas DataFrame with a column containing list objects:

      A
0   [1,2]
1   [3,4]
2   [8,9] 
3   [2,6]

How can I access the first element of each list and save it into a new column of the DataFrame? To get a result like this:

      A     new_col
0   [1,2]      1
1   [3,4]      3
2   [8,9]      8
3   [2,6]      2

I know this could be done via iterating over each row, but is there any "pythonic" way?

score 65 · Answer 1 · answered May 09 '16 at 20:54

As always, remember that storing non-scalar objects in frames is generally disfavoured, and should really only be used as a temporary intermediate step.

That said, you can use the .str accessor even though it's not a column of strings:

>>> df = pd.DataFrame({"A": [[1,2],[3,4],[8,9],[2,6]]})
>>> df["new_col"] = df["A"].str[0]
>>> df
        A  new_col
0  [1, 2]        1
1  [3, 4]        3
2  [8, 9]        8
3  [2, 6]        2
>>> df["new_col"]
0    1
1    3
2    8
3    2
Name: new_col, dtype: int64
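
One property of this approach worth knowing is how it treats rows that cannot be indexed. The following is a minimal sketch, assuming a column that mixes lists with a missing value and an empty list (the Series `s` is made up for illustration). With the pandas versions I am aware of, `.str[0]` leaves such rows as NaN instead of raising:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a normal list, a missing value, and an empty list.
s = pd.Series([[1, 2], np.nan, []])

# .str[0] indexes into each element; rows that are missing or too short
# come back as NaN rather than raising an error.
first = s.str[0]
print(first)
```

Because NaN is introduced, the result is no longer guaranteed to keep an integer dtype, so a cast may be needed afterwards.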

DSM

This was really just temporary, because I used '.split()' on the strings in these columns. Thank you for your quick help! – mkoala May 09 '16 at 20:59
I thought this was the most elegant solution, but on a Series of 5 million rows where each element is a two-element list, the .str[idx] method took 3.31 seconds while .apply(lambda x: x[idx]) took 1.43 seconds. – Vinson Ciawandy Apr 06 '21 at 08:59
This might be slower than going via .apply(), but it deals with NaN values elegantly (i.e. it leaves NaN as NaN without throwing an error). – Thor Jun 23 '21 at 16:39
How come `.str` works? – foebu May 31 '22 at 20:49

score 40 · Accepted Answer · answered May 09 '16 at 20:53

You can use map and a lambda function

df.loc[:, 'new_col'] = df.A.map(lambda x: x[0])
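
A caveat, sketched under the assumption that the column may contain missing values (the frame below is invented for illustration): the plain lambda raises a TypeError on NaN, since a float cannot be indexed. `Series.map` accepts `na_action='ignore'`, which skips missing values and propagates them as NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing entry in the list column.
df = pd.DataFrame({"A": [[1, 2], np.nan, [8, 9]]})

# The plain lambda fails on the NaN row because a float is not subscriptable.
try:
    df["A"].map(lambda x: x[0])
    raised = False
except TypeError:
    raised = True

# na_action='ignore' passes NaN through without calling the function on it.
df["new_col"] = df["A"].map(lambda x: x[0], na_action="ignore")
print(df)
```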

dmb

In my case, your solution had the shortest runtime. Thanks for the help! – mkoala May 09 '16 at 21:04
But also see DSM's answer if you need this to work for NaN values. – Thor Jun 23 '21 at 16:41

score 13 · Answer 3 · answered May 09 '16 at 20:51

Use apply with x[0]:

df['new_col'] = df.A.apply(lambda x: x[0])
print(df)
        A  new_col
0  [1, 2]        1
1  [3, 4]        3
2  [8, 9]        8
3  [2, 6]        2

jezrael

Alexander · Answer 4 · 2016-05-09T21:42:39.220

You can just use a conditional list comprehension which takes the first value of any iterable or else uses None for that item. List comprehensions are very Pythonic.

df['new_col'] = [val[0] if hasattr(val, '__iter__') else None for val in df["A"]]

>>> df
        A  new_col
0  [1, 2]        1
1  [3, 4]        3
2  [8, 9]        8
3  [2, 6]        2

Timings

df = pd.concat([df] * 10000)

%timeit df['new_col'] = [val[0] if hasattr(val, '__iter__') else None for val in df["A"]]
100 loops, best of 3: 13.2 ms per loop

%timeit df["new_col"] = df["A"].str[0]
100 loops, best of 3: 15.3 ms per loop

%timeit df['new_col'] = df.A.apply(lambda x: x[0])
100 loops, best of 3: 12.1 ms per loop

%timeit df.A.map(lambda x: x[0])
100 loops, best of 3: 11.1 ms per loop

Removing the safety check that ensures an iterable:

%timeit df['new_col'] = [val[0] for val in df["A"]]
100 loops, best of 3: 7.38 ms per loop
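
If the goal of the check is to accept only list-like containers, a stricter variant is possible. This is a sketch, assuming that only lists and tuples should count and that empty containers should also map to None; the helper name `first_or_none` is made up for illustration:

```python
import numpy as np
import pandas as pd

def first_or_none(val):
    """Return val[0] for a non-empty list or tuple, otherwise None."""
    if isinstance(val, (list, tuple)) and len(val) > 0:
        return val[0]
    return None

# Hypothetical column mixing lists with a string, a NaN and an empty list.
df = pd.DataFrame({"A": [[1, 2], "hello", np.nan, [], (8, 9)]})
first_items = [first_or_none(val) for val in df["A"]]
df["new_col"] = first_items
print(first_items)
```

The extra `isinstance` and length checks add some per-row cost, so this trades a little speed for predictable handling of unexpected values.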

Just be aware that `hasattr(..., '__iter__')` isn't a magic list identifier, it'll also work for strings, e.g. `hasattr('hello', '__iter__')` returns `True`, which may not be what you want. — jpp, Jan 31 '19 at 09:42

score 2 · Answer 5 · answered Sep 21 '21 at 07:21

You can use the method str.get:

df['A'].str.get(0)
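
For completeness, a short sketch of assigning the result to a column, using the frame from the question (the `last` column is added only for illustration). As far as I know, `.str.get(i)` and `.str[i]` give the same result for an integer position:

```python
import pandas as pd

df = pd.DataFrame({"A": [[1, 2], [3, 4], [8, 9], [2, 6]]})

# str.get(0) extracts the element at position 0 from each list.
df["new_col"] = df["A"].str.get(0)

# Negative positions count from the end, so -1 gives the last element.
df["last"] = df["A"].str.get(-1)
print(df)
```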

Mykola Zotko
