How can I select a specific column from each row in a Pandas DataFrame?

Question

I have a DataFrame in this format:

    a   b   c
0   1   2   3
1   4   5   6
2   7   8   9
3   10  11  12
4   13  14  15

and an array like this, with column names:

['a', 'a', 'b', 'c', 'b']

and I’m hoping to extract an array of data, one value from each row. The array of column names specifies which column I want from each row. Here, the result would be:

[1, 4, 8, 12, 14]

Is this possible as a single command with Pandas, or do I need to iterate? I tried using indexing:

i = pd.Index(['a', 'a', 'b', 'c', 'b'])
i.choose(df)

but I got a segfault, which I couldn’t diagnose because the documentation is lacking.
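The frame and label list from the question can be built as follows, which makes the answers below easy to try. The names `col_labels` and `expected` are illustrative choices, not from the question.

```python
import pandas as pd

# The 5x3 frame from the question, with the default RangeIndex 0..4
df = pd.DataFrame(
    {"a": [1, 4, 7, 10, 13], "b": [2, 5, 8, 11, 14], "c": [3, 6, 9, 12, 15]}
)

# One column label per row: row 0 -> "a", row 1 -> "a", row 2 -> "b", ...
col_labels = ["a", "a", "b", "c", "b"]

# The values the question expects to extract, one per row
expected = [1, 4, 8, 12, 14]
```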

score 29 · Accepted Answer · answered Jul 18 '14 at 20:50

You could use lookup, e.g.

>>> i = pd.Series(['a', 'a', 'b', 'c', 'b'])
>>> df.lookup(i.index, i.values)
array([ 1,  4,  8, 12, 14])

where i.index holds the row labels, so it could be different from range(len(i)) if you wanted.
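`DataFrame.lookup` was deprecated in pandas 1.2.0 and removed in pandas 2.0, so the call above fails on current versions. A minimal sketch of an equivalent, assuming unique row and column labels, converts both label lists to integer positions with `Index.get_indexer` and then uses NumPy fancy indexing. The helper name `lookup_values` is hypothetical, not a pandas API.

```python
import numpy as np
import pandas as pd


def lookup_values(df: pd.DataFrame, row_labels, col_labels) -> np.ndarray:
    """Hypothetical stand-in for the removed DataFrame.lookup.

    Returns df.loc[row_labels[k], col_labels[k]] for every k, as one array.
    Assumes df.index and df.columns are unique (get_indexer requires it).
    """
    rows = df.index.get_indexer(row_labels)   # label -> position, -1 if missing
    cols = df.columns.get_indexer(col_labels)
    if (rows == -1).any() or (cols == -1).any():
        raise KeyError("one or more row/column labels not found")
    # Two integer arrays pair up pointwise: (rows[k], cols[k]), not an outer product
    return df.to_numpy()[rows, cols]


df = pd.DataFrame(
    {"a": [1, 4, 7, 10, 13], "b": [2, 5, 8, 11, 14], "c": [3, 6, 9, 12, 15]}
)
i = pd.Series(["a", "a", "b", "c", "b"])
print(lookup_values(df, i.index, i.values))  # [ 1  4  8 12 14]
```

Because it goes through `to_numpy()`, a mixed-dtype frame is upcast to a common dtype (often `object`) before the values are picked.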

answered Jul 18 '14 at 20:50

DSM

That’s fantastic, thank you! Is it also possible to _assign_ to those indexes? – gggritso Jul 18 '14 at 21:44
You *can* assign, but ONLY when the frame is a single dtype (as it is now). ``df.unstack().loc[zip(i.values,i.index)] = [1,2,3,4,5]``. And you must match the length on both sides (you can also select using this syntax); see this issue: https://github.com/pydata/pandas/issues/7138 – Jeff Jul 18 '14 at 21:57
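An alternative sketch for assignment that does not go through `unstack`: write into a NumPy copy using pointwise (row, column) positions, then rebuild the frame. Like the comment above, it assumes a single-dtype frame; `set_per_row` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def set_per_row(df: pd.DataFrame, col_labels, values) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df in which row k has
    column col_labels[k] set to values[k]. Assumes a single-dtype frame."""
    if len(col_labels) != len(df) or len(values) != len(df):
        raise ValueError("need exactly one column label and one value per row")
    cols = df.columns.get_indexer(col_labels)
    if (cols == -1).any():
        raise KeyError("unknown column label")
    arr = df.to_numpy(copy=True)            # independent copy, safe to modify
    arr[np.arange(len(df)), cols] = values  # writes (k, cols[k]) for each row k
    return pd.DataFrame(arr, index=df.index, columns=df.columns)


df = pd.DataFrame(
    {"a": [1, 4, 7, 10, 13], "b": [2, 5, 8, 11, 14], "c": [3, 6, 9, 12, 15]}
)
df = set_per_row(df, ["a", "a", "b", "c", "b"], [100, 200, 300, 400, 500])
print(df)
```

With mixed dtypes, `to_numpy()` would upcast every column to a common dtype, which is why the sketch keeps the single-dtype assumption.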
If you want the result to keep the index, wrap it in a Series: ``pd.Series(df.lookup(i.index, i.values), index=i.index)`` – user394430 Nov 29 '16 at 18:30
In pandas 1.2.0, the lookup function is deprecated and it is recommended to use either .loc or .melt (see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.lookup.html) – MorningGlory Jan 10 '21 at 19:59
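A sketch of the `.melt` plus `.loc` route: reshape to long format so each (row label, column label) pair becomes one entry of a MultiIndexed Series, then select the wanted pairs. It assumes pandas 1.1 or later, where `melt` accepts `ignore_index=False`; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 4, 7, 10, 13], "b": [2, 5, 8, 11, 14], "c": [3, 6, 9, 12, 15]}
)
col_labels = ["a", "a", "b", "c", "b"]

# Long format: one row per (original row label, column name) pair.
# ignore_index=False keeps the original row labels on the melted frame.
long = df.melt(ignore_index=False, var_name="column", value_name="value")

# Values keyed by (row label, column name); sorting the MultiIndex keeps lookups tidy
values_by_pair = long.set_index("column", append=True)["value"].sort_index()

# Select one (row label, column label) pair per row, in row order
pairs = list(zip(df.index, col_labels))
result = values_by_pair.loc[pairs]
print(result.tolist())  # [1, 4, 8, 12, 14]
```

The long frame has `len(df) * len(df.columns)` rows, which is where the memory cost of this approach on large frames comes from.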

score 4 · Answer 2 · answered Jul 18 '14 at 20:45

For large datasets, you can use indexing on the underlying NumPy array, if you're prepared to convert your column names into integer positions (simple in this case):

import numpy as np
df.values[np.arange(5), [0, 0, 1, 2, 1]]

out: array([ 1,  4,  8, 12, 14])

This avoids Python-level iteration, so it will be much more efficient than list comprehensions or other explicit loops.
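The step of turning column names into positions can itself be done with pandas. One way, sketched below, uses `pd.factorize`, which gives each label an integer code in order of first appearance; reindexing the columns to those unique labels makes the codes line up with column positions. Rows are taken by position, so this assumes the label list is in row order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 4, 7, 10, 13], "b": [2, 5, 8, 11, 14], "c": [3, 6, 9, 12, 15]}
)
col_labels = ["a", "a", "b", "c", "b"]

# codes[k] is the integer code of col_labels[k]; uniques lists each label once,
# in order of first appearance (here: a, b, c)
codes, uniques = pd.factorize(np.asarray(col_labels, dtype=object))

# Reorder the columns to match `uniques`, so code j refers to column position j
arr = df.reindex(columns=uniques).to_numpy()

# Pointwise fancy indexing: row k, column codes[k]
result = arr[np.arange(len(df)), codes]
print(result)  # [ 1  4  8 12 14]
```

If a label is missing from `df.columns`, `reindex` fills that column with NaN instead of raising, so you may want to validate the labels first.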

answered Jul 18 '14 at 20:45

mdurant

This should be the new accepted answer, since [`DataFrame.lookup()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.lookup.html) is deprecated now and the `melt()` solution can cause memory issues for large data sets – Arturo Rodriguez Jun 29 '21 at 23:27

score 0 · Answer 3 · answered Jul 18 '14 at 20:24

You can always use a list comprehension:

[df.loc[idx, col] for idx, col in enumerate(['a', 'a', 'b', 'c', 'b'])]
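`enumerate` yields positions 0, 1, 2 and so on, which `df.loc` then treats as row labels; that works here only because the frame has the default index. A sketch of the same loop that pairs each row label with its column label, so it also works with a non-default index (the index values below are made up for illustration):

```python
import pandas as pd

# Same data as the question, but with a hypothetical non-default index
df = pd.DataFrame(
    {"a": [1, 4, 7, 10, 13], "b": [2, 5, 8, 11, 14], "c": [3, 6, 9, 12, 15]},
    index=[10, 20, 30, 40, 50],
)
col_labels = ["a", "a", "b", "c", "b"]

# zip pairs each row *label* with its column label; df.at is a scalar accessor
values = [df.at[row, col] for row, col in zip(df.index, col_labels)]
print(values)
```

It is still a Python-level loop, so it does not address the vectorization concern.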

answered Jul 18 '14 at 20:24

Gregor

This is not vectorized; you can do anything with for loops – Wildhammer Oct 15 '21 at 12:01
