I have a pandas DataFrame with values in several columns (say two, for simplicity) plus a column of column names that I want to use to pick, row by row, values from the other columns:

import pandas as pd
import numpy as np

np.random.seed(1337)
df = pd.DataFrame(
    {"a": np.arange(10), "b": 10 - np.arange(10), "c": np.random.choice(["a", "b"], 10)}
)

which gives

> df['c']

0    b
1    b
2    a
3    a
4    b
5    b
6    b
7    a
8    a
9    a
Name: c, dtype: object

That is, I want the first and second elements to be picked from column b, the third from a and so on.

This works:

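def pick_vals_from_cols(df, col_selector):
    # One boolean row per record, True where the column name matches the selector.
    # np.vstack replaces np.row_stack, which is deprecated as of NumPy 2.0.
    condlist = np.vstack(col_selector.map(lambda x: x == df.columns))
    # np.select takes the value from the first column whose condition is True.
    values = np.select(condlist.transpose(), df.values.transpose())
    return values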

> pick_vals_from_cols(df, df["c"])

array([10, 9, 2, 3, 6, 5, 4, 7, 8, 9], dtype=object)

But it just feels so fragile and clunky. Is there a better way to do this?

RoyalTS

1 Answer

lookup

# Note: DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0.
df.lookup(df.index, df.c)

array([10,  9,  2,  3,  6,  5,  4,  7,  8,  9])
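DataFrame.lookup no longer exists in pandas 2.0 and later, so here is a minimal sketch of the same row-wise pick using NumPy integer indexing. The helper name pick_by_label and its value_cols parameter are hypothetical, not part of pandas, and the c column is hard-coded to the values shown in the question rather than regenerated from the random seed.

```python
import numpy as np
import pandas as pd


def pick_by_label(df, selector, value_cols):
    """Return, for each row, the value from the column named in `selector`.

    Hypothetical helper for illustration; not a pandas API.
    """
    # Position of each requested label within value_cols (-1 if not found).
    col_pos = pd.Index(value_cols).get_indexer(selector)
    if (col_pos == -1).any():
        raise KeyError("selector contains labels that are not in value_cols")
    values = df[value_cols].to_numpy()
    # Pair row i with column col_pos[i] via integer array indexing.
    return values[np.arange(len(df)), col_pos]


# Same frame as in the question, with "c" written out explicitly (assumption).
df = pd.DataFrame(
    {"a": np.arange(10), "b": 10 - np.arange(10), "c": list("bbaabbbaaa")}
)
picked = pick_by_label(df, df["c"], ["a", "b"])
print(picked.tolist())  # [10, 9, 2, 3, 6, 5, 4, 7, 8, 9]
```

Restricting the lookup to value_cols keeps the selector column out of the value array, so the result keeps an integer dtype instead of falling back to object. The explicit check for -1 matters because get_indexer does not raise on unknown labels, and -1 would otherwise silently select the last column.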

Comprehension

But why when you have lookup?

[df.at[t] for t in df.c.items()]

[10, 9, 2, 3, 6, 5, 4, 7, 8, 9]

Bonus Hack

Not intended for actual use

[*map(df.at.__getitem__, zip(df.index, df.c))]

[10, 9, 2, 3, 6, 5, 4, 7, 8, 9]

And one using df.get_value, which is deprecated (and was removed entirely in pandas 1.0):

[*map(df.get_value, df.index, df.c)]

FutureWarning: get_value is deprecated and will be removed in a future release. Please use .at[] or .iat[] accessors instead

[10, 9, 2, 3, 6, 5, 4, 7, 8, 9]
piRSquared