9

I have the following dataframe df:

Customer_ID | 2015 | 2016 |2017 | Year_joined_mailing
ABC            5      6     10     2015
BCD            6      7     3      2016        
DEF            10     4     5      2017
GHI            8      7     10     2016

I would like to look up the value of the customer in the year they joined the mailing list and save it in a new column.

Output would be:

Customer_ID | 2015 | 2016 |2017 | Year_joined_mailing | Purchases_1st_year
ABC            5      6     10     2015                       5
BCD            6      7     3      2016                       7       
DEF            10     4     5      2017                       5
GHI            8      9     10     2016                       9

I have found some solutions for match vlookup in python, but none that would use the headers of other columns.

jeangelj
  • 3,878
  • 12
  • 45
  • 94

3 Answers3

12

Deprecation Notice: lookup was deprecated in v1.2.0

Use pd.DataFrame.lookup
Keep in mind that I'm assuming Customer_ID is the index.

df.lookup(df.index, df.Year_joined_mailing)

array([5, 7, 5, 7])

df.assign(
    Purchases_1st_year=df.lookup(df.index, df.Year_joined_mailing)
)

             2015  2016  2017  Year_joined_mailing  Purchases_1st_year
Customer_ID                                                           
ABC             5     6    10                 2015                   5
BCD             6     7     3                 2016                   7
DEF            10     4     5                 2017                   5
GHI             8     7    10                 2016                   7

However, you have to be careful with comparing possible strings in the column names and integers in the first year column...

Nuclear option to ensure type comparisons are respected.

df.assign(
    Purchases_1st_year=df.rename(columns=str).lookup(
        df.index, df.Year_joined_mailing.astype(str)
    )
)

             2015  2016  2017  Year_joined_mailing  Purchases_1st_year
Customer_ID                                                           
ABC             5     6    10                 2015                   5
BCD             6     7     3                 2016                   7
DEF            10     4     5                 2017                   5
GHI             8     7    10                 2016                   7
Henry Ecker
  • 31,792
  • 14
  • 29
  • 50
piRSquared
  • 265,629
  • 48
  • 427
  • 571
  • magic ... didn't think it was possible in a one-liner - THANK YOU – jeangelj Jul 19 '17 at 17:59
  • Thank you - it worked for one column, but for my second one in the same dataframe, I get this error KeyError: 'One or more column labels was not found' - could it be related to index? I set Customer_ID to index df = df.set_index(['Customer_ID']) – jeangelj Jul 20 '17 at 13:58
3

you can apply "apply" to each row

df.apply(lambda x: x[x['Year_joined_mailing']],axis=1)
galaxyan
  • 5,527
  • 2
  • 18
  • 41
2

I would do it like this, assuming that the column headers and the Year_joined_mailing are the same data type and that all Year_joined_mailing values are valid columns. If the datatypes are not the same, you could convert it by adding str() or int() where appropriate.

df['Purchases_1st_year'] = [df[df['Year_joined_mailing'][i]][i] for i in df.index]

What we're doing here is iterating over the indexes in the dataframe to get the 'Year_joined_mailing' field for that index, then using that to get the column we want, and again selecting that index from the column, pushing it all to a list and assigning this to our new column 'Year_joined_mailing'

If your 'Year_joined_mailing' column will not always be a valid column name, then try:

from numpy import nan
new_col = []
for i in df.index:
    try:
        new_col.append(df[df['Year_joined_mailing'][i]][i])
    except IndexError:
        new_col.append(nan) #or whatever null value you want here)
df['Purchases_1st_year'] = new_col

This longer code snippet accomplishes the same thing, but will not break if 'Year_joined_mailing' is not in df.columns

Jeremy Barnes
  • 572
  • 1
  • 7
  • 21