1

What is the best way to create a new column listing the count of the correct fruit for each index.

# create dummy data    
import pandas as pd
    inp = [{'Fruit':'Apple', 'Apple_count':100, 'Orange_count':10, 'Banana_count':50, 'Pear_count':200}, 
           {'Fruit':'Apple', 'Apple_count':100, 'Orange_count':10, 'Banana_count':50, 'Pear_count':200}, 
           {'Fruit':'Apple', 'Apple_count':100, 'Orange_count':10, 'Banana_count':50, 'Pear_count':200},
           {'Fruit':'Orange', 'Apple_count':100, 'Orange_count':10, 'Banana_count':50, 'Pear_count':200},
           {'Fruit':'Orange', 'Apple_count':100, 'Orange_count':10, 'Banana_count':50, 'Pear_count':200},
           {'Fruit':'Banana', 'Apple_count':100, 'Orange_count':10, 'Banana_count':50, 'Pear_count':200},
           {'Fruit':'Pear', 'Apple_count':100, 'Orange_count':10, 'Banana_count':50, 'Pear_count':200}]
    df = pd.DataFrame(inp)

    Fruit  Apple_count  Orange_count  Banana_count  Pear_count
0   Apple          100            10            50         200
1   Apple          100            10            50         200
2   Apple          100            10            50         200
3  Orange          100            10            50         200
4  Orange          100            10            50         200
5  Banana          100            10            50         200
6    Pear          100            10            50         200

I want to create a new column for each row index with the corresponding count of the Fruit category.

Desired Output:

    Fruit  Apple_count  Orange_count  Banana_count  Pear_count  Fruit_count
0   Apple          100            10            50         200          100
1   Apple          100            10            50         200          100
2   Apple          100            10            50         200          100
3  Orange          100            10            50         200           10
4  Orange          100            10            50         200           10
5  Banana          100            10            50         200          200
6    Pear          100            10            50         200           50

I could write a bunch of loc conditionals but in my case there are many more categories than listed in this example.

I also have a dict of the key value pairs of the Fruit and the Fruit_count column names.

fruit_dict = {'Apple':'Apple_count', 'Orange':'Orange_count', 'Banana':'Banana_count', 'Pear':'Pear_count'}
bkeesey
  • 374
  • 1
  • 9
  • [Reference DataFrame value corresponding to column header](https://stackoverflow.com/q/68976278/15497888) works if you [`Series.map`](https://pandas.pydata.org/docs/reference/api/pandas.Series.map.html) first `df['Fruit'].map(fruit_dict)` -> `idx, cols = pd.factorize(df['Fruit'].map(fruit_dict))` or `df.columns.get_indexer(df['Fruit'].map(fruit_dict))`. I'm not sure this is a duplicate because of the `map` step. – Henry Ecker Sep 03 '21 at 19:21
  • 1
    @HenryEcker Thanks! This ended up being the better solution in a more complicated case where the column counts are not all the same. Appreciated as this is really useful to know about! – bkeesey Sep 09 '21 at 18:13

1 Answers1

2

This seems to work for me!

df["Fruit_count"] = df["Fruit"].apply(lambda x: f"{x}_count").map(df.iloc[0])

Or, if you'd like to use fruit_dict

df["Fruit_count"] = df["Fruit"].map(fruit_dict).map(df.iloc[0])
jack boehrer
  • 166
  • 1
  • 5