0

I have an input dataframe as below: Input df:

       PET City    Cost  Expense
0      Dog   MH  1500.0      NaN
1      Dog  BLR  1000.0      NaN
2      Dog   DL  2000.0      NaN
3      Cat   MH     NaN    500.0
4      Cat  BLR     NaN    900.0
5      Cat   DL     NaN   2500.0
6     Bird   MH     NaN      NaN
7     Bird  BLR     NaN      NaN
8     Bird   DL     NaN      NaN
9   Others   MH   100.0      NaN
10  Others  BLR   300.0      NaN
11  Others   DL   700.0      NaN

Expected Output:

       PET City    Cost  Expense
0      Dog   MH  1500.0      NaN
1      Dog  BLR  1000.0      NaN
2      Dog   DL  2000.0      NaN
3      Cat   MH     NaN    500.0
4      Cat  BLR     NaN    900.0
5      Cat   DL     NaN   2500.0
6     Bird   MH   500.0      NaN
7     Bird  BLR   900.0      NaN
8     Bird   DL  2500.0      NaN
9   Others   MH   100.0      NaN
10  Others  BLR   300.0      NaN
11  Others   DL   700.0      NaN

Difference in output from input is: For PET having Bird, Cost will be the Expense value of Cat

I have written as below, but getting error.

df_subset = df.loc[(df['PET'] == 'Cat'), ['Expense']]

df['Cost'] = np.select(df["PET"] == 'Bird', [i for i in df_subset['Expense']], df['Cost'])
Biki
  • 73
  • 5

1 Answers1

3

You can try

df.loc[df['PET'].eq('Bird'), 'Cost'] = df.loc[df['PET'].eq('Bird'), 'City'].map(df.set_index('City').loc[lambda x: x['PET'].eq('Cat'), 'Expense'])
print(df)

       PET City    Cost  Expense
0      Dog   MH  1500.0      NaN
1      Dog  BLR  1000.0      NaN
2      Dog   DL  2000.0      NaN
3      Cat   MH     NaN    500.0
4      Cat  BLR     NaN    900.0
5      Cat   DL     NaN   2500.0
6     Bird   MH   500.0      NaN
7     Bird  BLR   900.0      NaN
8     Bird   DL  2500.0      NaN
9   Others   MH   100.0      NaN
10  Others  BLR   300.0      NaN
11  Others   DL   700.0      NaN

If your City column has duplicated items, you can try df.merge

df.loc[df['PET'].eq('Bird'), 'Cost'] = df.loc[df['PET'].eq('Bird'), ['City']].merge(df.loc[df['PET'].eq('Cat'), ['Expense', 'City']], on='City', how='left')['Expense'].tolist()
Ynjxsjmh
  • 16,448
  • 3
  • 17
  • 42
  • I am getting an "InvalidIndexError: Reindexing only valid with uniquely valued Index objects", how to resolve this? Suppose in "City" column values are 0,1,2 and repeating this values in entire column instead of MH, BLR, DL respectively. Also, if I have 2 more columns which are not dependent to any values, how to deal with this? – Biki May 11 '22 at 21:09
  • 1
    @Biki I updated the answer but didn't understand your `Also` part. – Ynjxsjmh May 12 '22 at 03:07
  • @Ynjxsjmh good answer. i upvoted it. Do you think you can share insight on my question https://stackoverflow.com/questions/72209641/how-do-i-read-sql-storedprocedure-data-through-pyodbc-into-a-dataframe – Joe Tha May 12 '22 at 03:13
  • @JoeTha Sorry, I'm not familiar with pandas sql. – Ynjxsjmh May 12 '22 at 04:25