0

I have two databases with multiple column dataset-1(df1) has more than a couple of thousand rows and dataset-2(df2) is smaller... 300 rows.

I need to pickup a 'value' from column 3 in df2 based on matching 'fruit' in df1 with 'type' in df2 and 'expiry' in df1 with 'expiry' in df2

Furthermore, Instead of storing the 'Value' directly in a new column in df1, i need to perform a multiplication on the value in each row and the output gets to be stored in a new a column in df1.

So for example if expiry is 2 the value gets multiplied by 2 and if its 3 value gets multiplied by 3.. and so on and so forth!

I was able to solve this by using the code below, but.....:

for i in range(0, len(df1)):
df1_value = df2.loc[(df2['type'] == df1.iloc[i]['fruit']) & (df2['expiry'] == str(df1.iloc[i]['expiry'])].iloc[0]['value']
df1.loc[i, 'df_value'] = df1.iloc[i]['expiry']*df1_value

It creates two issues,

  1. If the iteration throws up a null value (for example there is no 'value' for banana with expiry of 3 in df2), the process stops and it gives me an error -IndexError: single positional indexer is out-of-bounds
  2. Because df1 has a very large number of rows, the individual iterations take a lot of time.

Is there a better way to handle this?

say df1:

fruit     expiry category
apple      3         a
apple      3         b
apple      4         c
apple      4         d
orange     2         a
orange     2         b
orange     3         c
orange     3         d
orange     3         e
banana     3         a
banana     3         b
banana     3         c
banana     4         d
pineapple  2         a
pineapple  3         b
pineapple  3         c
pineapple  4         d
pineapple  4         e

df2:

type        expiry  value
apple        2       100
apple        3       110
apple        4       120
orange       2       200
orange       3       210
orange       4       220
banana       2       310
banana       4       320
pineapple    2       410
pineapple    3       420
pineapple    4       430

output: -revised df1

fruit       expiry  category    df_value
apple       3        a           110*3=330
apple       3        b           110*3=330
apple       4        c           120*4=480
apple       4        d           120*4=480
orange      2        a           200...
orange      2        b           200...
orange      3        c           210...
orange      3        d           210...
orange      3        e           210...
banana      3        a           0  
banana      3        b           0
banana      3        c           0
banana      4        d           320*4=1280
pineapple   2        a           410*2=820
pineapple   3        b           420...
pineapple   3        c           420...
pineapple   4        d           430....
pineapple   4        e           430....
Akshay
  • 1
  • 2

0 Answers0