Subtract two columns in dataframe

Question

My df looks as follows:

Index    Country    Val1  Val2 ... Val10
1        Australia  1     3    ... 5
2        Bambua     12    33   ... 56
3        Tambua     14    34   ... 58

I'd like to subtract Val1 from Val10 for each country, so the output looks like:

Country    Val10-Val1
Australia  4
Bambua     44
Tambua     44

So far I've got:

def myDelta(row):
    data = row[['Val10', 'Val1']]
    return pd.Series({'Delta': np.subtract(data)})

def runDeltas():
    myDF = getDF() \
        .apply(myDelta, axis=1) \
        .sort_values(by=['Delta'], ascending=False)
    return myDF

runDeltas results in this error:

ValueError: ('invalid number of arguments', u'occurred at index 9')

What's the proper way to fix this?

Alberto Chiusole · Accepted Answer · 2021-03-29T03:48:04.680

28

Given the following dataframe:

df = pd.DataFrame([["Australia", 1, 3, 5],
                   ["Bambua", 12, 33, 56],
                   ["Tambua", 14, 34, 58]
                  ], columns=["Country", "Val1", "Val2", "Val10"]
                 )

It comes down to a simple element-wise (vectorized) subtraction of two columns:

>>> df["Val1"] - df["Val10"]
0    -4
1   -44
2   -44
dtype: int64
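To get the table the question asks for (Val10 minus Val1 per country, largest first), the same vectorized subtraction can be combined with assign and sort_values. This is a minimal sketch, assuming the sample frame above; the column name Delta is borrowed from the question's own code. The original ValueError most likely arises because np.subtract expects two operands, while the question passes it a single two-element Series.

```python
import pandas as pd

df = pd.DataFrame(
    [["Australia", 1, 3, 5], ["Bambua", 12, 33, 56], ["Tambua", 14, 34, 58]],
    columns=["Country", "Val1", "Val2", "Val10"],
)

# Vectorized column subtraction: Val10 minus Val1, computed for all rows at once.
result = (
    df.assign(Delta=df["Val10"] - df["Val1"])[["Country", "Delta"]]
    .sort_values(by="Delta", ascending=False)
)
print(result)
```

No apply call or helper function is needed; the subtraction aligns the two columns on the row index.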

edited Mar 29 '21 at 03:48

answered Jan 19 '18 at 23:37

Alberto Chiusole

score 14 · Answer 2 · answered May 04 '18 at 18:43

Using this as the df:

df = pd.DataFrame([["Australia", 1, 3, 5],
                   ["Bambua", 12, 33, 56],
                   ["Tambua", 14, 34, 58]
                  ], columns=["Country", "Val1", "Val2", "Val10"]
                 )

You can also do the subtraction and put it into a new column as follows.

>>> df['Val_Diff'] = df['Val10'] - df['Val1']
>>> df
     Country  Val1  Val2  Val10  Val_Diff
0  Australia     1     3      5         4
1     Bambua    12    33     56        44
2     Tambua    14    34     58        44

score 10 · Answer 3 · answered Dec 13 '18 at 00:26

You can do this with a lambda function and assign the result to a new column.

df['Val10-Val1'] = df.apply(lambda x: x['Val10'] - x['Val1'], axis=1)
print(df)

Rishi Bansal

Note: vectorization over a Pandas Series (such as df[col2]-df[col1]) will generally have better performance than using DataFrame apply with axis=1. – mstrthealias Mar 28 '21 at 19:15
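The comment above can be checked with a short sketch. It assumes a hypothetical 1,000-row frame (not from the question) and only verifies that the two approaches give the same values; it makes no specific timing claim.

```python
import pandas as pd

# Hypothetical data, larger than the question's sample frame.
df = pd.DataFrame({"Val1": range(1000), "Val10": range(1000, 2000)})

# Row-wise: the Python lambda is called once per row.
by_apply = df.apply(lambda row: row["Val10"] - row["Val1"], axis=1)

# Vectorized: a single operation over the two whole columns.
vectorized = df["Val10"] - df["Val1"]

print((by_apply == vectorized).all())
```

Both produce the same Series, so the vectorized form is usually the one to reach for unless the per-row logic cannot be expressed with column operations.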

score 6 · Answer 4 · answered Nov 29 '18 at 10:39

You can also use the pandas.DataFrame.assign function, e.g.:

import numpy as np
import pandas as pd

df = pd.DataFrame([["Australia", 1, 3, 5],
                   ["Bambua", 12, 33, 56],
                   ["Tambua", 14, 34, 58]
                  ], columns=["Country", "Val1", "Val2", "Val10"]
                 )

df = df.assign(Val10_minus_Val1 = df['Val10'] - df['Val1'])

The best part of assign is that you can add as many assignments as you wish, and a later one can refer to a column created by an earlier one through a callable. For example, to get both the difference and its natural log:

df = df.assign(Val10_minus_Val1 = df['Val10'] - df['Val1'],
               log_result = lambda x: np.log(x.Val10_minus_Val1))

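Results:

     Country  Val1  Val2  Val10  Val10_minus_Val1  log_result
0  Australia     1     3      5                 4    1.386294
1     Bambua    12    33     56                44    3.784190
2     Tambua    14    34     58                44    3.784190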

score 1 · Answer 5 · answered Oct 06 '21 at 07:26

Although this is an old question, pandas also lets you subtract two DataFrames or Series with pandas.DataFrame.subtract (or pandas.Series.subtract for a pair of columns):

import pandas as pd

df = pd.DataFrame([["Australia", 1, 3, 5],
                   ["Bambua", 12, 33, 56],
                   ["Tambua", 14, 34, 58]
                  ], columns=["Country", "Val1", "Val2", "Val10"]
                 )


df["Val1"].subtract(df["Val2"])

Output:

0    -2
1   -21
2   -20
dtype: int64
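One reason to prefer the method form over the minus operator is its fill_value parameter, which substitutes a value for missing data before subtracting. A minimal sketch, assuming two hypothetical Series (a and b are illustrative names, not from the question) where one entry is missing:

```python
import numpy as np
import pandas as pd

# Hypothetical columns; the second entry of `a` is missing.
a = pd.Series([5.0, np.nan, 58.0])
b = pd.Series([1.0, 12.0, 14.0])

plain = a - b                         # the NaN propagates into the result
filled = a.subtract(b, fill_value=0)  # the missing entry is treated as 0 first

print(plain)
print(filled)
```

If an entry is missing in both inputs, the result for that row stays missing even with fill_value set.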

score 0 · Answer 6 · answered Nov 29 '18 at 09:25

I ran into this today and wanted to share it. As others mentioned above, you can simply use:

df['Val10-Val1'] = df['Val10']-df['Val1']

but sometimes you might need the apply function instead, in which case you could use the following line:

df['Val10-Val1'] = df.apply(lambda row: row['Val10']-row['Val1'])

Navid

Be careful! Your code `df['Val10-Val1'] = df.apply(lambda row: row['Val10']-row['Val1'])` will produce `KeyError: ('Val10', 'occurred at index Country')`, because no axis is specified and apply defaults to axis=0, which passes each column rather than each row to the function. The working code is: `df['Val10-Val1'] = df.apply(lambda row: row['Val10'] - row['Val1'], axis=1)`. – Jaroslav Bezděk Mar 16 '20 at 07:45
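The difference between the two axis settings can be seen in a short sketch, assuming the sample frame from the answers above; the helper name delta and the flag raised are illustrative only. The exact wording of the KeyError message varies between pandas versions, so the sketch only checks that one is raised.

```python
import pandas as pd

df = pd.DataFrame(
    [["Australia", 1, 3, 5], ["Bambua", 12, 33, 56], ["Tambua", 14, 34, 58]],
    columns=["Country", "Val1", "Val2", "Val10"],
)

def delta(row):
    # Expects a Series labelled by column name, i.e. one row of the frame.
    return row["Val10"] - row["Val1"]

# Default axis=0: `delta` receives each column, whose labels are 0, 1, 2,
# so looking up "Val10" fails.
try:
    df.apply(delta)
    raised = False
except KeyError:
    raised = True

# axis=1: `delta` receives each row, whose labels are the column names.
df["Val10-Val1"] = df.apply(delta, axis=1)
print(raised, df["Val10-Val1"].tolist())
```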
