Pandas dataframe total row

Question

I have a dataframe, something like:

     foo  bar  qux
0    a    1    3.14
1    b    3    2.72
2    c    2    1.62
3    d    9    1.41
4    e    3    0.58

and I would like to add a 'total' row to the end of dataframe:

     foo  bar  qux
0    a    1    3.14
1    b    3    2.72
2    c    2    1.62
3    d    9    1.41
4    e    3    0.58
5  total  18   9.47

I've tried to use the sum command but I end up with a Series, which although I can convert back to a Dataframe, doesn't maintain the data types:

tot_row = pd.DataFrame(df.sum()).T
tot_row['foo'] = 'tot'
tot_row.dtypes:
     foo    object
     bar    object
     qux    object

I would like to maintain the data types from the original data frame as I need to apply other operations to the total row, something like:

baz = 2*tot_row['qux'] + 3*tot_row['bar']

Try `df.loc['Total']= df.sum()`, referenced from this [link](http://stackoverflow.com/questions/20804673/appending-column-totals-to-a-pandas-dataframe) — Kevin Zhu, Oct 20 '16 at 09:13

jmz · Accepted Answer · 2014-02-14T09:50:43.680

72

Append a totals row with

df.append(df.sum(numeric_only=True), ignore_index=True)

The conversion is necessary only if you have a column of strings or objects.

It's a bit of a fragile solution so I'd recommend sticking to operations on the dataframe, though. eg.

baz = 2*df['qux'].sum() + 3*df['bar'].sum()

edited Feb 14 '14 at 09:50

answered Feb 13 '14 at 13:32

jmz

3,758
27
26

This doesn't maintain the original data types; all columns are converted to objects (strings by the looks of it). – Daniel Feb 13 '14 at 14:27
Sorry, you're right, I didn't check it with a column of strings. Updated now. – jmz Feb 13 '14 at 14:35
This is almost correct. The only thing is that the data types aren't quite preserved; ints are converted to floats. Luckily I don't think this will be an issue for me. I'll accept this solution if nobody posts an alternative today. – Daniel Feb 14 '14 at 08:22
1

Updated again to prevent any type changes of summed numeric data. Strings will now shows up as `np.nan` in total row. – jmz Feb 14 '14 at 09:52
This is great. Worked for me, thanks. But how can I change the index back to my designated list after this? – Bowen Liu Jan 31 '19 at 15:08
Use `df.append(df.sum(numeric_only=True).rename('Total'))` if you want to preserve the index. – m01010011 Jul 31 '20 at 03:36
1

But won't this change the actual data of the DataFrame? So e.g. `df.describe()` would take the data plus the sum of the data? – Martin Hepp Mar 14 '22 at 10:12
append will be deprecated, use concat instead – DISC-O May 25 '22 at 15:24
June 2022: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead. – PatrickT Jun 03 '22 at 01:51

Matthias Kauer · Answer 2 · 2018-11-17T01:10:08.400

64

df.loc["Total"] = df.sum()

works for me and I find it easier to remember. Am I missing something? Probably wasn't possible in earlier versions.

I'd actually like to add the total row only temporarily though. Adding it permanently is good for display but makes it a hassle in further calculations.

Just found

df.append(df.sum().rename('Total'))

This prints what I want in a Jupyter notebook and appears to leave the df itself untouched.

edited Nov 17 '18 at 01:10

answered Nov 17 '18 at 01:02

Matthias Kauer

8,677
5
16
18

1

nice! very similar to R, fast and clean! :-) – Ivan Carrasco Quiroz Dec 12 '18 at 04:15
This worked for me, very simple. Upvoted :) – Ganesh Jul 11 '21 at 15:03
June 2022: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead. – PatrickT Jun 03 '22 at 01:52

BhishanPoudel · Answer 3 · 2021-03-01T02:44:25.900

45

New Method

To get both row and column total:

import numpy as np
import pandas as pd


df = pd.DataFrame({'a': [10,20],'b':[100,200],'c': ['a','b']})

df.loc['Column_Total']= df.sum(numeric_only=True, axis=0)
df.loc[:,'Row_Total'] = df.sum(numeric_only=True, axis=1)

print(df)


                 a      b    c  Row_Total
0             10.0  100.0    a      110.0
1             20.0  200.0    b      220.0
Column_Total  30.0  300.0  NaN      330.0

edited Mar 01 '21 at 02:44

answered Jun 23 '19 at 02:09

BhishanPoudel

1
15
87
137

this should be build in function ^^] – n3rd Jan 06 '22 at 13:20

score 24 · Answer 4 · edited Nov 27 '18 at 17:02

Use DataFrame.pivot_table with margins=True:

import pandas as pd
data = [('a',1,3.14),('b',3,2.72),('c',2,1.62),('d',9,1.41),('e',3,.58)]
df = pd.DataFrame(data, columns=('foo', 'bar', 'qux'))

Original df:

  foo  bar   qux
0   a    1  3.14
1   b    3  2.72
2   c    2  1.62
3   d    9  1.41
4   e    3  0.58

Since pivot_table requires some sort of grouping (without the index argument, it'll raise a ValueError: No group keys passed!), and your original index is vacuous, we'll use the foo column:

df.pivot_table(index='foo',
               margins=True,
               margins_name='total',  # defaults to 'All'
               aggfunc=sum)

Voilà!

       bar   qux
foo             
a        1  3.14
b        3  2.72
c        2  1.62
d        9  1.41
e        3  0.58
total   18  9.47

This worked for me, but when I write df again in command prompt, it prints the old df and I do not see the sum. also it does not write the total in Excel file — FabioSpaghetti, Aug 11 '19 at 11:34

score 9 · Answer 5 · answered Jul 11 '17 at 05:56

Alternative way (verified on Pandas 0.18.1):

import numpy as np
total = df.apply(np.sum)
total['foo'] = 'tot'
df.append(pd.DataFrame(total.values, index=total.keys()).T, ignore_index=True)

Result:

   foo   bar   qux
0    a     1  3.14
1    b     3  2.72
2    c     2  1.62
3    d     9  1.41
4    e     3  0.58
5  tot    18  9.47

score 6 · Answer 6 · answered Mar 11 '19 at 04:05

Building on JMZ answer

df.append(df.sum(numeric_only=True), ignore_index=True)

if you want to continue using your current index you can name the sum series using .rename() as follows:

df.append(df.sum().rename('Total'))

This will add a row at the bottom of the table.

score 5 · Answer 7 · answered May 29 '19 at 12:09

5

Building on answer from Matthias Kauer.

To add row total:

df.loc["Row_Total"] = df.sum()

To add column total,

df.loc[:,"Column_Total"] = df.sum(axis=1)

answered May 29 '19 at 12:09

Sarah

1,518
13
15

score 4 · Answer 8 · answered Feb 16 '19 at 17:07

4

This is the way that I do it, by transposing and using the assign method in combination with a lambda function. It makes it simple for me.

df.T.assign(GrandTotal = lambda x: x.sum(axis=1)).T

answered Feb 16 '19 at 17:07

Samlex

119
1
4

score 3 · Answer 9 · answered Sep 06 '17 at 15:07

Following helped for me to add a column total and row total to a dataframe.

Assume dft1 is your original dataframe... now add a column total and row total with the following steps.

from io import StringIO
import pandas as pd

#create dataframe string
dfstr = StringIO(u"""
a;b;c
1;1;1
2;2;2
3;3;3
4;4;4
5;5;5
""")

#create dataframe dft1 from string
dft1 = pd.read_csv(dfstr, sep=";")

## add a column total to dft1
dft1['Total'] = dft1.sum(axis=1)

## add a row total to dft1 with the following steps

sum_row = dft1.sum(axis=0) #get sum_row first
dft1_sum=pd.DataFrame(data=sum_row).T #change it to a dataframe

dft1_sum=dft1_sum.reindex(columns=dft1.columns) #line up the col index to dft1
dft1_sum.index = ['row_total'] #change row index to row_total

dft1.append(dft1_sum) # append the row to dft1

score 0 · Answer 10 · answered Mar 14 '22 at 10:33

Actually all proposed solutions render the original DataFrame unusable for any further analysis and can invalidate following computations, which will be easy to overlook and could lead to false results.

This is because you add a row to the data, which Pandas cannot differentiate from an additional row of data.

Example:

import pandas as pd
data = [1, 5, 6, 8, 9]
df = pd.DataFrame(data)
df
df.describe()

yields

	0
0	1
1	5
2	6
3	8
4	9

	0
count	5
mean	5.8
std	3.11448
min	1
25%	5
50%	6
75%	8
max	9

After

df.loc['Totals']= df.sum(numeric_only=True, axis=0)

the dataframe looks like this

	0
0	1
1	5
2	6
3	8
4	9
Totals	29

This looks nice, but the new row is treated as if it was an additional data item, so df.describe will produce false results:

	0
count	6
mean	9.66667
std	9.87252
min	1
25%	5.25
50%	7
75%	8.75
max	29

So: Watch out! and apply this only after doing all other analyses of the data or work on a copy of the DataFrame!

Pandas dataframe total row

10 Answers10

New Method

Linked

Related