Create a pandas dataframe of counts

Question

I want to create a pandas dataframe with two columns, the first being the unique values of one of my columns and the second being the count of unique values.

I have seen many posts (such here) as that describe how to get the counts, but the issue I'm running into is when I try to create a dataframe the column values become my index.

Sample data: df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'State': ['MA', 'PA', 'PA']}). I want to end up with a dataframe like:

   Color Count
0   Red  2
1  Blue  1

I have tried the following, but in all cases the index ends up as Color and the Count is the only column in the dataframe.

Attempt 1:

df2 = pd.DataFrame(data=df['Color'].value_counts())
# And resetting the index just gets rid of Color, which I want to keep
df2 = df2.reset_index(drop=True)

Attempt 2:

df3 = df['Color'].value_counts()
df3 = pd.DataFrame(data=df3, index=range(df3.shape[0]))

Attempt 3:

df4 = df.groupby('Color')
df4 = pd.DataFrame(df4['Color'].count())

score 14 · Accepted Answer · edited Jun 27 '20 at 13:38

14

Another way to do this, using value_counts:

In [10]: df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'State': ['MA', 'PA', 'PA']})

In [11]: df.Color.value_counts().reset_index().rename(
           columns={'index': 'Color', 0: 'count'})
Out[11]:
  Color  count
0   Red      2
1  Blue      1

edited Jun 27 '20 at 13:38

user3313834

6,579
7
50
91

answered Jun 26 '15 at 15:37

Phillip Cloud

23,488
11
67
88

score 2 · Answer 2 · answered Jun 26 '15 at 15:33

2

Essentially equivalent to setting the column names, but using the rename method instead:

df.groupby('Color').count().reset_index().rename(columns={'State': 'Count'})

answered Jun 26 '15 at 15:33

mdurant

24,595
5
38
66

I noticed a small issue with this approach: if there are more than two columns, then all additional columns will get the counts as well. – Tchotchke Jun 29 '15 at 22:43
It depends what you want to see. Say there are two rows with colour=red, then for each column there are two values for which the row's colour=red. You could always explicitly select one column only. – mdurant Jun 30 '15 at 14:31

score 2 · Answer 3 · answered Dec 17 '18 at 01:58

2

One readable solution is to use to_frame and rename_axis methods:

res = df['Color'].value_counts()\
                 .to_frame('count').rename_axis('Color')\
                 .reset_index()

print(res)

  Color  count
0   Red      2
1  Blue      1

answered Dec 17 '18 at 01:58

jpp

147,904
31
244
302

score 0 · Answer 4 · edited Jun 27 '20 at 14:32

0

df = df.groupby('Color').count().reset_index()
df.columns = ['Color','Count']

edited Jun 27 '20 at 14:32

jpp

147,904
31
244
302

answered Jun 26 '15 at 15:29

khammel

1,987
1
12
18

Create a pandas dataframe of counts

4 Answers4

Linked