Pandas get the most frequent values of a column

Question

I have this dataframe:

0 name data
1 alex asd
2 helen sdd
3 alex dss
4 helen sdsd
5 john sdadd

I am trying to get the most frequent value or values (in this case, values). What I do is:

dataframe['name'].value_counts().idxmax()

But it returns only one value, alex, even though helen appears two times as well.

score 114 · Accepted Answer · answered Feb 02 '18 at 20:23

By using mode

df.name.mode()
Out[712]: 
0     alex
1    helen
dtype: object
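
To make the difference from the question's idxmax() approach concrete, here is a minimal sketch. It assumes the frame from the question is rebuilt by hand under the name df; nothing else is taken from the original post.

```python
import pandas as pd

# The frame from the question, rebuilt by hand (an assumption of this sketch).
df = pd.DataFrame({
    "name": ["alex", "helen", "alex", "helen", "john"],
    "data": ["asd", "sdd", "dss", "sdsd", "sdadd"],
})

# idxmax() yields a single label, so one of the tied names is dropped.
single = df["name"].value_counts().idxmax()

# mode() returns a Series holding every value tied for the top count, sorted.
modes = df["name"].mode()

print(single)
print(modes.tolist())
```

Because mode() always returns a Series, even a single winner comes back as a one-element Series rather than a scalar.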

answered Feb 02 '18 at 20:23

BENY

Hmmm, I have seen you using mode earlier :) – Vaishali Feb 02 '18 at 21:05
@Vaishali yep, that was `scipy.stats.mode`, which returns both the mode and the count; pandas' mode returns only the value :-) – BENY Feb 02 '18 at 21:12

score 67 · Answer 2 · answered Apr 28 '19 at 06:47

To get the n most frequent values, just subset .value_counts() and grab the index:

# get top 10 most frequent names
n = 10
dataframe['name'].value_counts()[:n].index.tolist()

answered Apr 28 '19 at 06:47

Jared Wilber

What exactly does adding .index do? Why can't I stop at [:n]? – user1953366 Apr 28 '19 at 07:10
The returned data structure will have the `name` values stored in the index, with their respective counts stored as the value. So if you didn't use index, you'd get a list of the most frequent counts, not the associated `name`. – Jared Wilber Apr 28 '19 at 18:15
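
A small sketch of the point made in this comment, using made-up names with distinct counts so the order is unambiguous. The slice is written with .iloc, which is the explicit positional form of [:n].

```python
import pandas as pd

# Hypothetical data: alex appears 3 times, helen 2, john 1.
names = pd.Series(["alex", "helen", "alex", "helen", "john", "alex"], name="name")

counts = names.value_counts()   # index = the names, values = their counts
top = counts.iloc[:2]           # first two rows, i.e. the two most frequent

labels = top.index.tolist()     # the names themselves
freqs = top.tolist()            # only the counts; the names are lost

print(labels)
print(freqs)
```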
Great, this works. But I need to find the top n for all the columns in one go and store them in n columns, so the dataframe will have {column_name, mode_1, mode_2...mode_n} – Vikrant Dec 02 '19 at 09:42
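
One possible way to approach the per-column request in the comment above, as a sketch only. The helper name top_n_per_column, the city column, and the padding with None are all assumptions made for illustration; they are not from the thread.

```python
import pandas as pd

def top_n_per_column(df, n):
    """Return one row per column of df holding its n most frequent values."""
    rows = {}
    for col in df.columns:
        top = df[col].value_counts().head(n).index.tolist()
        # Pad with None when a column has fewer than n distinct values.
        rows[col] = top + [None] * (n - len(top))
    result = pd.DataFrame.from_dict(
        rows, orient="index", columns=[f"mode_{i + 1}" for i in range(n)]
    )
    result.index.name = "column_name"
    return result

# Hypothetical frame with clear winners in each column.
frame = pd.DataFrame({
    "name": ["alex", "helen", "alex", "helen", "john", "alex"],
    "city": ["a", "b", "b", "b", "a", "c"],
})
result = top_n_per_column(frame, 2)
print(result)
```

Ties are resolved by whatever order value_counts() returns, so this does not report every tied value the way mode() does.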

score 18 · Answer 3 · answered Jun 27 '18 at 02:57

You could try argmax like this:

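dataframe['name'].value_counts().argmax()
Out[13]: 'alex'

value_counts() returns a pandas.core.series.Series of counts, and argmax can be used to obtain the key of the maximum value. Note that since pandas 1.0, Series.argmax returns the integer position of the maximum instead; idxmax returns the label.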

answered Jun 27 '18 at 02:57

Lunar_one

`argmax` is deprecated for `idmax` – Bhoomtawath Plinsut Nov 10 '18 at 13:46
Just a small typo correction: it is not `idmax`, but `idxmax` – ralvarez Jul 05 '19 at 08:08
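
To illustrate why the comments steer towards idxmax, here is a short sketch that assumes pandas 1.0 or later, where the two methods differ. The data is made up so that there is a single clear winner.

```python
import pandas as pd

# Hypothetical data with one clear winner.
names = pd.Series(["alex", "helen", "alex", "john", "alex"])
counts = names.value_counts()

label = counts.idxmax()      # the index label of the largest count
position = counts.argmax()   # the integer position of the largest count

print(label)
print(position)
print(counts.index[position])  # turning the position back into a label
```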

score 9 · Answer 4 · edited Sep 11 '19 at 09:03

df['name'].value_counts()[:5].sort_values(ascending=False)

value_counts() returns a pandas.core.series.Series of counts, and sort_values(ascending=False) puts the highest counts first.

edited Sep 11 '19 at 09:03

answered Sep 11 '19 at 08:32

Taie

While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. – xiawi Sep 11 '19 at 08:57
`value_counts()` already returns its result sorted in descending order, so calling `sort_values()` is unnecessary. See [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.value_counts.html). – Matt VanEseltine Oct 20 '20 at 21:02
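
The ordering claim in the last comment can be checked with a quick sketch. The names are hypothetical and chosen with distinct counts so that tie order does not matter.

```python
import pandas as pd

# Hypothetical data: 4 x alex, 3 x helen, 2 x john, 1 x maria.
names = pd.Series(["alex"] * 4 + ["helen"] * 3 + ["john"] * 2 + ["maria"])

counts = names.value_counts()

# With the default sort=True the counts come back highest first.
already_sorted = counts.is_monotonic_decreasing

# So taking the head is enough; no extra sort_values() call is needed.
top3 = counts.head(3).index.tolist()

print(already_sorted)
print(top3)
```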

score 7 · Answer 5 · answered Aug 15 '18 at 05:18

You can use this to get an exact count of every value in a particular column; the counts are sorted in descending order, so the mode comes first:

df['name'].value_counts()

answered Aug 15 '18 at 05:18

paul okoduwa

score 6 · Answer 6 · answered Feb 02 '18 at 20:22

Here's one way:

df['name'].value_counts()[df['name'].value_counts() == df['name'].value_counts().max()]

which prints:

helen    2
alex     2
Name: name, dtype: int64
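
The one-liner above calls value_counts() three times. A sketch of the same filter with the counts computed once might look like this; it assumes the frame from the question is rebuilt by hand as df.

```python
import pandas as pd

# The name column from the question, rebuilt by hand (an assumption).
df = pd.DataFrame({"name": ["alex", "helen", "alex", "helen", "john"]})

counts = df["name"].value_counts()       # computed once and reused
tied = counts[counts == counts.max()]    # keep every name at the top count

print(tied)
```

Unlike mode(), this keeps the counts next to the names, which can be handy when the frequency itself is needed.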

answered Feb 02 '18 at 20:22

pault

score 6 · Answer 7 · answered Jul 06 '20 at 09:15

Use:

df['name'].mode()

or

df['name'].value_counts().idxmax()

answered Jul 06 '20 at 09:15

Mohit Mehlawat

score 4 · Answer 8 · answered Feb 02 '18 at 20:34

Not Obvious, But Fast

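import numpy as np
import pandas as pd

f, u = pd.factorize(df.name.values)  # f: integer code per row, u: unique names
counts = np.bincount(f)              # occurrences of each code
u[counts == counts.max()]            # names whose count equals the maximum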

array(['alex', 'helen'], dtype=object)
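
For readers unfamiliar with the two calls, the following sketch spells out the intermediate values. It assumes the column has no missing values: pd.factorize encodes NaN as -1, which np.bincount rejects.

```python
import numpy as np
import pandas as pd

# The names from the question, rebuilt by hand (an assumption).
names = pd.Series(["alex", "helen", "alex", "helen", "john"])

# factorize maps each distinct value to an integer code, in order of first appearance.
codes, uniques = pd.factorize(names.to_numpy())

# bincount counts how often each code occurs; position i belongs to uniques[i].
counts = np.bincount(codes)

# A boolean mask picks every unique value whose count equals the maximum.
winners = uniques[counts == counts.max()]

print(codes.tolist())
print(counts.tolist())
print(winners.tolist())
```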

answered Feb 02 '18 at 20:34

piRSquared

For numeric data, this was slightly slower for me :) Like 5% – The Unfun Cat Nov 14 '19 at 10:39

score 4 · Answer 9 · answered Jul 02 '19 at 09:03

To get the top 5:

dataframe['name'].value_counts()[0:5]

answered Jul 02 '19 at 09:03

Naomi Fridman

I actually like this answer, but there is one issue. Doing this just returns the frequency, not the label. Fix this by using `dataframe['name'].value_counts().keys()[0:5]` instead. – Jul 25 '19 at 17:32

score 4 · Answer 10 · edited Jan 21 '22 at 08:07

This gives the five most common names:

df['name'].value_counts().nlargest(5)

edited Jan 21 '22 at 08:07

Syscall

answered Jan 21 '22 at 07:25

Sandhya Krishnan

score 3 · Answer 11 · edited May 03 '20 at 07:01

Simply use this:

dataframe['name'].value_counts().nlargest(n)

The functions for the largest and smallest frequencies are:

nlargest() for the 'n' most frequent values
nsmallest() for the 'n' least frequent values
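
A brief sketch of both functions on the question's names, rebuilt by hand here. The keep="all" argument is not mentioned in the answer; it is shown as one option for keeping ties at the cutoff, which matters for the question as asked.

```python
import pandas as pd

# The names from the question, rebuilt by hand (an assumption).
names = pd.Series(["alex", "helen", "alex", "helen", "john"])
counts = names.value_counts()

# Plain nlargest(1) returns exactly one row, so a tied name is dropped.
top_one = counts.nlargest(1)

# keep="all" retains every row tied with the last one kept.
top_with_ties = counts.nlargest(1, keep="all")

# nsmallest works the same way for the least frequent values.
least = counts.nsmallest(1)

print(top_one)
print(top_with_ties)
print(least)
```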

edited May 03 '20 at 07:01

William Prigol Lopes

answered May 02 '20 at 20:00

avineet07

score 2 · Answer 12 · answered Feb 02 '18 at 20:24

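You could use .apply with pd.Series.value_counts to count the occurrences of each name in the name column. Select the column with double brackets so that apply runs on the whole column rather than on each element:

dataframe[['name']].apply(pd.Series.value_counts)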

answered Feb 02 '18 at 20:24

Brian

score 2 · Answer 13 · answered Jul 30 '19 at 05:41

To get the top five most common names:

dataframe['name'].value_counts().head()

answered Jul 30 '19 at 05:41

pedro_bb7

score 2 · Answer 14 · answered Jan 30 '20 at 15:13

My best solution to get the first (most frequent) value is:

df['my_column'].value_counts().sort_values(ascending=False).idxmax()

answered Jan 30 '20 at 15:13

venergiac

score 2 · Answer 15 · answered Mar 12 '21 at 14:50

I had a similar issue. The most compact way to get, say, the top n most frequent values (head returns 5 by default) is:

df["column_name"].value_counts().head(n)

answered Mar 12 '21 at 14:50

KZiovas

score 2 · Answer 16 · answered Jun 18 '21 at 16:53

Identifying the top 5, for example, using value_counts:

top5 = df['column'].value_counts()

Listing the first five entries of top5:

top5[:5]

answered Jun 18 '21 at 16:53

Victor Senna

The one liner for this is: `df['column'].value_counts()[:5]` – Duc Hiep Hoang Jun 22 '21 at 16:34
The above may give you a `KeyError`. The more general way is `top5.keys()[:5]`, the one-liner being `df['column'].value_counts().keys()[:5]` – Nirjhor Chakraborty Jul 02 '21 at 21:30
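
Whether the bracket form raises depends on the index type and the pandas version, which this sketch does not try to pin down. It only shows the explicitly positional alternatives, .iloc and .head, on a made-up integer column, where labels and positions are easiest to confuse.

```python
import pandas as pd

# Hypothetical integer column: 7 appears 3 times, 3 twice, 9 once.
codes = pd.Series([7, 7, 7, 3, 3, 9], name="code")
counts = codes.value_counts()   # the index now holds the integers 7, 3, 9

# .iloc and .head always select by position, never by label.
top2_iloc = counts.iloc[:2].index.tolist()
top2_head = counts.head(2).index.tolist()

print(top2_iloc)
print(top2_head)
```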

score 1 · Answer 17 · edited Dec 16 '20 at 14:34

n is the number of most frequent items to return:

n = 2

a = dataframe['name'].value_counts()[:n].index.tolist()

dataframe['name'].value_counts()[a]

edited Dec 16 '20 at 14:34

Maylo

answered Dec 16 '20 at 14:10

Hassan Butt

score 0 · Answer 18 · edited Aug 11 '21 at 19:36

Getting the 5 most common last names with pandas:

df['name'].apply(lambda name: name.split()[-1]).value_counts()[:5]
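
A sketch of the same idea with the vectorised .str accessor in place of apply. The full names are invented for illustration, and the last whitespace-separated token is assumed to be the last name.

```python
import pandas as pd

# Hypothetical full names; the last token is treated as the last name.
full = pd.Series(
    ["Alex Smith", "Helen Jones", "John Smith", "Maria Jones", "Ivan Smith", "Ana Lee"],
    name="name",
)

# .str.split() splits on whitespace; .str[-1] takes the final piece of each list.
last_names = full.str.split().str[-1]

top = last_names.value_counts().head(2)
print(top)
```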

edited Aug 11 '21 at 19:36

General Grievance

answered Aug 11 '21 at 15:34

Alireza
