
I have this dataframe:

name   data
alex   asd
helen  sdd
alex   dss
helen  sdsd
john   sdadd

I am trying to get the most frequent value or values (in this case, values), so what I do is:

dataframe['name'].value_counts().idxmax()

But it returns only the value alex, even though helen appears two times as well.

aleale

18 Answers


By using mode

df.name.mode()
Out[712]: 
0     alex
1    helen
dtype: object
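Because several values can tie for the highest frequency, mode() always returns a Series. A minimal sketch, assuming a hypothetical frame rebuilt from the question's data, that turns the result into a plain Python list:

```python
import pandas as pd

# Hypothetical frame rebuilt from the question's data
df = pd.DataFrame({
    "name": ["alex", "helen", "alex", "helen", "john"],
    "data": ["asd", "sdd", "dss", "sdsd", "sdadd"],
})

# mode() returns every value that shares the highest frequency
most_frequent = df["name"].mode().tolist()
print(most_frequent)  # ['alex', 'helen']
```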
BENY

To get the n most frequent values, just subset .value_counts() and grab the index:

# get top 10 most frequent names
n = 10
dataframe['name'].value_counts()[:n].index.tolist()
Jared Wilber
  • What exactly does adding .index do? Why can't I stop at [:n]? – user1953366 Apr 28 '19 at 07:10
  • The returned data structure will have the `name` values stored in the index, with their respective counts stored as the values. So if you didn't use index, you'd get a list of the most frequent counts, not the associated `name`. – Jared Wilber Apr 28 '19 at 18:15
  • Great, this works. But I need to find the top n for all the columns in one go and store them in n columns, so the dataframe will have {column_name, mode_1, mode_2...mode_n} – Vikrant Dec 02 '19 at 09:42
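To see what .index adds, here is a small sketch on hypothetical data (chosen without ties so the order is predictable). counts.iloc[:n] is the explicitly positional form of counts[:n]:

```python
import pandas as pd

# Hypothetical names with no tie among the top two
names = pd.Series(["alex", "alex", "alex", "helen", "helen", "john"])
counts = names.value_counts()  # index = names, values = counts

n = 2
top_labels = counts.iloc[:n].index.tolist()  # the names themselves
top_counts = counts.iloc[:n].tolist()        # only the frequencies

print(top_labels)  # ['alex', 'helen']
print(top_counts)  # [3, 2]
```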

You could try argmax like this:

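dataframe['name'].value_counts().argmax()
Out[13]: 'alex'

value_counts returns a pandas.core.series.Series of counts, and argmax can be used to get the key of the maximum value. Note that this describes older pandas: since pandas 1.0, Series.argmax returns the integer position of the maximum (0 here, because the counts are sorted) rather than its label, so use idxmax() to get the name.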
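A minimal sketch, assuming pandas 1.0 or later, contrasting the two calls on the question's names:

```python
import pandas as pd

names = pd.Series(["alex", "helen", "alex", "helen", "john"])
counts = names.value_counts()

pos = counts.argmax()    # integer position of the (first) maximum
label = counts.idxmax()  # index label of the (first) maximum

# Both refer to the same entry; .index translates the position into the label
assert counts.index[pos] == label
print(pos, label)
```

With a tie, as here, which of alex or helen comes first is not something to rely on; mode() returns both.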

Lunar_one
df['name'].value_counts()[:5].sort_values(ascending=False)

value_counts returns a pandas.core.series.Series of counts, and sort_values(ascending=False) puts the highest values first.

Taie
  • While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. – xiawi Sep 11 '19 at 08:57
  • `value_counts()` already returns its result sorted in descending order, so calling `sort_values()` is unnecessary. See [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.value_counts.html). – Matt VanEseltine Oct 20 '20 at 21:02
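A quick check of that comment on hypothetical data: value_counts() sorts by frequency by default (sort=True), so the first n rows are already the n most frequent values without an extra sort_values() call:

```python
import pandas as pd

# Hypothetical data: b appears 3 times, a twice, c once
s = pd.Series(["b", "a", "b", "c", "b", "a"])
counts = s.value_counts()

# Highest count first, no sort_values() needed
print(counts.is_monotonic_decreasing)  # True

top2 = counts.iloc[:2]  # the two most frequent values with their counts
print(top2.to_dict())
```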

You can use this to count every value in a particular column; because the counts are sorted in descending order, the first entry is the mode (the most frequent value):

df['name'].value_counts()
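A minimal sketch, assuming hypothetical data with a single clear winner, showing that the first row of value_counts() holds the mode and its count:

```python
import pandas as pd

# Hypothetical column where "x" is the unique most frequent value
s = pd.Series(["x", "y", "x", "z", "x"])
counts = s.value_counts()

# First row of the sorted counts = most frequent value and how often it occurs
print(counts.index[0], counts.iloc[0])  # x 3
assert counts.index[0] == s.mode()[0]
```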
paul okoduwa

Here's one way:

df['name'].value_counts()[df['name'].value_counts() == df['name'].value_counts().max()]

which prints:

helen    2
alex     2
Name: name, dtype: int64
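The same filter with value_counts() computed once instead of three times; a sketch assuming the question's names column:

```python
import pandas as pd

df = pd.DataFrame({"name": ["alex", "helen", "alex", "helen", "john"]})

counts = df["name"].value_counts()     # compute the counts only once
ties = counts[counts == counts.max()]  # keep every name sharing the top count

print(ties.index.tolist())
```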
pault

Use:

df['name'].mode()

or

df['name'].value_counts().idxmax()
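The two calls differ when there is a tie: mode() returns every tied value, while idxmax() returns a single label. A sketch on the question's names:

```python
import pandas as pd

names = pd.Series(["alex", "helen", "alex", "helen", "john"])

all_modes = names.mode().tolist()          # every tied value
one_value = names.value_counts().idxmax()  # just one of them

print(all_modes, one_value)
```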
Mohit Mehlawat

Not Obvious, But Fast

import numpy as np
import pandas as pd

f, u = pd.factorize(df.name.values)
counts = np.bincount(f)
u[counts == counts.max()]

array(['alex', 'helen'], dtype=object)
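One caveat: pd.factorize assigns the code -1 to missing values, and np.bincount rejects negative numbers, so the snippet above assumes the column has no NaN. A sketch of the same approach wrapped in a hypothetical helper, most_frequent_values, that skips missing values first:

```python
import numpy as np
import pandas as pd


def most_frequent_values(s: pd.Series) -> list:
    """Return every value tied for the highest count (missing values ignored)."""
    codes, uniques = pd.factorize(s)  # missing values get the code -1
    codes = codes[codes >= 0]         # np.bincount rejects negative codes
    if codes.size == 0:
        return []
    counts = np.bincount(codes, minlength=len(uniques))
    return list(uniques[counts == counts.max()])


names = pd.Series(["alex", "helen", None, "alex", "helen", "john"])
print(most_frequent_values(names))  # ['alex', 'helen']
```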
piRSquared

To get the top 5:

dataframe['name'].value_counts()[0:5]
Naomi Fridman
  • I actually like this answer, but there is one issue. Doing this just returns the frequency, not the label. Fix this by using `dataframe['name'].value_counts().keys()[0:5]` instead. –  Jul 25 '19 at 17:32

This gives the five most common names:

df['name'].value_counts().nlargest(5)
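nlargest also takes a keep parameter; with keep="all" it keeps every entry tied at the cut-off, even if that returns more than n rows, which covers the question's case of several equally frequent names. A sketch on the question's data:

```python
import pandas as pd

names = pd.Series(["alex", "helen", "alex", "helen", "john"])
counts = names.value_counts()

# keep="all" returns every entry tied at the cut-off, even beyond n
print(counts.nlargest(1, keep="all"))
```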
Syscall

Simply use this:

dataframe['name'].value_counts().nlargest(n)

The functions for the largest and smallest frequencies are:

  • nlargest() for the n most frequent values
  • nsmallest() for the n least frequent values
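A minimal sketch on hypothetical data showing both functions applied to the result of value_counts():

```python
import pandas as pd

# Hypothetical data: a appears 3 times, b twice, c once
s = pd.Series(["a", "a", "a", "b", "b", "c"])
counts = s.value_counts()

n = 2
most = counts.nlargest(n)    # counts for 'a' (3) and 'b' (2)
least = counts.nsmallest(n)  # counts for 'c' (1) and 'b' (2)
print(most.to_dict(), least.to_dict())
```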
William Prigol Lopes
avineet07

You could use .apply and pd.value_counts to count the occurrences of all the names in the name column.

dataframe['name'].apply(pd.value_counts)
Brian

To get the top five most common names:

dataframe['name'].value_counts().head()
pedro_bb7

My best solution to get the first (most frequent) value is:

df['my_column'].value_counts().sort_values(ascending=False).idxmax()
venergiac

I had a similar issue. The most compact way to get, say, the top n most frequent values (head() defaults to n=5) is:

df["column_name"].value_counts().head(n)
KZiovas

Identify the top 5, for example, using value_counts:

top5 = df['column'].value_counts()

List the first five entries of top5:

top5[:5]

n is the number of most frequent items to return:

n = 2

a=dataframe['name'].value_counts()[:n].index.tolist()

dataframe["name"].value_counts()[a]
Maylo

To get the 5 most common last names with pandas:

df['name'].apply(lambda name: name.split()[-1]).value_counts()[:5]
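A vectorised variant using the .str accessor instead of apply with a lambda; a sketch assuming hypothetical full names separated by whitespace:

```python
import pandas as pd

# Hypothetical full names; the last whitespace-separated token is the last name
df = pd.DataFrame({"name": ["Alex Smith", "Helen Jones", "John Smith", "Mary Ann Lee"]})

# Split on whitespace and take the last token of each name
last_names = df["name"].str.split().str[-1]
top = last_names.value_counts()[:5]
print(top)
```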
General Grievance