Pandas find how many times a column value appears in dataset

Question

I am trying to sort my data by how often each value in the Name column appears (its popularity).

Right now, I'm doing this:

df['Count'] = df.apply(lambda x: len(df[df['Name'] == x['Name']]), axis=1)
df[df['Count'] > 50][['Name', 'Description', 'Count']].drop_duplicates('Name').sort_values('Count', ascending=False).head(100)

However, this query is very slow; it takes hours to run.

What would be a more efficient way to do this?

http://stackoverflow.com/questions/22391433/count-the-frequency-that-a-value-occurs-in-a-dataframe-column — Lynob, Jul 20 '16 at 17:55

score 2 · Accepted Answer · answered Jul 24 '16 at 22:06


The solution I have been looking for is:

df['Count'] = df.groupby('Name')['Name'].transform('count')

Big thanks to @Lynob for providing a link with an answer.
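For anyone landing here later, this is a minimal sketch of how the `transform('count')` line could slot into the filter-and-sort pipeline from the question. The sample rows and the threshold of 1 (instead of 50) are assumptions made up for illustration, since the original post included no data.

```python
import pandas as pd

# Hypothetical sample data; the original post did not include any.
df = pd.DataFrame({
    "Name": ["a", "b", "a", "c", "a", "b"],
    "Description": ["first a", "first b", "second a",
                    "only c", "third a", "second b"],
})

# For every row, store how many rows in the frame share that row's Name.
df["Count"] = df.groupby("Name")["Name"].transform("count")

# Same filter/sort steps as in the question, but with a threshold of 1
# (assumed) so that the tiny sample still produces output.
popular = (
    df[df["Count"] > 1][["Name", "Description", "Count"]]
    .drop_duplicates("Name")
    .sort_values("Count", ascending=False)
    .head(100)
)
print(popular)
```

The grouped count is computed once per distinct name, rather than re-filtering the whole frame for every row as the `apply` version does, which is presumably where the speedup comes from.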


if __name__ is None


score 1 · Answer 2 · answered Jul 20 '16 at 17:56


You can use Series.value_counts.

import pandas as pd

df = pd.DataFrame([[0, 1], [1, 0], [1, 1]], columns=['a', 'b'])
print(df['b'].value_counts())

outputs

1    2
0    1
Name: b, dtype: int64


Alex


Right, but I need the other fields from df as well ('Name', 'Description'). `value_counts` omits those. – if __name__ is None Jul 22 '16 at 23:02

score 0 · Answer 3 · answered Jul 20 '16 at 18:16


Try this:

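import pandas as pd

a = ["jim"] * 5 + ["jane"] * 10 + ["john"] * 15
n = pd.Series(a)

# Count once, keep the names that appear more than 5 times, then sort them.
counts = n.value_counts()
sorted(counts[counts > 5].index)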

['jane', 'john']


Merlin


I would still like to know the fields like 'Name', 'Description' from my df. So I guess, what would be a way to apply `value_counts()` to my `df.apply` method to create new column called 'Counts'? – if __name__ is None Jul 22 '16 at 23:09
You need to provide some data or example dataframe. – Merlin Jul 22 '16 at 23:29
I posted the sample data as an answer because comments are impossible to work with.
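On the question raised in the comments above (using `value_counts()` to build a count column while keeping the other fields), one possible approach is to map the counts back onto the original column. This is only a sketch: the sample names and descriptions below are invented for illustration, and the column is called `Count` as in the question.

```python
import pandas as pd

# Hypothetical sample data; the thread never showed the real frame.
df = pd.DataFrame({
    "Name": ["jim", "jane", "jane", "john", "john", "john"],
    "Description": ["d1", "d2", "d3", "d4", "d5", "d6"],
})

# value_counts() returns a Series indexed by name, holding each name's frequency.
counts = df["Name"].value_counts()

# map() looks up each row's Name in that Series, so the other
# columns ('Name', 'Description') stay in place next to the new one.
df["Count"] = df["Name"].map(counts)
print(df)
```

This should give the same per-row counts as the `groupby(...).transform('count')` approach; which one is faster likely depends on the data.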
