1

I have a list of names:

lst = ['Albert', 'Carl', 'Julian', 'Mary']

and I have a DF:

target     id      name
  A       100     Albert
  A       110     Albert
  B       200     Carl
  D       500     Mary
  E       235     Mary

I want to make another dataframe counting how many id per name in lst:

lst_names   Count
Albert       2
Carl         1
Julian       0
Mary         2

What's the most efficient way to do this considering the list of names has 12k unique names on it?

Henry Ecker
  • 31,792
  • 14
  • 29
  • 50
pouchewar
  • 363
  • 8

2 Answers2

3

Check with value_counts

pd.Categorical(df['name'],lst).value_counts()
Out[894]: 
Albert    2
Carl      1
Julian    0
Mary      2
dtype: int64

Or

df['name'].value_counts().reindex(lst,fill_value=0)
Out[896]: 
Albert    2
Carl      1
Julian    0
Mary      2
Name: name, dtype: int64
BENY
  • 296,997
  • 19
  • 147
  • 204
2

You can use value_counts, and then create an empty Series with lst as the index, and then add them together, filling NaN with 0:

(df['name'].value_counts() + pd.Series(index=lst, dtype=int)).fillna(0).astype(int)

Output:

>>> df
Albert    2
Carl      1
Julian    0
Mary      2
Name: count, dtype: int64
richardec
  • 14,202
  • 6
  • 23
  • 49