2

I have a Counter object produced by the following call:

Counter(df.email_address)

It returns each distinct email address together with the number of times it occurs:

Counter({nan: 1618, 'store@kiddicare.com': 265, 'testorders@worldstores.co.uk': 1})

What I want to do is treat it as a dictionary and create a pandas DataFrame from it with two columns: one for the email addresses and one for the associated counts.

I tried with:

dfr = repeaters.from_dict(repeaters, orient='index')

but I got the following error:

AttributeError: 'Counter' object has no attribute 'from_dict'

This makes me think that a Counter is not a dictionary, even though it looks like one. Any idea how to get it into a DataFrame?

Blue Moon
  • 3
    `from_dict` is a class method of DataFrames, not dictionaries/Counters. You could try: `dfr = pd.DataFrame.from_dict(repeaters, orient='index')` – Alex Riley Aug 04 '15 at 11:22
  • http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_dict.html – Blue Moon Aug 04 '15 at 11:22
  • @ajcr , I was just going to answer that. – omri_saadon Aug 04 '15 at 11:28
  • @omri_saadon: do feel free to provide an answer if you'd like; comments are generally less useful so I'm happy to delete mine if an answer appears. – Alex Riley Aug 04 '15 at 11:31
  • 2
    A counter is a subclass of dict and can be turned into a regular dict with dict(counter), see https://docs.python.org/3/library/collections.html#collections.Counter –  Aug 04 '15 at 11:54
  • Why don't you just use `df.email_address.value_counts()`? – EdChum Aug 04 '15 at 12:00

3 Answers

19
from collections import Counter

# Copy the counts into a plain dict, one key at a time
d = {}
cnt = Counter(df.email_address)
for key, value in cnt.items():
    d[key] = value

EDIT

Or, as @Trif Nefzger suggested:

d = dict(Counter(df.email_address))
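
To get from that dict to the two-column DataFrame the question asks for, one option is to pass its items to the DataFrame constructor. This is only a minimal sketch: the `emails` list is a hypothetical stand-in for `df.email_address`, and the column names `email_address` and `count` are illustrative choices.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample standing in for df.email_address
emails = [
    'store@kiddicare.com',
    'store@kiddicare.com',
    'testorders@worldstores.co.uk',
]

d = dict(Counter(emails))

# Each (key, value) pair becomes one row; the column names are illustrative
dfr = pd.DataFrame(list(d.items()), columns=['email_address', 'count'])
print(dfr)
```

Since Counter is a subclass of dict, `Counter(emails).items()` could be passed in the same way without the intermediate `dict(...)` call.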
doru
2

As ajcr wrote in the comments, from_dict is a class method of DataFrame, so you can write the following to achieve your goal:

from collections import Counter
import pandas as pd

repeaters = Counter({"nan": 1618, 'store@kiddicare.com': 265, 'testorders@worldstores.co.uk': 1})

# orient='index' turns each key into a row label and each count into a row value
dfr = pd.DataFrame.from_dict(repeaters, orient='index')
print(dfr)

Output:

testorders@worldstores.co.uk     1
nan                           1618
store@kiddicare.com            265
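
With orient='index' the email addresses end up in the index rather than in a column of their own. If two regular columns are wanted, a sketch along these lines may help; the column names `email_address` and `count` are assumptions made for illustration, not part of the original answer.

```python
from collections import Counter

import pandas as pd

repeaters = Counter({'store@kiddicare.com': 265, 'testorders@worldstores.co.uk': 1})

# Build the one-column frame, then move the index into an ordinary column
dfr = pd.DataFrame.from_dict(repeaters, orient='index').reset_index()

# Illustrative column names; the defaults would be 'index' and 0
dfr.columns = ['email_address', 'count']
print(dfr)
```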
omri_saadon
0

Alternatively, you could use pd.Series.value_counts, which returns a Series object.

df.email_address.value_counts(dropna=False)

Sample output:

b@y.com    2
a@x.com    1
NaN        1
dtype: int64

This is not exactly what you asked for, but it looks like what you'd like to achieve.
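
If a two-column DataFrame is needed rather than a Series, the result of value_counts can be converted. The following is a rough sketch on a hypothetical four-row frame matching the sample output above; the column name `count` is an illustrative choice.

```python
import numpy as np
import pandas as pd

# Hypothetical data matching the sample output above
df = pd.DataFrame({'email_address': ['b@y.com', 'a@x.com', 'b@y.com', np.nan]})

counts = df.email_address.value_counts(dropna=False)

# Name the index, then turn it into a column next to the counts
counts_df = counts.rename_axis('email_address').reset_index(name='count')
print(counts_df)
```

Naming both the index and the values explicitly avoids relying on the default names, which differ between pandas versions.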

ldirer