I am looking for a data set of plain text emails that I can use to classify emails as spam or not spam.

I found this data set, but it does not contain the text of the emails. I also found the Enron email data set, but its emails contain very odd formatting that probably no one would type by hand when writing a mail, and it is not only HTML tags. HTML tags alone would be fine, I guess: I could probably just remove them with a regex and end up with cleaner data.
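For the HTML part, what I have in mind is roughly the sketch below. The function name `strip_html` is just something I made up, and I am assuming reasonably well-formed markup: a regex like this will not drop the contents of `<script>` or `<style>` blocks and can misfire on a stray `<` in the text, so a real HTML parser may be the safer choice.

```python
import html
import re

# Anything that looks like a tag: "<", then one or more non-">" characters, then ">".
TAG_RE = re.compile(r"<[^>]+>")
# Runs of whitespace left behind once the tags are gone.
WS_RE = re.compile(r"\s+")


def strip_html(text: str) -> str:
    """Remove HTML tags, decode entities, and collapse whitespace."""
    # Replace each tag with a space so adjacent words do not get glued together.
    no_tags = TAG_RE.sub(" ", text)
    # Turn entities such as &amp; back into plain characters.
    unescaped = html.unescape(no_tags)
    return WS_RE.sub(" ", unescaped).strip()


if __name__ == "__main__":
    print(strip_html("<p>Hello <b>world</b> &amp; friends</p>"))
```

This only deals with the tags, not with the other odd formatting I mentioned.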

Which email data set contains plain text without artificially added formatting and can be recommended for learning spam classification?

  • Did you see http://opendata.stackexchange.com/a/3804/1511 or http://opendata.stackexchange.com/q/4517/1511 ? – philshem Nov 20 '16 at 19:06

0 Answers