
I am currently browsing the Enron data set to find a way to extract true/false labels indicating whether an email has been replied to or forwarded. The task would be to predict, per user, whether an email is going to be replied to or forwarded.

This information does not appear to be available in the data set, so what I want to do is try to post-label the respective emails.

However, as I manually browse some of the folders, I am not sure whether this data set is actually suitable for what I am trying to do. I do not know whether it contains conversations (or parts of conversations) that I can reconstruct in order to determine which messages were replied to and/or forwarded.
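
A minimal sketch of one possible post-labelling heuristic, assuming threading headers such as `In-Reply-To` cannot be relied on and that replies and forwards keep the original subject behind a `Re:`/`Fw:`/`Fwd:` prefix. The function names `classify_subject` and `label_messages` are hypothetical, and matching on subject alone will produce false positives for generic subjects, so this only illustrates the idea:

```python
import re
from email import message_from_string
from email.utils import parsedate_to_datetime

# One or more leading reply/forward prefixes, e.g. "Re:", "FW:", "Fwd: RE:".
PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def classify_subject(subject):
    """Return (kind, base): kind is 'reply', 'forward', or 'original';
    base is the lowercased subject with the prefixes stripped."""
    subject = subject or ""
    match = PREFIX.match(subject)
    if not match:
        return "original", subject.strip().lower()
    # The outermost (first) prefix decides the kind.
    first = match.group(0).strip().lower()
    kind = "reply" if first.startswith("re") else "forward"
    return kind, subject[match.end():].strip().lower()


def label_messages(raw_messages):
    """Map each Message-ID to {'replied': bool, 'forwarded': bool}.

    Assumption: a message counts as replied to / forwarded if a LATER
    message with the same base subject carries a reply / forward prefix.
    """
    parsed = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        kind, base = classify_subject(msg.get("Subject", ""))
        parsed.append({
            "id": msg["Message-ID"],
            "kind": kind,
            "base": base,
            "sent": parsedate_to_datetime(msg["Date"]),
        })

    labels = {}
    for m in parsed:
        # Skip empty subjects so unrelated subject-less mails do not match.
        followups = [
            o for o in parsed
            if m["base"] and o["base"] == m["base"] and o["sent"] > m["sent"]
        ]
        labels[m["id"]] = {
            "replied": any(o["kind"] == "reply" for o in followups),
            "forwarded": any(o["kind"] == "forward" for o in followups),
        }
    return labels
```

Making the labels per user would additionally require restricting the follow-ups to messages sent by the mailbox owner, which this sketch does not attempt.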

Does anybody have experience with this data set, or is there another one that somebody could recommend for the task I described?

Stefan Falk
  • Although the Enron corpus is the most famous, here are some other open email datasets: https://opendata.stackexchange.com/q/4517/1511 – philshem Feb 08 '18 at 20:35
