I'm very confused about the difference between information gain and mutual information. To make it even more confusing, I can find some sources defining them as identical and others explaining their differences:
Sources stating information gain and mutual information are the same:
- Feature Selection: Information Gain VS Mutual Information
- An introduction to information retrieval: "Show that mutual information and information gain are equivalent", page 285, exercise 13.13.
- "It is thus known as the information gain, or more commonly the mutual information between X and Y" --> CS769 Spring 2010 Advanced Natural Language Processing, "Information Theory", lecturer: Xiaojin Zhu
- "Information gain is also called expected mutual information" --> "Feature Selection Methods for Text Classification", Nicolette Nicolosi, http://www.cs.rit.edu/~nan2563/feature_selection.pdf
Sources stating they're different:
- https://math.stackexchange.com/questions/833713/equality-of-information-gain-and-mutual-information
- Yang --> "A Comparative Study on Feature Selection in Text Categorization" --> they are treated separately, and mutual information is even discarded because it performs very poorly compared to IG
- citing Yang --> "An Extensive Empirical Study of Feature Selection Metrics for Text Classification" --> http://www.jmlr.org/papers/volume3/forman03a/forman03a_full.pdf
I could still find other sources defending either thesis (and some that seem confused themselves), but I think these are enough. Can anyone enlighten me about the real difference between, or equality of, these two measures?
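For what it's worth, here is a quick numerical check I tried, comparing the decision-tree definition of information gain, H(Y) - H(Y|X), against mutual information computed directly from the joint distribution. The toy dataset is my own invention, so this only sketches the claim of the first group of sources; it doesn't settle how the terms are used in the feature-selection papers:

```python
import math
from collections import Counter

# Toy dataset: (feature value x, class label y) pairs (made up for this check).
pairs = [("a", 0), ("a", 0), ("a", 1), ("a", 0),
         ("b", 1), ("b", 1), ("b", 0), ("b", 1)]

def entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

ys = [y for _, y in pairs]

# Information gain: H(Y) - H(Y | X), the usual decision-tree definition.
h_y = entropy(ys)
h_y_given_x = sum(
    len(subset) / len(pairs) * entropy(subset)
    for x in {x for x, _ in pairs}
    for subset in [[y for xv, y in pairs if xv == x]]
)
ig = h_y - h_y_given_x

# Mutual information: sum over (x, y) of p(x,y) * log2( p(x,y) / (p(x) p(y)) ).
n = len(pairs)
p_xy = {k: v / n for k, v in Counter(pairs).items()}
p_x = {k: v / n for k, v in Counter(x for x, _ in pairs).items()}
p_y = {k: v / n for k, v in Counter(ys).items()}
mi = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())

print(ig, mi)  # the two values coincide (~0.1887 bits on this dataset)
```

On this (and every dataset I tried) the two numbers agree to floating-point precision, which is consistent with the identity I(X;Y) = H(Y) - H(Y|X). So my confusion is really about why the feature-selection literature treats them as different quantities.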