Dear statistics experts,
I am a new student in the machine learning field.
I just started a job classifying a set of scientific abstracts into five classes.
The number of texts per class is as follows:
Class1: 200
Class2: 950
Class3: 150
Class4: 100
Class5: 350
I am planning to build a multi-class classifier. However, I am worried about the imbalance in the number of texts per class. For example, if I use 100 documents from each class for training, Class 2 is left with disproportionately many test documents compared with the other classes. I would appreciate some advice on how to construct my training and test sets, along with the reasoning behind it.
Sincerely yours,
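
P.S. To make the concern concrete, here is a minimal Python sketch of the test-set sizes that result from the class counts above. The function names (`fixed_per_class_split`, `proportional_split`) and the 20% hold-out value are hypothetical choices made only for illustration. The proportional split is included purely as a point of comparison and is an assumption, not something I have settled on.

```python
# Class counts as listed in the question.
class_counts = {
    "Class1": 200,
    "Class2": 950,
    "Class3": 150,
    "Class4": 100,
    "Class5": 350,
}


def fixed_per_class_split(counts, n_train):
    """Take up to n_train documents per class for training; the rest are test."""
    split = {}
    for name, total in counts.items():
        train = min(n_train, total)
        split[name] = (train, total - train)
    return split


def proportional_split(counts, test_percent):
    """Hold out the same integer percentage of every class as test data."""
    split = {}
    for name, total in counts.items():
        test = total * test_percent // 100
        split[name] = (total - test, test)
    return split


if __name__ == "__main__":
    fixed = fixed_per_class_split(class_counts, 100)
    proportional = proportional_split(class_counts, 20)  # 20% is an assumed value
    print("class    fixed (train, test)   proportional (train, test)")
    for name in class_counts:
        print(f"{name:8} {str(fixed[name]):21} {proportional[name]}")
```

With 100 training documents per class, Class2 keeps 850 test documents while Class4 has none left for testing, which is the imbalance I am worried about.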