
I have a classification problem with the following example independent features:

recommendations  comment_count  comment
0.663            0.382          'yes', 'trump'

The dependent variable is whether the comment is likely to receive a reply or not:

get_reply
0

I want to apply regularisation to the logistic regression model, but I can't decide between L1 and L2.

I want to do this for three different datasets of online comments: one for sports articles, one for magazine articles, and one for (national) politics.

I then want to interpret, for example, the 10 largest coefficients from each of these models. The following diagrams show this.

The first diagram is with the L1 penalty (test F1-score of 0.85); the second diagram is with the L2 penalty (test F1-score of 0.60). [coefficient plots not shown]

I am struggling to decide between the two models, and which would create a more interesting discussion. I find the L2 diagram easier to interpret: for example, a magazine comment with a high number of recommendations is likely to receive a reply. So I'm favoring L2, but the L1 diagram offers more interesting text words that appeared in the comments.

I aim to identify features that vary across the different news groupings (sports, politics, and magazine), and to point out similarities or differences that could be of importance.
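For reference, a minimal sketch of how the two penalties could be fitted and the largest coefficients extracted with scikit-learn. The data here is synthetic and the feature names, regularisation strength `C`, and top-k of 10 are placeholders, not the actual datasets or tuned values:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in: in the real problem the columns would be
# recommendations, comment_count, and bag-of-words features from the comment text.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=2, size=500) > 0).astype(int)
feature_names = [f"feature_{i}" for i in range(20)]

for penalty in ("l1", "l2"):
    # liblinear supports both L1 and L2 penalties for binary logistic regression
    model = LogisticRegression(penalty=penalty, solver="liblinear", C=1.0)
    model.fit(X, y)
    coefs = model.coef_.ravel()
    # Rank features by absolute coefficient value and keep the top 10
    top = np.argsort(np.abs(coefs))[::-1][:10]
    print(penalty, [(feature_names[i], round(coefs[i], 3)) for i in top])
```

With the L1 penalty some of these coefficients are typically shrunk exactly to zero (so engineered features can drop out entirely), whereas L2 only shrinks them towards zero, which matches the difference visible between the two diagrams.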

Holly
    Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Mar 29 '22 at 11:58
  • What is your purpose? If you only want to classify how does test set performance compare? Why interpretability is important? – Tim Mar 29 '22 at 12:30
  • @Tim interpretability is important cause I want to point how the coefficients range across the news topics – Holly Mar 29 '22 at 12:40
  • @Tim the f1-score I added was for the test data. I updated the question. As L1 handles outliers, I think it removes important features, like the number of recommendations, so I am thinking L2 is better – Holly Mar 29 '22 at 12:41
  • How do you know that this feature is truly important? If you care about interpretability, why use regularization at all? – Tim Mar 29 '22 at 12:52
  • @Tim should I not try to use regularization to reduce overfitting/underfitting and so on? I was considering the features with the highest coefficient value, negative or positive, to be important, as they have more influence in predicting the dependent variable – Holly Mar 29 '22 at 12:56
  • Do you have a reason to believe that the model without regularization overfitted? With regularization, you get biased estimates for the parameters so it makes interpreting them harder. How much data do you have? – Tim Mar 29 '22 at 13:01
  • I was finding it hard to tell if it was overfitting, so I just assumed based on the amount of data. For one news grouping, e.g. magazine, there are 21,175 entries. So 21,175 comments, but a comment could have around 10 to 50 words, if not more. Would I be better off disregarding regularization? – Holly Mar 29 '22 at 13:05
  • I believe L1 is useful when you are interested in the most important features, which is true for my case. But I also believe it removes outliers, which can be seen in the plot, as it removes some of the features I engineered, e.g. comment count and recommendations. I would be tempted to go with L1 so it removes all the less important features, but I don't want it to disregard the features I engineered, which are influential based on the L2 plot – Holly Mar 29 '22 at 13:14
  • Many similar Qs here, for instance https://stats.stackexchange.com/questions/184019/when-will-l1-regularization-work-better-than-l2-and-vice-versa/184023#184023 – kjetil b halvorsen Mar 30 '22 at 01:55

0 Answers