Let's say I have 3 features for a regression model: if_smoking, if_drinking, and body_height. The first 2 are binary, while the 3rd is continuous. I have coefficients like:

bias/y_intercept: 1.2

coefficient for if_smoking: 0.8

coefficient for if_drinking: 0.5

coefficient for body_height: 0.2

The model, therefore, should be: y_predict = 1.2 + 0.8*if_smoking + 0.5*if_drinking + 0.2*body_height

I can say that if_smoking is more important than if_drinking since the former's coefficient is 0.8 over the latter's 0.5, and both are binary (0 or 1).

However, is body_height more or less important than if_smoking and if_drinking? Its coefficient is 0.2, smaller than if_smoking's 0.8 and if_drinking's 0.5. But body_height is continuous. For a person who is 5 feet tall, 0.2*5 = 1.0, which is greater than if_smoking's 0.8*1 and if_drinking's 0.5*1. In other words, body_height's coefficient (0.2) is smaller, but its overall contribution to the prediction (0.2*5 = 1.0) is larger.
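To make the arithmetic concrete, here is a minimal Python sketch of the per-term contributions. The coefficients are the ones listed above; the example person (a 5-foot-tall smoker who drinks) and the variable names are assumptions chosen only for illustration.

```python
# Coefficients from the question; the example person below is hypothetical.
intercept = 1.2
coefficients = {"if_smoking": 0.8, "if_drinking": 0.5, "body_height": 0.2}

# Assumed example: smokes, drinks, and is 5 feet tall.
person = {"if_smoking": 1, "if_drinking": 1, "body_height": 5.0}

# Contribution of each term to the prediction: coefficient * feature value.
contributions = {name: coefficients[name] * person[name] for name in coefficients}
y_predict = intercept + sum(contributions.values())

for name, value in contributions.items():
    print(f"{name}: {value:.2f}")
print(f"y_predict: {y_predict:.2f}")
```

Note that the body_height term depends on the unit of measurement and on where zero lies on the height scale, so this comparison is only one possible reading of "contribution".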

So, do I say body_height is less or more important than if_smoking and if_drinking?

  • Please explain what you mean by "important." There are several natural interpretations but it is not evident which you might have in mind. – whuber Aug 06 '22 at 19:54
  • @whuber Thanks for asking. I basically referred to the feature importance - e.g., it sounds like, as a feature, if_smoking is more important than if_drinking, but not sure how to interpret the importance when there's a mix of binary and continuous features. – Fred Chang Aug 06 '22 at 22:43
  • Your comment provides no further clarification about what you mean by "importance". – dipetkov Aug 07 '22 at 11:24
  • @dipetkov Does this make sense: relative magnitude of the effects of different independent variables – Fred Chang Aug 07 '22 at 17:38
  • One aspect that can make quite a bit of difference: Do you want to estimate "importance" for each predictor conditional or unconditional on the other predictors? – dipetkov Aug 07 '22 at 17:42
  • @dipetkov I would say unconditional, i.e., all predictors are independent. – Fred Chang Aug 21 '22 at 19:38

1 Answer

I presume that by "importance" you have in mind something similar to the effect size of the various independent variables (IVs). That is, you wonder how large the change in the dependent variable (DV) y_predict is when one of the IVs changes by "a unit".

Then, the fitted coefficient of an IV tells you how much y_predict changes by changing the IV by one unit, and keeping all the other IVs fixed. E.g., if you change the IV if_smoking from zero to one and keep all the other IVs fixed, the DV y_predict will on average increase by 0.8.

Similarly, the coefficient 0.2 of body_height means that an increase in body height by one unit, i.e. one foot, would lead to y_predict increasing by 0.2 on average. But one foot is a very large change for body height. A more reasonable change of, e.g., 0.1 feet would result in an average change in y_predict of only 0.02. Thus, I would argue that body height is, in comparison to the other IVs, not very "important", i.e. it doesn't have as large an effect on y_predict.
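A small sketch of this comparison in Python, assuming the fitted coefficients from the question. The helper name `change_in_prediction` is hypothetical, and the choice of 0.1 feet as a "reasonable" change is a judgment call rather than something the model provides.

```python
def change_in_prediction(coefficient, change_in_iv):
    """Change in y_predict when one IV changes by change_in_iv, all other IVs held fixed."""
    return coefficient * change_in_iv

# Switching if_smoking from 0 to 1.
smoking_effect = change_in_prediction(0.8, 1)

# Changing body_height by a full foot versus a tenth of a foot.
height_effect_one_foot = change_in_prediction(0.2, 1.0)
height_effect_tenth_foot = change_in_prediction(0.2, 0.1)

print(f"if_smoking 0 -> 1:     {smoking_effect:.2f}")
print(f"body_height +1 foot:   {height_effect_one_foot:.2f}")
print(f"body_height +0.1 foot: {height_effect_tenth_foot:.2f}")
```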

Summary: Always mind your units. The coefficients can only be interpreted with the units in mind. If you had, e.g., measured height in millimeters instead, the coefficient would have shrunk to about 0.00066 (0.2 / 304.8, since one foot is 304.8 mm), for exactly the same data and thus the same "importance" (effect).
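The rescaling can be checked with a few lines of Python. This is only a sketch: it assumes the height coefficient of 0.2 per foot from the question and uses the exact conversion 1 foot = 304.8 mm; the 5-foot example height is arbitrary.

```python
MM_PER_FOOT = 304.8  # exact conversion factor

coef_per_foot = 0.2
# Measuring height in a smaller unit divides the coefficient by the conversion factor.
coef_per_mm = coef_per_foot / MM_PER_FOOT

# The same (assumed) height expressed in both units.
height_feet = 5.0
height_mm = height_feet * MM_PER_FOOT

# The height term of the prediction is identical in either unit.
term_feet = coef_per_foot * height_feet
term_mm = coef_per_mm * height_mm

print(f"coefficient per foot: {coef_per_foot}")
print(f"coefficient per mm:   {coef_per_mm:.6f}")
print(f"height term (feet):   {term_feet:.4f}")
print(f"height term (mm):     {term_mm:.4f}")
```

The coefficient changes by a factor of 304.8 while the fitted term, and therefore the prediction, stays the same, which is why the raw coefficient alone says little about "importance".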

frank
  • The effect size, by definition, uses units determined by the spreads of the data rather than arbitrary anthropocentric units like feet. You seem to describe the estimated coefficients, which -- because they are potentially based on different units of measurement -- aren't even comparable. – whuber Aug 07 '22 at 13:00
  • @whuber I was referring to effect size in the sense of my posted link, which puts it very loosely. If there is a standard definition of effect size, could you share a link? – frank Aug 07 '22 at 13:47
  • Cohen, Jacob. 1988. Statistical Power Analysis for the Behavioral Sciences. Second. New York: Routledge. https://doi.org/10.4324/9780203771587. – whuber Aug 07 '22 at 15:41
  • @frank Thanks for the input. Yes, I agreed that the units were critical to interpret the coefficients. I did see something called "standardized coefficients", which used the standard deviation units. However, I have both continuous and binary features. Do I standardize the binary features together with the continuous features? Or do I have to compare continuous features and binary features separately? – Fred Chang Aug 07 '22 at 17:37
  • I don't know what you are referring to by "standardized coefficients". I just think that a straightforward approach would be to choose a reasonable change $c_b$ for body_height, compute the corresponding change $c_y = 0.2 \cdot c_b$ in y_predict, report the pair $(c_b, c_y)$ together with its units, and then leave it to the audience to decide whether this is more or less "important" than the change in y_predict when, e.g., changing the variable if_smoking. – frank Aug 07 '22 at 19:11