I want to determine the impact of competition on quality using linear regression. Competition is represented by the Herfindahl-Hirschman Index (HHI), a measure of market concentration. It takes values between 0 and 1, where 1 represents a monopoly. Quality is measured by a percentage value.

My question is: how do I take into account that the HHI is bounded (0-1)? Are there any restrictions or modifications that have to be made?

Sam
  • That sounds like a job for beta regression: https://rcompanion.org/handbook/J_02.html – Christoph Hanck Jun 23 '22 at 10:20
  • Thanks for the fast response. On the website they talk about dependent variables. I have found a lot of information regarding proportion dependent variables, but what about the independents? – Sam Jun 23 '22 at 10:27
  • It sounds like your dependent variable (quality) is limited as well because it is a percentage. I am assuming Quality won't take a percentage value outside limits of 0 and 100 (like 110% or -15% quality). If this is the case, you may want to look into Censored Regression Models like the Tobit Model. – Spur Economics Jun 23 '22 at 10:29
  • That's absolutely correct. The dependent variable is limited as well, but I thought I could interpret it as a log-level model. Does that make sense? An alternative would be to scale the HHI (HHI x 100) to get a percentage as well. Is this correct? – Sam Jun 23 '22 at 10:38
  • OK, I missed the part about it being the independent variable. For independent variables, I see less of an issue, much like dummies can be incorporated without further ado as explanatory variables. – Christoph Hanck Jun 23 '22 at 12:05
  • Have a look at https://stats.stackexchange.com/questions/216122/what-is-the-difference-between-logistic-regression-and-fractional-response-regre – kjetil b halvorsen Jun 23 '22 at 12:39

1 Answer

A common misconception about regression is that it makes (normal) distribution assumptions about the features. It does not, so a feature bounded between 0 and 1, such as the HHI, needs no special treatment on that account.

I might be concerned about a nonlinear relationship with the outcome or interactions with other features, but these are always possibilities. Methods to remedy these issues range from domain knowledge to flexible models with splines to neural networks that figure out the nonlinear relationships and interactions.
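As a rough illustration of both points, here is a minimal sketch in Python with NumPy. The data are simulated and purely hypothetical (the coefficients 80 and -20, the noise level, and the Beta-distributed HHI are assumptions, not taken from the question). It fits quality on a bounded, skewed HHI by ordinary least squares, then adds a squared term as one simple way to probe for a nonlinear relationship.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustrative assumption, not from the question):
# HHI is bounded in (0, 1) and clearly non-normal; quality is a percentage.
n = 500
hhi = rng.beta(2.0, 5.0, size=n)  # skewed, bounded regressor
quality = 80.0 - 20.0 * hhi + rng.normal(0.0, 3.0, size=n)


def ols(X, y):
    """Least-squares coefficients for y ~ X (X must contain an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Linear specification: quality = b0 + b1 * HHI + error
X_lin = np.column_stack([np.ones(n), hhi])
b_lin = ols(X_lin, quality)

# Simple nonlinearity check: add a squared term and compare the fits.
X_quad = np.column_stack([np.ones(n), hhi, hhi**2])
b_quad = ols(X_quad, quality)

print("linear fit:   ", b_lin)
print("quadratic fit:", b_quad)
```

In this simulation the linear fit should land close to the assumed intercept and slope even though the regressor is bounded and skewed. If the squared term turned out to matter in real data, that would be a reason to consider the more flexible options mentioned above, such as splines.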

Dave