I'm using statsmodels to fit an Ordinary Least Squares regression of a continuous dependent variable on two categorical variables.
My data is structured like this:
Group     Model     Rate
------------------------
Group A   Model 1   1.3
Group B   Model 7   0.43
Group B   Model 1   0.77
Group G   Model 2   3.2
I'm trying to relate the group and model variables to the rate outcome.
I've used statsmodels.formula.api.ols to create the model, but after fitting it, the result doesn't seem to contain all values of my categorical variables.
This is how I created and fit the model:
import statsmodels.formula.api
model = statsmodels.formula.api.ols('rates ~ C(models) + C(groups)', data=df)
fitted_model = model.fit()
The result looks good and makes sense, except for the missing levels. I inspected the result by looking at fitted_model.params. It lists every level of my "Group" variable but one, and likewise every level of "Model" but one. It also gives an "Intercept".
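For reference, here is a minimal sketch that reproduces what I'm seeing, using only the four example rows above. The column names groups, models and rates are an assumption chosen to match the formula. With so few rows the fit itself is not meaningful, so the sketch only looks at the parameter names.

```python
import pandas as pd
import statsmodels.formula.api as smf

# The four example rows from the table; the column names are assumed
# to match the names used in the formula.
df = pd.DataFrame({
    "groups": ["Group A", "Group B", "Group B", "Group G"],
    "models": ["Model 1", "Model 7", "Model 1", "Model 2"],
    "rates": [1.3, 0.43, 0.77, 3.2],
})

fitted_model = smf.ols("rates ~ C(models) + C(groups)", data=df).fit()

# Names of the estimated parameters: an intercept plus one entry per
# level of each categorical variable, except one level of each.
param_names = list(fitted_model.params.index)
print(param_names)
```

In this sketch the entries for "Group A" and "Model 1" are the ones that do not appear in the list.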
I'm guessing my issue is statistical rather than a coding problem. Is there a reason one level of each categorical variable would be omitted? If I'm interested in the effect of those missing levels on my outcome (coefficients, p-values), how can I find that out?