I'm working on a project and need resources to get me up to speed.
The dataset has around 35,000 observations on roughly 30 variables. About half of the variables are categorical, and some have many possible values, so splitting them into dummy variables would give many more than 30 columns, though probably still no more than a couple of hundred (so n > p).
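To make the "couple of hundred" estimate concrete, here is a rough sketch of the column count after dummy coding. It is in Python purely for illustration; the level counts below are made up, and `design_columns` is a hypothetical helper, not part of any package.

```python
# Hypothetical level counts for ~15 categorical predictors.
# These numbers are illustrative only and are NOT from the real dataset.
levels_per_categorical = [2, 3, 3, 4, 5, 5, 6, 8, 10, 12, 15, 20, 25, 30, 40]
n_continuous = 15  # assumed: roughly half of the ~30 predictors


def design_columns(level_counts, n_continuous, drop_reference=True):
    """Count design-matrix columns after dummy coding (intercept excluded).

    With drop_reference=True each categorical variable with k levels
    contributes k - 1 columns (treatment coding); otherwise it contributes k.
    """
    per_variable = [(k - 1 if drop_reference else k) for k in level_counts]
    return n_continuous + sum(per_variable)


p = design_columns(levels_per_categorical, n_continuous)
print(p)
```

With these made-up counts the total stays under 200, so n is still far larger than p even after expansion.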
The response we want to predict is ordinal with 5 levels (1, 2, 3, 4, 5). Predictors are a mix of continuous and categorical, about half of each. These are my thoughts/plans so far:
1. Treat the response as continuous and run vanilla linear regression.
2. Run nominal and ordinal logistic and probit regression.
3. Use MARS and/or another flavor of non-linear regression.
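For concreteness, the ordinal logit in option 2 is, as far as I understand it, usually written in cumulative (proportional odds) form. A sketch in my own notation (the cutpoints $\theta_j$ and the coefficient vector $\beta$ are labels I am assuming, not taken from any package):

```latex
\operatorname{logit} P(Y \le j \mid x) = \theta_j - x^\top \beta,
\qquad j = 1, \dots, 4,
\qquad \theta_1 < \theta_2 < \theta_3 < \theta_4 .
```

The ordinal probit version replaces the logit link with $\Phi^{-1}$, the inverse of the standard normal CDF. Either way a single $\beta$ is shared across the four cutpoints, whereas the nominal (multinomial) model fits a separate coefficient vector for each non-reference level.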
I'm familiar with linear regression. MARS is well enough described by Hastie and Tibshirani. But I'm at a loss when it comes to ordinal logit/probit, especially with so many variables and a big data set.
The R package glmnetcr seems to be my best bet so far, but its documentation is not enough to get me where I need to be.
Where can I go to learn more?