I am relatively new to R, and I am trying to fit a model to data with two predictors: a categorical column and a numeric (integer) column. The dependent variable is continuous.
The data has the following format:
predCateg, predIntNum, ResponseVar
The data looks something like this:
ranking, age_in_years, wealth_indicator
category_A, 99, 1234.56
category_A, 21, 12.34
category_A, 42, 234.56
....
category_N, 105, 77.27
How would I model this (presumably, using a GLM), in R?
[[Edit]]
It has just occurred to me (after analysing the data more thoroughly) that the categorical independent variable is in fact ordered. I have therefore modified the model from the answer provided earlier as follows:
> fit2 <- glm(wealth_indicator ~ ordered(ranking) + age_in_years, data=amort2)
>
> fit2
Call: glm(formula = wealth_indicator ~ ordered(ranking) + age_in_years,
data = amort2)
Coefficients:
       (Intercept)  ordered(ranking).L  ordered(ranking).Q  ordered(ranking).C        age_in_years
         0.0578500          -0.0055454          -0.0013000           0.0007603           0.0036818
Degrees of Freedom: 39 Total (i.e. Null); 35 Residual
Null Deviance: 0.004924
Residual Deviance: 0.00012 AIC: -383.2
>
> fit3 <- glm(wealth_indicator ~ ordered(ranking) + age_in_years + ordered(ranking)*age_in_years, data=amort2)
> fit3
Call: glm(formula = wealth_indicator ~ ordered(ranking) + age_in_years +
ordered(ranking) * age_in_years, data = amort2)
Coefficients:
                    (Intercept)               ordered(ranking).L               ordered(ranking).Q
                      0.0578500                       -0.0018932                       -0.0039667
             ordered(ranking).C                     age_in_years  ordered(ranking).L:age_in_years
                      0.0021019                        0.0036818                       -0.0006640
ordered(ranking).Q:age_in_years  ordered(ranking).C:age_in_years
                      0.0004848                       -0.0002439
Degrees of Freedom: 39 Total (i.e. Null); 32 Residual
Null Deviance: 0.004924
Residual Deviance: 5.931e-05 AIC: -405.4
I am a bit confused by what ordered(ranking).C, ordered(ranking).Q and ordered(ranking).L mean in the output, and would appreciate some help in understanding this output, and how to use it to predict the response variable.
[...] factor(ranking) and not as.factor(ranking)? – Peter Flom Mar 03 '14 at 18:26
[...] factor(x) so that I can include the levels argument if I wish. You could also use as.factor(x) if you wish, and it may in fact be faster, but I would think you'd need quite a large dataset for the speed of these functions to matter. – P Schnell Mar 03 '14 at 18:35
[...] ordered(ranking).C, ordered(ranking).Q and ordered(ranking).L - what do they mean, and how do I use that to predict the response variable?) - any help will be much appreciated. Thanks – Homunculus Reticulli Mar 04 '14 at 05:36
.L, .Q, and .C are, respectively, the coefficients for the ordered factor coded with linear, quadratic, and cubic contrasts. The command contr.poly(4) will show you the contrast matrix for an ordered factor with 4 levels (3 degrees of freedom, which is why you get up to a third order polynomial). contr.poly(4)[2, '.L'] will tell you what to plug in for the second ordered level in the linear term. Be aware that this assumes that it makes sense to consider the levels as equally spaced. If it doesn't, code your own contrast matrix. – P Schnell Mar 04 '14 at 13:26
[...] ordered(ranking) comparing factor(ranking)? – ah bon Mar 21 '20 at 05:11
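To make the point about contr.poly(4) concrete, here is a minimal sketch of how the .L, .Q and .C coefficients turn into a prediction. It assumes a four-level ordered factor. The data frame toy, the level names category_A to category_D and the simulated numbers are all made up for illustration and are not the asker's amort2 data. Because ranking is stored as an ordered factor here, the coefficients are named ranking.L and so on, rather than ordered(ranking).L as in the output above.

```r
# Hypothetical stand-in for amort2: 4 ordered levels, 10 rows each.
set.seed(1)
levels_ranked <- c("category_A", "category_B", "category_C", "category_D")
toy <- data.frame(
  ranking      = ordered(rep(levels_ranked, each = 10), levels = levels_ranked),
  age_in_years = sample(20:100, 40, replace = TRUE)
)
# Simulated response (made-up coefficients, small Gaussian noise).
toy$wealth_indicator <- 0.05 + 0.002 * as.integer(toy$ranking) +
  0.0001 * toy$age_in_years + rnorm(40, sd = 0.001)

# Contrast matrix R uses by default for a 4-level ordered factor:
# one row per level, columns .L (linear), .Q (quadratic), .C (cubic).
cp <- contr.poly(4)
print(cp)

# Gaussian GLM, analogous to fit2 in the question.
fit <- glm(wealth_indicator ~ ranking + age_in_years, data = toy)

# Easiest route: let predict() apply the contrasts for you.
new_obs <- data.frame(
  ranking      = ordered("category_B", levels = levels_ranked),
  age_in_years = 42
)
pred_auto <- predict(fit, newdata = new_obs)

# Same prediction by hand: category_B is the 2nd level, so plug in row 2
# of the contrast matrix for each polynomial term.
b <- coef(fit)
pred_manual <- b["(Intercept)"] +
  b["ranking.L"] * cp[2, ".L"] +
  b["ranking.Q"] * cp[2, ".Q"] +
  b["ranking.C"] * cp[2, ".C"] +
  b["age_in_years"] * 42

# The two routes should agree up to floating-point error.
print(all.equal(unname(pred_auto), unname(pred_manual)))
```

In practice predict() with a newdata data frame is usually all you need. The manual calculation is only there to show what the .L, .Q and .C terms stand for. As the comment notes, these polynomial contrasts treat the levels as equally spaced.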