I want to use linear regression to predict a continuous integer variable from some continuous integer predictors and some categorical predictors. The categorical column is unordered. Consider this toy data:
a <- sample(1000:10000000, 60, replace = TRUE)
b <- sample(1:90, 60, replace = TRUE)
c <- c("a#", "b$", "c*", "d@")
c <- rep(c, 15)
d <- sample(1000000:10000000, 60, replace = TRUE)
# data.frame() makes one column per argument (as.data.frame(a, b, c, d) does not)
data1 <- data.frame(a, b, c, d)
head(data1)
# I want columns a, b and d to be numeric:
data1$a <- as.numeric(data1$a)
data1$b <- as.numeric(data1$b)
data1$d <- as.numeric(data1$d)
# since predictor c is categorical, I use factor()
myformula <- d ~ a + b + factor(c)
data1.glm <- glm(myformula , family = gaussian("identity") , data=data1)
summary(data1.glm)
Call:
glm(formula = myformula, family = gaussian("identity"), data = data1)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-30.150  -11.637   -0.127   13.564   35.662
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 24.17925    7.91775   3.054   0.0035 **
a            0.16155    0.13370   1.208   0.2322
b            0.09038    0.20147   0.449   0.6555
factor(c)b$ -8.09679    6.78098  -1.194   0.2377
factor(c)c*  3.16806    6.36775   0.498   0.6208
factor(c)d@  2.12224    6.75855   0.314   0.7547
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 302.7604)
Null deviance: 17995 on 59 degrees of freedom
Residual deviance: 16349 on 54 degrees of freedom
AIC: 520.73
Number of Fisher Scoring iterations: 2
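A quick sanity check on the setup, as a sketch on regenerated toy data (set.seed() is added only for reproducibility and is not part of the original question). Two points it illustrates: as.data.frame() converts a single object and matches any extra positional arguments to its own parameters (row.names, optional, ...), so data.frame() is the call that builds one column per vector; and glm() with family = gaussian("identity") fits the same ordinary least-squares model as lm(), with the glm "dispersion parameter" being the residual variance.

```r
set.seed(1)  # added for reproducibility; not part of the original question
a <- sample(1000:10000000, 60, replace = TRUE)
b <- sample(1:90, 60, replace = TRUE)
c <- rep(c("a#", "b$", "c*", "d@"), 15)
d <- sample(1000000:10000000, 60, replace = TRUE)

# data.frame() makes one column per argument and keeps each column's type;
# as.data.frame(a, b, c, d) would match b, c, d to its own parameters
# (row.names, optional, ...) instead of adding them as columns
data1 <- data.frame(a, b, c, d)
sapply(data1, is.numeric)  # a, b, d are integer vectors, which count as numeric

myformula <- d ~ a + b + factor(c)

# Gaussian family + identity link is ordinary least squares,
# so glm() and lm() should give the same estimates
fit_glm <- glm(myformula, family = gaussian("identity"), data = data1)
fit_lm  <- lm(myformula, data = data1)

all.equal(coef(fit_glm), coef(fit_lm))
# glm's dispersion parameter equals lm's residual variance (sigma^2)
all.equal(summary(fit_glm)$dispersion, summary(fit_lm)$sigma^2)
```

summary(fit_lm) additionally reports R-squared and the overall F-test, which the glm summary does not print.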
I have some questions, setting the p-values aside (this is toy data, not my real data):
1) Have I done the regression correctly?
2) I can't find the coefficient for the a# level of column c, even with the code below:
View(as.data.frame(data1.glm$coefficients))
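For reference, a sketch of why no a# row appears, assuming R's default treatment contrasts (contr.treatment). factor() sorts the levels, so a# is the first level and becomes the baseline: it gets no dummy column, and its effect is folded into the intercept. Choosing a different baseline with relevel() makes an a# estimate appear; the model fit itself is unchanged, only its parameterisation. The names fit and fit2 are illustrative.

```r
set.seed(1)  # added for reproducibility; not part of the original question
a <- sample(1000:10000000, 60, replace = TRUE)
b <- sample(1:90, 60, replace = TRUE)
c <- rep(c("a#", "b$", "c*", "d@"), 15)
d <- sample(1000000:10000000, 60, replace = TRUE)
data1 <- data.frame(a, b, c, d)

fit <- lm(d ~ a + b + factor(c), data = data1)

# "a#" sorts first, so it is the reference (baseline) level
levels(factor(data1$c))

# Treatment coding: one 0/1 column per non-reference level;
# the "a#" row is all zeros, so its effect lives in the intercept
contrasts(factor(data1$c))

# To see an explicit "a#" estimate, use another reference level.
# relevel() needs a factor, so convert the column first.
data1$c <- relevel(factor(data1$c), ref = "b$")
fit2 <- lm(d ~ a + b + c, data = data1)
coef(fit2)  # contains a "ca#" term and no "cb$" term

# Same model, different parameterisation: identical fitted values, and the
# "a#" estimate is the negative of the old "factor(c)b$" estimate
all.equal(fitted(fit), fitted(fit2))
all.equal(unname(coef(fit2)["ca#"]), unname(-coef(fit)["factor(c)b$"]))
```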
3) What exactly do the coefficients for the categorical predictor mean? For example, if I want the response (column d) when (a, b, c) are (20, 2, b$) respectively, is it correct to use the code below?
x <- 20*.16154566 + 24.17924999 + 2*.09037666 - 8.09679131
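Under treatment coding, each factor(c) coefficient is the expected difference in d between that level and the reference level a#, holding a and b fixed; for c = a# the categorical term is simply 0. So a hand calculation of the form intercept + 20 * slope_a + 2 * slope_b + coefficient of b$ has the right structure. The sketch below, on regenerated toy data, shows that predict() with a one-row newdata gives the same number without retyping rounded coefficients; new_obs, pred and manual are hypothetical names.

```r
set.seed(1)  # added for reproducibility; not part of the original question
a <- sample(1000:10000000, 60, replace = TRUE)
b <- sample(1:90, 60, replace = TRUE)
c <- rep(c("a#", "b$", "c*", "d@"), 15)
d <- sample(1000000:10000000, 60, replace = TRUE)
data1 <- data.frame(a, b, c, d)

fit <- glm(d ~ a + b + factor(c), family = gaussian("identity"), data = data1)

# One new observation: a = 20, b = 2, c = "b$"
new_obs <- data.frame(a = 20, b = 2, c = "b$")
pred <- predict(fit, newdata = new_obs, type = "response")

# Manual equivalent: intercept + slopes * values + effect of "b$" vs "a#"
cf <- coef(fit)
manual <- cf[["(Intercept)"]] + 20 * cf[["a"]] + 2 * cf[["b"]] + cf[["factor(c)b$"]]

all.equal(unname(pred), manual)
```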