I want to use linear regression to predict a continuous integer variable from some continuous integer predictors and some categorical predictors. The categorical column is not ordered. Consider this toy data:

a <- sample(1000:10000000,60 , replace = T)
b <- sample(1:90 , 60 , replace = T)
c <- c("a#" , "b$" , "c*" ,"d@")
c <- rep(c,15)
d <- sample(1000000:10000000 , 60 , replace = T)
data1 <- as.data.frame(a,b,c,d)
head(data1)

# I want columns a, b and d to be numeric; I don't know why they aren't, so:
data1$a <- as.numeric(data1$a)
data1$b <- as.numeric(data1$b)
data1$d <- as.numeric(data1$d)

# Since predictor c is categorical, I use factor()
myformula <- d ~ a + b + factor(c)

data1.glm <- glm(myformula , family = gaussian("identity") , data=data1)

summary(data1.glm)

Call:
glm(formula = myformula, family = gaussian("identity"), data = data1)

Deviance Residuals: 
 Min       1Q   Median       3Q      Max  
-30.150  -11.637   -0.127   13.564   35.662  

 Coefficients:
        Estimate Std. Error t value Pr(>|t|)   
 (Intercept) 24.17925    7.91775   3.054   0.0035 **
  a            0.16155    0.13370   1.208   0.2322   
  b            0.09038    0.20147   0.449   0.6555   
  factor(c)b$ -8.09679    6.78098  -1.194   0.2377   
  factor(c)c*  3.16806    6.36775   0.498   0.6208   
  factor(c)d@  2.12224    6.75855   0.314   0.7547   
  ---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  (Dispersion parameter for gaussian family taken to be 302.7604)

  Null deviance: 17995  on 59  degrees of freedom
  Residual deviance: 16349  on 54  degrees of freedom
  AIC: 520.73

  Number of Fisher Scoring iterations: 2

I have some questions, regardless of the p-values (this is toy data, not my actual data):

1) Have I done the regression correctly?

2) I can't find the coefficient for the a# level of column c, even with the code below:

  View(as.data.frame(data1.glm$coefficients))

3) What exactly do the coefficients for the categorical predictor mean? For example, if I want the predicted response (column d) when (a, b, c) are (20, 2, b$) respectively, is it correct to use the code below?

 x <- 20*.16154566 + 24.17924999 + 2*.09037666 - 8.09679131
far

1 Answer


I am writing in the answer section because I don't have the privilege to comment. as.data.frame(a,b,c,d) does not combine the four vectors into columns (as.data.frame() coerces a single object, and the remaining arguments are not treated as columns), so data1 is not actually created as intended. Use data1 <- data.frame(a,b,c,d) instead. There is then no need to convert c to a factor again, since data.frame() already creates it as a factor (that is the default in R versions before 4.0.0; from 4.0.0 on, strings stay character unless stringsAsFactors = TRUE is set). As for the interpretation of the coefficients, which is what you actually asked about, TEG has already referred to a link.
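To make this concrete, here is a minimal sketch using the toy data from the question. It assumes R's default treatment contrasts; the seed and the names fit, newdata, by_predict and by_hand are my own additions for illustration. It builds the data with data.frame(), shows that the first level of c ("a#") has no coefficient of its own because it is the baseline absorbed into the intercept, and checks a hand-computed prediction against predict().

```r
set.seed(1)  # assumption: a seed is added only to make the sketch reproducible

a <- sample(1000:10000000, 60, replace = TRUE)
b <- sample(1:90, 60, replace = TRUE)
c <- rep(c("a#", "b$", "c*", "d@"), 15)
d <- sample(1000000:10000000, 60, replace = TRUE)

# data.frame() combines the vectors into columns; a, b and d stay numeric
data1 <- data.frame(a, b, c, d)

# Same model as in the question
fit <- glm(d ~ a + b + factor(c), family = gaussian("identity"), data = data1)

# With treatment contrasts the first level ("a#") is the reference level,
# so only b$, c* and d@ get their own coefficients
cf <- coef(fit)
print(names(cf))

# Prediction for a = 20, b = 2, c = "b$" using predict() ...
newdata <- data.frame(a = 20, b = 2, c = "b$")
by_predict <- predict(fit, newdata = newdata)

# ... and by hand: intercept + slopes + the offset of level b$ from a#
by_hand <- cf["(Intercept)"] + 20 * cf["a"] + 2 * cf["b"] + cf["factor(c)b$"]

print(all.equal(unname(by_predict), unname(by_hand)))
```

So the hand calculation in the question has the right form: each factor(c) coefficient is the shift in the expected response relative to level a# with a and b held fixed, and for an observation with c equal to a# no factor term is added at all. In practice predict() is less error-prone than typing the coefficients in by hand.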

Enigma