This question is based on Everitt et al. (A Handbook of Statistical Analyses Using R) and I am trying to answer these questions:
Load the Default dataset from the ISLR library. The dataset contains information on ten thousand customers. The aim here is to predict which customers will default on their credit card debt. It is a four-dimensional dataset with 10000 observations. The question of interest is to predict individuals who will default. We want to examine how each predictor variable is related to the response (default). Do the following on this dataset:
a) Perform descriptive analysis on the dataset to have an insight. Use summaries and appropriate exploratory graphics to answer the question of interest.
b) Use R to build a logistic regression model.
c) Discuss your result. Which predictor variables were important? Are there interactions?
However, I am more interested in understanding when one should use -1 in a model formula, and what excluding the intercept from a model actually means. Here is the data setup and the two models:
# Set up data
data("Default", package = "ISLR")
# Create 0/1 and labelled versions of the response and the student flag
default_binary <- ifelse(Default$default == "Yes", 1, 0)
dflt_str <- ifelse(Default$default == "Yes", "Defaulted", "Not Defaulted")
stdn <- ifelse(Default$student == "Yes", 1, 0)
stdn_str <- ifelse(Default$student == "Yes", "Student", "Not-Student")
blnc <- Default$balance
incm <- Default$income
df <- data.frame(default_binary, dflt_str, stdn, stdn_str, blnc, incm)
# with intercept
fm0 <- default_binary ~ stdn + blnc + incm
# no intercept as indicated by -1
fm1 <- default_binary ~ -1 + stdn + blnc + incm
# fit both models on the same data frame
regression_model_without_minus_1 <- glm(fm0, family = binomial(), data = df)
regression_model_with_minus_1 <- glm(fm1, family = binomial(), data = df)
For the summaries of the two models, I get:
Can someone please explain to me the difference between the results with and without -1 in these models, along with the merits and drawbacks of each? Thanks for your help!
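To make the question concrete, here is a minimal sketch of what the -1 changes, using simulated data rather than the ISLR data so that it runs without any packages. The names `x`, `y`, `d`, `fit_with` and `fit_without` are made up for illustration, and the true coefficients are arbitrary assumptions.

```r
# Simulated stand-in data (hypothetical names, not the Default dataset)
set.seed(1)
n <- 200
x <- rnorm(n)
# Assumed true model: intercept -1 and slope 2 on the logit scale
y <- rbinom(n, size = 1, prob = plogis(-1 + 2 * x))
d <- data.frame(y, x)

# The two formulas differ only in whether the design matrix has a column of 1s
colnames(model.matrix(y ~ x, data = d))       # "(Intercept)" "x"
colnames(model.matrix(y ~ -1 + x, data = d))  # "x"

fit_with    <- glm(y ~ x,      family = binomial(), data = d)
fit_without <- glm(y ~ -1 + x, family = binomial(), data = d)

# Predicted probability when the predictor is 0:
# with an intercept it is plogis() of the estimated intercept;
# without one the linear predictor is forced to 0, so the probability is 0.5
predict(fit_with,    newdata = data.frame(x = 0), type = "response")
predict(fit_without, newdata = data.frame(x = 0), type = "response")
```

If this reading is right, then in my no-intercept model a non-student with zero balance and zero income would be assigned a default probability of exactly 0.5, whatever the data say.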