I am using the titanic_train data set in R to build a logistic regression model.
library(titanic)
library(splines)
library(broom)
library(dplyr)
# First we process the titanic_train dataset to make it
# a little more logistic regression friendly
titanic <- titanic_train %>%
  select(Survived, Pclass, Sex, Age, SibSp, Parch, Fare) %>%
  mutate(Survived = factor(Survived),
         Pclass = factor(Pclass),
         Sex = factor(Sex))
I'll skip over the guts of the project and just say this: for the Age predictor I used a natural cubic spline via the ns() function and ended up with this model:
model_02 <- glm(Survived ~ SibSp + ns(Age, df = 3) + Pclass + Parch + Fare,
                data = titanic,
                family = binomial)
model_02 <- step(model_02, trace = FALSE)
summary(model_02)
When I run the summary, I look at the coefficients:
Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)       3.20072    0.55223    5.80  6.8e-09 ***
SibSp            -0.37579    0.11649   -3.23   0.0013 **
ns(Age, df = 3)1 -1.29087    0.47188   -2.74   0.0062 **
ns(Age, df = 3)2 -6.53137    1.08099   -6.04  1.5e-09 ***
ns(Age, df = 3)3 -4.10291    0.89387   -4.59  4.4e-06 ***
Pclass2          -0.75677    0.28842   -2.62   0.0087 **
Pclass3          -1.98663    0.30651   -6.48  9.1e-11 ***
Fare              0.00631    0.00291    2.17   0.0303 *
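To see what the three ns(Age, df = 3) rows correspond to, it can help to look at what ns() actually returns. The sketch below uses a made-up age vector (an assumption for illustration, not the Titanic data), so the numbers themselves mean nothing; only the shape of the result matters:

```r
library(splines)

# Made-up ages for illustration only (not the Titanic data)
set.seed(1)
age <- round(runif(200, min = 1, max = 80))

# ns() returns a basis matrix: one column per degree of freedom
basis <- ns(age, df = 3)

dim(basis)                     # 200 rows, 3 columns
attr(basis, "knots")           # 2 interior knots, placed at quantiles of age
attr(basis, "Boundary.knots")  # the range of the observed ages
head(basis)                    # basis values, not Age, Age^2, Age^3
```

Each of the three spline rows in the coefficient table lines up with one column of a matrix like this one.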
Now I know that each coefficient represents the change in the log odds of the outcome (surviving the sinking of the Titanic, in this case) for a one-unit change in its predictor, but I don't know how to interpret the coefficients associated with the age spline.
My first thought is that they are the coefficients of a cubic polynomial in Age, but that does not seem accurate because the function
-4.1*Age^3 - 6.53*Age^2 - 1.29*Age does not even come close to tracking the log odds versus age.
I'm really lost on this one and it's really frustrating. Can anyone help?
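For anyone wanting to check the polynomial reading, here is a minimal sketch on simulated data (the age, survived and fit objects are hypothetical stand-ins, not model_02). It assumes the spline coefficients multiply the columns of the ns() basis, rebuilds the log odds by hand on that assumption, and compares the result with predict() and with the naive polynomial:

```r
library(splines)

# Simulated stand-in data (hypothetical; not the Titanic passengers)
set.seed(42)
n <- 500
age <- runif(n, 1, 80)
true_logit <- 1 - 0.04 * age + 0.0003 * (age - 40)^2
survived <- rbinom(n, size = 1, prob = plogis(true_logit))

fit <- glm(survived ~ ns(age, df = 3), family = binomial)

# Evaluate the same basis (same knots) on a grid of ages
grid <- seq(1, 80, by = 1)
basis <- ns(age, df = 3)
basis_grid <- predict(basis, newx = grid)

# Log odds by hand: intercept + sum of (coefficient * basis column)
b <- coef(fit)
by_hand <- b[1] + drop(basis_grid %*% b[-1])

# Log odds from predict() on the link scale
from_predict <- predict(fit, newdata = data.frame(age = grid), type = "link")
all.equal(unname(by_hand), unname(from_predict))  # expected to be TRUE

# The naive polynomial reading, for comparison; it gives a very different curve
naive <- b[2] * grid + b[3] * grid^2 + b[4] * grid^3
range(naive)
range(by_hand)
```

If the by-hand curve matches predict() while the polynomial does not, that would suggest the individual spline coefficients are only meaningful together, as weights on the basis columns, and that the fitted curve is easier to read from a plot of log odds against age than from the three numbers themselves.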