I'm following this tutorial to fit a logistic regression model to my data, which has a binary response. I understand the reasoning behind each step, except why the author checks the first 5 probabilities in glm.probs, observes that they are close to 50%, and therefore sets glm.pred as follows:
glm.pred <- ifelse(glm.probs > 0.5, "Up", "Down")
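For reference, here is a minimal, self-contained sketch of this thresholding step. It uses simulated data rather than the tutorial's dataset, and the names `x`, `direction`, and `fit` are hypothetical. With a factor response, `glm()` with `family = binomial` treats the first level as failure. So with levels `c("Down", "Up")`, `predict(..., type = "response")` returns the estimated probability of "Up".

```r
set.seed(1)
n <- 200
x <- rnorm(n)
# Hypothetical simulated response; "Up" is the second level,
# so glm() models P(direction == "Up")
direction <- factor(ifelse(runif(n) < plogis(0.3 * x), "Up", "Down"),
                    levels = c("Down", "Up"))
fit <- glm(direction ~ x, family = binomial)

glm.probs <- predict(fit, type = "response")       # estimated P(Up) per row
glm.pred <- ifelse(glm.probs > 0.5, "Up", "Down")  # label using a 0.5 cutoff

table(glm.pred, direction)    # confusion matrix on the training data
mean(glm.pred == direction)   # training accuracy
```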
Let's say you look at the first 5 probabilities, or even take the mean of all the probabilities, and obtain a value of 0.25. Does that mean you should set glm.pred to:
glm.pred <- ifelse(glm.probs > 0.25, "Up", "Down")
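To see what changing the cutoff does mechanically, here is a sketch using made-up probabilities and a hypothetical helper `classify()`. Lowering the cutoff from 0.5 to 0.25 labels more observations "Up". It does not change the fitted probabilities themselves.

```r
# Made-up predicted probabilities P(Up), e.g. from predict(fit, type = "response")
glm.probs <- c(0.10, 0.22, 0.27, 0.31, 0.48, 0.55, 0.70)

# Hypothetical helper: "Up" when P(Up) exceeds the cutoff, otherwise "Down"
classify <- function(probs, cutoff) ifelse(probs > cutoff, "Up", "Down")

pred.50 <- classify(glm.probs, 0.50)
pred.25 <- classify(glm.probs, 0.25)

# Number of observations labelled "Up" under each cutoff
sum(pred.50 == "Up")  # 2
sum(pred.25 == "Up")  # 5
```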
Another question I have concerns removing variables. As I understand it, a p-value > 0.05 for a variable means we cannot conclude that the variable has an effect on the response (its coefficient is not significantly different from zero). Some say you shouldn't remove such a variable, because you lose important information. So how should we then try to improve the model?
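For a `glm` fit, these per-variable p-values are the Wald z-test p-values reported by `summary()`. Below is a sketch on simulated data that shows where they come from. The names `x1`, `x2`, and `y` are hypothetical, and `x2` has no effect on `y` by construction.

```r
set.seed(42)
n <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)  # hypothetical predictor, unrelated to y by construction
y <- rbinom(n, size = 1, prob = plogis(-0.2 + 0.8 * x1))
fit <- glm(y ~ x1 + x2, family = binomial)

# Wald z-test p-value for each coefficient (column "Pr(>|z|)")
pvals <- coef(summary(fit))[, "Pr(>|z|)"]
print(round(pvals, 4))
```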