I'm trying to solve a problem where the goal is to find an association between children's cortisol values (y) and their mothers' weekly cortisol averages (x1 to x6) and gender (z). After running model selection strategies in R (all-subset regression, backward elimination, etc.), the following two 'optimal' models were found (a sketch of the selection code follows the models):
$$y = a_0 + a_5 x_5$$
and
$$y = a_0 + a_1 x_1 + a_3 x_3 + a_4 x_4 + a_5 x_5 + a_6 x_6$$
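For concreteness, the selection was run roughly like this (a minimal sketch; `dat` is a placeholder for my data frame with columns y, x1–x6, z, and the real calls differed in details):

```r
library(leaps)

# full model with all candidate predictors
full <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + z, data = dat)

# all-subset regression: adjusted R^2 of the best model of each size
subsets <- regsubsets(y ~ x1 + x2 + x3 + x4 + x5 + x6 + z, data = dat, nvmax = 7)
summary(subsets)$adjr2

# backward elimination by AIC
backward <- step(full, direction = "backward")
```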
For the second model, I found something interesting:
- x3: p-value = 0.16, partial $R^2$ = 37.1%
- x5: p-value = 0.04, partial $R^2$ = 5.5%
(Let's ignore the other variables for now; their p-values and partial $R^2$ values fall between those of these two.)
[Note: the p-value here is for the test of whether the variable's coefficient is zero; the partial $R^2$ is the proportion of the variation left unexplained by the model without that variable that is explained once it is added.]
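In R terms, the partial $R^2$ for x3 was computed roughly as follows (same placeholder `dat` as above; a sketch, not the exact script):

```r
# full second model and the reduced model without x3
full  <- lm(y ~ x1 + x3 + x4 + x5 + x6, data = dat)
no_x3 <- lm(y ~ x1 + x4 + x5 + x6, data = dat)

sse_full    <- sum(residuals(full)^2)
sse_reduced <- sum(residuals(no_x3)^2)

# fraction of the variation left unexplained by the reduced model
# that is explained once x3 is added
partial_r2_x3 <- (sse_reduced - sse_full) / sse_reduced
```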
Now to my question: why does x5 appear much more significant in the model than x3, yet dropping x3 from the model reduces my $R^2$ a lot (from around 20% to about 5%), while dropping x5 barely does? Is the reason collinearity among the variables in the model (which does exist), or is it something else?
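Here is roughly how I compared the $R^2$ drops and checked for collinearity (again a sketch with placeholder names):

```r
full  <- lm(y ~ x1 + x3 + x4 + x5 + x6, data = dat)
no_x3 <- update(full, . ~ . - x3)
no_x5 <- update(full, . ~ . - x5)

summary(full)$r.squared    # around 20% in my data
summary(no_x3)$r.squared   # drops to about 5%
summary(no_x5)$r.squared   # changes much less

# variance inflation factors as a quick collinearity check
library(car)
vif(full)

# pairwise correlations among the weekly averages
cor(dat[, c("x1", "x3", "x4", "x5", "x6")])
```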
Also, the ultimate goal is to find the most important variable describing the response. Would I choose x3 or x5 in this case, and why? Or can such a choice be made?