
What can I do when the p-value of my regression is not significant? I tried log and square-root transformations; they improved it a little, but not enough to bring it below the 5% threshold.

The residuals follow a Normal distribution and there is no strong correlation between the variables.

[Image: regression summary output]

Does this mean that linear regression just doesn't fit my data, or can I ignore it because the p-value is close to the threshold?

Edit: I used cumulative coding because there were many categorical and ordinal variables, which explains the difference between the R^2 and adjusted R^2. See below for how it works (not my data, just for illustration).
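
For readers unfamiliar with the term, here is a minimal sketch of what cumulative coding of one ordinal variable usually looks like. The level names and the function `cumulative_code` are hypothetical and only serve the illustration; this is not the asker's data or code.

```python
def cumulative_code(value, levels):
    """Cumulative (thermometer) coding of one ordinal value.

    Returns len(levels) - 1 indicator columns. Column j (counting from 1)
    is 1 when the value is at or above levels[j], otherwise 0.
    """
    rank = levels.index(value)
    return [1 if rank >= j else 0 for j in range(1, len(levels))]


# Hypothetical ordinal variable with three ordered levels.
levels = ["low", "medium", "high"]

for v in levels:
    print(v, cumulative_code(v, levels))
```

Each variable with n levels contributes n - 1 columns, so several such variables quickly add up to many regression parameters.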

insignificant

Forzan
  • You are approaching this with the wrong mindset. Your goal is not creating a significant p-value. – Roland Apr 25 '22 at 10:36
  • I agree with @Roland. Further, it looks like you're estimating a model with 15 parameters and 24 observations - that's asking the data to do an awful lot. You see that reflected in the huge difference between the multiple R-squared and the Adjusted R-squared. – Dave Armstrong Apr 25 '22 at 10:44
  • I completely agree with the prior comments. // What is your goal in doing the logarithm or square root transformation? – Dave Apr 25 '22 at 11:37
  • @Roland For me, the goal is to build the regression that best fits my data, and the result I have (R^2 = 0.8) seems good. But because the p-value is above 5%, doesn't it mean that the model, taken as a whole, is not relevant? – Forzan Apr 25 '22 at 12:04
  • @Dave About the log / square-root transformation: to be honest, it was just to see whether it could improve my model and the F-test. – Forzan Apr 25 '22 at 12:07
  • A few things. 1. The insignificant F-statistic means that all model parameters are not jointly different from zero - that all non-zero estimates could have arisen by chance rather than a systematic relationship that you've uncovered. I wouldn't pay as much attention to the R-squared value as the difference between the R-squared and adjusted R-squared values, which is huge - indicating several "irrelevant" variables in the model. – Dave Armstrong Apr 25 '22 at 12:10
  • Logs and square roots of the dependent variable are non-linear transformations that can help linearize particular types of relationships. Specifically, those that rapidly decrease in x and then level off or those that look like a mirror image of that - they increase slowly at first and then more rapidly as x increases. If you don't have that relationship between y and your x-variables this kind of transformation won't help. – Dave Armstrong Apr 25 '22 at 12:12
  • @DaveArmstrong I see, and that's what I want to avoid. Are there things I can do to change that (like the transformation I did)? – Forzan Apr 25 '22 at 12:15
  • @Dave My bad, I changed it! – Forzan Apr 25 '22 at 12:18
  • Do you know the difference between $R^2$ and adjusted $R^2?$ – Dave Apr 25 '22 at 12:23
  • @Dave Kind of. I know its formula, and that it can be useful for checking whether a variable adds something to the model, because R^2 goes up even if the variable adds no information. – Forzan Apr 25 '22 at 12:47
  • Yeah, I understand now. I have 8 categorical variables that I coded cumulatively (n-1 variables for the n levels of each categorical variable). – Forzan Apr 25 '22 at 12:59
  • Even though I made a stupid mistake by forgetting that, I really want to know what to do when the F-statistic is insignificant; I couldn't find anything on the web. – Forzan Apr 25 '22 at 13:03
  • With only 24 observations you should only be trying to fit about 2 predictors; otherwise you run risks of overfitting or even of missing true associations with outcome because you don't have enough data to support a "significant" F-test. See Frank Harrell's course notes on Regression Modeling Strategies, especially Chapter 4. You should step back and think about the specific question you want to answer and whether you have enough data to answer it reliably. – EdM Apr 25 '22 at 13:40
  • @EdM I would do that if I could, but I'm more or less obliged to find a way to work with these variables. But yes, the fact that I don't have much data is the problem. – Forzan Apr 25 '22 at 14:31
  • Obliged by whom? And why? – Dave Apr 25 '22 at 14:34
  • I'm in an internship and have no other variables to work with. – Forzan Apr 25 '22 at 14:45
  • So why are you obliged to pursue a bad regression strategy? – Dave Apr 25 '22 at 14:47
  • You make a good point – Forzan Apr 25 '22 at 14:51
  • But if I could make it work, that would be great. – Forzan Apr 25 '22 at 14:57
  • I don't believe your output, because it's inconsistent with your explanation and your data. Regressing $Y$ against $D2$ and $D3$ I find $D2$ is significant, the overall regression p-value is a tiny $6\times 10^{-11},$ and the adjusted $R^2$ is above $0.81.$ Are you showing all your data? Exactly what is your regression model? – whuber Apr 25 '22 at 15:39
  • @whuber Oh no, sorry, the screenshot was only there to explain what cumulative coding is; it's not my real data. – Forzan Apr 25 '22 at 15:58
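
The points made in the comments about R^2, adjusted R^2 and the overall F-test can be checked with a few lines of arithmetic. The sketch below uses the figures quoted in the thread (R^2 = 0.8, 24 observations, 15 parameters) and assumes that the 15 parameters include the intercept, i.e. 14 predictors; the function names are hypothetical. The formulas are the standard ones for ordinary least squares with an intercept.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F-statistic, with k and n - k - 1 degrees of freedom."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def expected_null_r2(n, k):
    """Mean of R^2 when no predictor is related to y (normal errors)."""
    return k / (n - 1)


# Figures quoted in the thread; k = 14 is an assumption (15 parameters minus the intercept).
n, k, r2 = 24, 14, 0.8

print("adjusted R^2:       ", round(adjusted_r2(r2, n, k), 3))
print("F statistic:        ", round(f_statistic(r2, n, k), 3))
print("expected R^2 (null):", round(expected_null_r2(n, k), 3))
```

Under these assumptions, pure noise predictors would already give an R^2 of about 0.61 on average, so an R^2 of 0.8 is much less impressive than it looks, and the adjusted R^2 drops to roughly 0.49. This is the gap the commenters point to: with so few observations per parameter, a high R^2 and a non-significant F-test can easily coexist.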