I built a model with 2 predictor variables. I found that the predicted values for the test set did not align with their real values, so I instead made predictions on the data that were used to train the model. I got the following results:

[Figure: predicted vs. actual values, with a red line marking perfect prediction]
I don't have a lot of experience in ML, but it shouldn't look like this: the predicted values should sit around the red line.
What could be the source of this? Is it that my model is insufficient in terms of the information brought by the predictors?
It is not a programming error, nor an error in the data, as I have already checked both.
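
To make the question concrete, here is a minimal sketch of the situation (the ordinary-least-squares model, data, and names are all invented for illustration, not taken from the original post):

```python
# A minimal sketch, not the poster's actual code: all data and names are
# invented. With two weak predictors, even the in-sample OLS predictions
# are compressed toward the mean of y, which is what pulls the points in
# a predicted-vs-actual plot off the 45-degree (red) line.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200                                   # individuals, as in the comments
X = rng.normal(size=(n, 2))               # two predictors
y = 0.3 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(size=n)  # low-signal response

model = LinearRegression().fit(X, y)
y_hat = model.predict(X)                  # predictions on the TRAINING data

# For OLS with an intercept, var(y_hat) / var(y) equals the in-sample R^2,
# so a low R^2 directly means the predictions have much less spread than y.
print("var(y_hat) / var(y) =", y_hat.var() / y.var())
print("in-sample R^2       =", model.score(X, y))
```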


The consequence would be: if the response variable has a normal distribution, it is useless to try to predict anything other than the mean. Is that it?
– Renaud Bied-charreton Aug 13 '23 at 21:41

But why does this effect seem to disappear with an overfitted model (60 variables instead of 2, for 200 individuals; see the sketch after these comments)?
– Renaud Bied-charreton Aug 14 '23 at 14:02

I can't understand how and why it would be regression to the mean (RTM) in my case. I'm really sorry; I'm just doing my best to understand the source of the underlying problem. https://pubmed.ncbi.nlm.nih.gov/30743311/#:~:text=Regression%20to%20the%20mean%20for%20the%20bivariate%20binomial,an%20inaccurate%20conclusion%20in%20a%20pre-post%20study%20design.
– Renaud Bied-charreton Aug 14 '23 at 14:52

Looking more and more at the 3D graphics of X, Y, and Predicted, I can see more and more why this regression is a natural consequence, but it is therefore not so good for prediction, I guess. My final word would be that this result is natural, but the variable choice is wrong.
– Renaud Bied-charreton Aug 17 '23 at 14:24
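
Following up on the comment about the effect seeming to disappear with an overfitted model, here is a companion sketch (same invented data as above) comparing a 2-variable fit with a 60-variable fit on 200 individuals:

```python
# A minimal sketch, under the same invented-data assumptions as above, of
# the effect raised in the comments: 60 predictors for 200 individuals.
# The extra 58 columns carry no signal, yet the in-sample R^2 inflates and
# the training predictions hug the red line; the held-out R^2 does not improve.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 60))              # 60 candidate predictors
y = 0.3 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(size=n)  # only 2 real signals

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for p in (2, 60):                         # compare the 2- and 60-variable fits
    m = LinearRegression().fit(X_tr[:, :p], y_tr)
    print(f"{p:2d} predictors: train R^2 = {m.score(X_tr[:, :p], y_tr):.2f}, "
          f"test R^2 = {m.score(X_te[:, :p], y_te):.2f}")
```

Under these assumptions the 60-variable model scores far better in-sample but no better out-of-sample, which is why the compression toward the mean only seems to disappear.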