I have plotted the average SAT score of schools on the x-axis and the studentised residuals on the y-axis.
The code I used to produce this plot is:
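```r
library(MASS)  # provides studres()

# Regress average SAT score on the urban/rural indicator
lm.sats <- lm(SATSCR ~ urbanrural, data = final_df)
stud_resids <- studres(lm.sats)  # externally studentized residuals

# Use the rows lm() actually fitted, so x and y have the same length
resid_df <- model.frame(lm.sats)
head(stud_resids, 3)

plot(resid_df$SATSCR, stud_resids, ylab = "Studentized Residuals", xlab = "SAT Score")
abline(h = 0)  # horizontal reference line at zero
```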
I understand that the residuals should be scattered evenly above and below the horizontal line at zero. Can anyone suggest why my residuals are so strongly correlated with SAT score?
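For comparison, here is a minimal sketch on simulated data. The data frame `sim_df`, its sample size and its SAT values are made up, not my real data. It illustrates that with a single two-level predictor, the fitted values are just the two group means. Each residual is therefore the SAT score minus its group mean, so residuals plotted against the response itself will trend upward. The conventional diagnostic instead plots the residuals against the fitted values.

```r
library(MASS)  # provides studres()

set.seed(1)
# Hypothetical stand-in for final_df: 200 schools, each rural or urban,
# with a made-up average SAT score (urban schools 30 points higher on average).
n <- 200
sim_df <- data.frame(
  urbanrural = factor(sample(c("rural", "urban"), n, replace = TRUE))
)
sim_df$SATSCR <- 1000 + 30 * (sim_df$urbanrural == "urban") + rnorm(n, sd = 100)

fit <- lm(SATSCR ~ urbanrural, data = sim_df)
r <- studres(fit)

# With a single two-level factor the fitted values are just the two group means
print(length(unique(round(fitted(fit), 8))))  # 2

# Each residual is SATSCR minus its group mean, so residuals rise with SATSCR
print(cor(r, sim_df$SATSCR))

# Raw residuals are uncorrelated with the fitted values (up to floating-point error)
print(cor(resid(fit), fitted(fit)))

par(mfrow = c(1, 2))
plot(sim_df$SATSCR, r, xlab = "SAT Score", ylab = "Studentized Residuals",
     main = "Residuals vs response")
abline(h = 0)
plot(fitted(fit), r, xlab = "Fitted value (group mean)", ylab = "Studentized Residuals",
     main = "Residuals vs fitted")
abline(h = 0)
```

On the real model, the equivalent plot would be `plot(fitted(lm.sats), studres(lm.sats))`, which shows two vertical strips (one per group) rather than an upward trend.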
For further information, this is what my boxplot looks like, with SAT score on the x-axis. The blue box is rural and the red box is urban.
And this is what the histogram looks like. Again, blue is rural and red is urban. The number of schools is on the y-axis and the average SAT score is on the x-axis.


