I have a large dataset (n > 500,000) which I'm building a linear model with lm(PV1READ ~ PV1MATH + PV1SCIE + ST004D01T). Tests for Normality, No multicollinearity, Independence seem fine, but it keeps tripping up on Homoscedasticity. The Breusch-Pagan test ols_test_breusch_pagan(mdl) gives:
Chi2 = 930.8976, Prob > Chi2 = 1.88444e-204
I read elsewhere that B-P tests aren't so good on larger datasets (p values go down the more observations I add), but the White's test gives similar results white_test_boot(mdl):
Test Statistic: 1053.69, P-value: 0
I would just kick the model out, but the plots look promising, the lines are pretty straight to me:
But when I plot the Studentized Residuals, a large number of points are |Residual| > 2
sum(abs(stud_resids)>2)/ length(stud_resids)
Which equates to ~5% overall, which seem ok.
Am I missing something? Can I just use the graphs instead of the tests if the dataset is large enough? And how would I report that?

