In the attached image, the x-axis is the proportion of children at a school who passed at least 5 GCSE exams that year, and the y-axis is the number of schools. The colours indicate whether each school is in a rural or an urban area.
I want to run a regression to see how a school's location (rural vs. urban) affects this measure of attainment: 5_GCSEs ~ rural.
I've read that if your dependent variable is a proportion (as mine is), it can be advisable to convert it into a count of successes out of the total number of trials for each school (i.e. # pupils obtaining 5 GCSEs / # pupils sitting GCSEs) and run a binomial logistic regression. However, I've also read that you need to look at the distribution of the data first to decide whether this is necessary.
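The binomial approach described above can be sketched in plain Python. This is a minimal sketch, not a definitive implementation: the per-school counts are invented for illustration, and `fit_binomial_logit` and `score_and_info` are hypothetical helper names. It fits logit(p) = b0 + b1 * rural to grouped (successes, trials) data by Newton-Raphson, which for a logistic GLM is the same as iteratively reweighted least squares.

```python
import math

# Hypothetical per-school data: (pupils with 5+ GCSEs, pupils sitting GCSEs, rural flag).
# These numbers are invented for illustration only.
schools = [
    (120, 180, 0), (95, 160, 0), (140, 200, 0), (60, 110, 0),
    (70, 100, 1), (45, 80, 1), (88, 120, 1), (30, 50, 1),
]

def score_and_info(data, b0, b1):
    """Score vector and 2x2 Fisher information for logit(p) = b0 + b1 * x."""
    u0 = u1 = i00 = i01 = i11 = 0.0
    for y, n, x in data:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        r = y - n * p            # observed minus expected successes
        w = n * p * (1.0 - p)    # binomial variance, used as the IRLS weight
        u0 += r
        u1 += r * x
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return (u0, u1), (i00, i01, i11)

def fit_binomial_logit(data, tol=1e-10, max_iter=50):
    """Newton-Raphson / IRLS fit; returns (b0, b1, standard error of b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (u0, u1), (i00, i01, i11) = score_and_info(data, b0, b1)
        det = i00 * i11 - i01 * i01
        # Newton step = inverse(information) @ score, using the explicit 2x2 inverse.
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    # Standard error of b1 from the inverse information at the final estimates.
    _, (i00, i01, i11) = score_and_info(data, b0, b1)
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    return b0, b1, se_b1

b0, b1, se_b1 = fit_binomial_logit(schools)
print("log odds ratio (rural vs urban):", b1, "SE:", se_b1)
print("odds ratio:", math.exp(b1))
```

With a single binary predictor, the fitted urban and rural probabilities equal the pooled pass rates of each group, so `b1` is simply the log odds ratio of those pooled rates. Real school-level counts are often overdispersed relative to the binomial, in which case standard errors from this plain model may be too small.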
The distribution looks roughly normal apart from a small second peak near 1. Does this mean I can just use linear regression?
I've also attached the residual plots and QQ plots. However, as the only independent variable is binary, I'm unsure how to interpret these.
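One way to see what the residual plots mean with a single binary predictor is the sketch below. The proportions are invented for illustration, and it uses only the Python standard library. With a 0/1 predictor, the OLS intercept is the urban mean, the slope is the rural minus urban difference in means, and the fitted values take only two distinct values. The OLS t statistic for the slope also equals the pooled-variance two-sample t statistic.

```python
import math
import statistics

# Hypothetical proportions of pupils with 5+ GCSEs per school (invented numbers).
urban = [0.62, 0.55, 0.71, 0.48, 0.66, 0.59]
rural = [0.70, 0.58, 0.74, 0.61, 0.95]

y = urban + rural
x = [0] * len(urban) + [1] * len(rural)   # rural indicator: 0 = urban, 1 = rural
n = len(y)

# OLS fit of y = b0 + b1 * x via the closed-form simple-regression formulas.
x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# The fitted values are just the urban and rural means, so a
# residuals-vs-fitted plot shows two vertical strips of points.
print("distinct fitted values:", sorted({round(f, 6) for f in fitted}))

# Comparing the spread of the two strips is the equal-variance check.
res_urban, res_rural = residuals[:len(urban)], residuals[len(urban):]
print("residual SD urban / rural:", statistics.stdev(res_urban), statistics.stdev(res_rural))

# The OLS t statistic for b1 equals the pooled-variance two-sample t statistic.
s2 = sum(r * r for r in residuals) / (n - 2)
t_ols = b1 / math.sqrt(s2 / sxx)
n1, n2 = len(urban), len(rural)
sp2 = ((n1 - 1) * statistics.variance(urban) + (n2 - 1) * statistics.variance(rural)) / (n - 2)
t_pooled = (statistics.fmean(rural) - statistics.fmean(urban)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
print("t (OLS):", t_ols, " t (pooled two-sample):", t_pooled)
```

In this setting the two-strip pattern in the residuals-vs-fitted plot is expected rather than a sign of a problem. A Q-Q plot of `residuals` then checks whether the within-group deviations from each group's mean look normal.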
I'd be grateful for any advice on what the distribution of my data implies about the type of regression analysis I should use.



Second, for proportions, I like beta regression.
– Peter Flom Aug 05 '23 at 13:59
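For reference, beta regression is commonly written in a mean-precision parameterization. The version below, with a logit link and a single rural indicator, is a sketch of how the model in the question would look under that common formulation, not the only possible specification:

```latex
y_i \sim \operatorname{Beta}\big(\mu_i \phi,\ (1-\mu_i)\phi\big), \qquad 0 < y_i < 1,

\operatorname{logit}(\mu_i) = \beta_0 + \beta_1 \, \text{rural}_i,

\operatorname{E}(y_i) = \mu_i, \qquad \operatorname{Var}(y_i) = \frac{\mu_i (1-\mu_i)}{1+\phi}.
```

Because the beta density is defined only on the open interval (0, 1), any schools with a proportion of exactly 0 or exactly 1 would need separate handling before fitting this model.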