
I'm looking at whether high school start times influence test scores. I currently have the school start times and the average test scores for around 40 high schools for each academic year from 2012 to 2019.

This is for a school project. I don't have much experience with statistics and am relying on Excel.

Is there a way to see whether changes in school start times over a period of time correlate with changes in test scores over that same period?
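One way to look at "changes versus changes" is to take year-over-year differences within each school and correlate them. The sketch below is a minimal illustration under assumptions: the school names and numbers are invented (they are not from the spreadsheet), and `pearson` and `year_over_year_changes` are hypothetical helper names.

```python
from math import sqrt

# Invented example data (NOT from the spreadsheet):
# school -> list of (year, start time in hours past 7:00 AM, mean score)
data = {
    "School A": [(2012, 0.83, 590), (2013, 0.83, 585), (2014, 1.50, 600)],
    "School B": [(2012, 0.50, 540), (2013, 1.00, 548), (2014, 1.00, 545)],
    "School C": [(2012, 1.00, 610), (2013, 0.75, 602), (2014, 0.75, 606)],
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def year_over_year_changes(panel):
    """Within each school, difference consecutive years of start time and score."""
    d_start, d_score = [], []
    for rows in panel.values():
        rows = sorted(rows)  # order by year
        for (_, t0, s0), (_, t1, s1) in zip(rows, rows[1:]):
            d_start.append(t1 - t0)
            d_score.append(s1 - s0)
    return d_start, d_score

d_start, d_score = year_over_year_changes(data)
r = pearson(d_start, d_score)
print("correlation of changes:", round(r, 3))
```

Differencing within a school removes anything about that school that stays fixed over the years, so the correlation reflects only how movements in start time line up with movements in score.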

I know that if I were just pairing each start time with its corresponding test score, I could put the data in two columns (one for start times, another for test scores) and run a regression between them, I think. But this leaves out the factor of time. Is there a way to include time in the analysis as well?
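Including time usually means adding it as a second predictor, so the model becomes score = b0 + b1 * start time + b2 * year. The following is a minimal sketch of that fit using plain least squares. The rows are synthetic, generated from a known formula (500 + 20 * start + 3 * years) purely so the output can be checked; `solve` and `ols` are hypothetical helper names, not functions from any library.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(xr[i] * xr[j] for xr in X) for j in range(k)] for i in range(k)]
    xty = [sum(xr[i] * yi for xr, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic predictors: (start time in hours past 7:00 AM, years after 2010)
rows = [(0.5, 2), (1.0, 3), (0.75, 4), (1.5, 5), (1.0, 6), (0.5, 7)]
# Synthetic scores built from a known formula so the fit can be verified
scores = [500 + 20 * s + 3 * t for s, t in rows]

intercept, b_start, b_year = ols(rows, scores)
print(intercept, b_start, b_year)
```

Here `b_start` is the estimated change in score per extra hour of start time with the year held fixed, which is the quantity the question is about.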

Updated w/ Link to Spreadsheet: https://epscloud-my.sharepoint.com/:x:/g/personal/198890_apps_everettsd_org/EUocdmeDXclOt9mkXw-DcdwB3FbIU9t4WyLKNH5Jk9MAyw?e=K3rxuU

Apologies for the messy spreadsheet. I have two tables, one for SAT scores and one for school start times, each organized by high school. For instance, Ballard in 2012-2013 has an SAT score of 590 (in cell B3) and a start time of 0.83 hours past 7:00 AM (in cell B16). I don't know if I can sort it out per high school, so I paired each SAT score with its corresponding start time in the tall table to the right and ran a regression with SAT scores as the y-input and both school start times and # years after 2010 as the x-inputs.
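The pairing step described above (two wide tables turned into one tall table) can be sketched as follows. Only the Ballard 2012-2013 values (590 and 0.83) come from the post; every other number, the second school, and the helper name `to_tall` are invented for illustration. The coding of "years after 2010" from the first calendar year of the academic year is also an assumption.

```python
# Column headers shared by both wide tables
years = ["2012-2013", "2013-2014"]

# Wide tables: one row per school, one column per academic year.
# Only Ballard 2012-2013 (590, 0.83) is from the post; the rest is invented.
sat_wide = {"Ballard": [590, 585], "Hypothetical HS": [540, 548]}
start_wide = {"Ballard": [0.83, 0.83], "Hypothetical HS": [0.50, 1.00]}

def to_tall(years, sat_wide, start_wide):
    """Pair each score with the start time for the same school and year."""
    tall = []
    for school in sat_wide:
        for i, year in enumerate(years):
            tall.append({
                "school": school,
                "year": year,
                # Assumption: count from the first calendar year of the label
                "years_after_2010": int(year[:4]) - 2010,
                "start_time": start_wide[school][i],
                "sat": sat_wide[school][i],
            })
    return tall

tall = to_tall(years, sat_wide, start_wide)
for row in tall:
    print(row)
```

Keeping the school name on every row of the tall table makes it possible to add a per-school term to the regression later.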

The spreadsheet above and this question on Cross Validated are part of a larger data set, which I left out to make the question easier to ask. However, here is the link to the larger data set: https://epscloud-my.sharepoint.com/:x:/g/personal/198890_apps_everettsd_org/Efuy5OT13rJLnFUZpPPQ5ucBn4CzjmKsxmnGUh1Fii-FWw?e=UMeGEL

The table beginning in A1 is SAT ERW scores, the one at A41 is SAT Math scores, and the three tables running horizontally from A81 to S81 convert the start times to # of hours past 7:00 AM.

Further to the right, [Table 1] is a correlation between school start times (SSTs) and ERW scores, and [Table 2] is the same for SSTs and Math scores. [Table 3] was an attempt with # years after 2010, SSTs, and ERW; [Table 4] is the same with Math. [Table 5], under Table 1, I believe uses the Table 1 data but runs a regression instead of a correlation.

Kevin
  • You could add month and year terms to your regression equation. This will give you the contribution of those factors so that you can make the inference you want on the coefficient for the start time. – Estimate the estimators Apr 10 '22 at 02:59
  • Thanks for the help, I'll try to see if I can do that. – Kevin Apr 10 '22 at 03:28
  • I attempted to perform a regression with "SAT (ERW)" as my y-range and "Years After 2010" and "SSTs" (school start times) as my x-ranges. The "Significance F" of 0.219996 is over 0.05, so does that mean the result is not statistically significant? Also, the p-values for "Years After 2010" (0.585265) and "SSTs" (0.155638) are both over 0.05. Is there any statistical test other than regression that can be done with the data, or is there no other method because the p-value is not under 0.05? Would this also indicate that school start times don't play a role in influencing SAT scores? – Kevin Apr 10 '22 at 08:35
  • Can you post your full work in the original question? – Estimate the estimators Apr 10 '22 at 10:53
  • You have shown (part of) your data as a screenshot. Please do post them, in place, as text, so that people can copy and use it in their answers. – kjetil b halvorsen Apr 10 '22 at 11:20
  • Would a link to the Excel spreadsheet also work? – Kevin Apr 10 '22 at 17:53
  • I took a look. The model fits very poorly. I'd try adding another coefficient for the school. But it looks on the face of it like the morning time is not predictive of the score. – Estimate the estimators Apr 10 '22 at 18:11
  • Yes, that appears to be the issue. I'm referencing a study I found online which did a similar regression analysis between ACT scores and school start times in Minnesota. He also found that there does not seem to be an impact from school start times influencing ACT scores. However, he was able to prove there wasn't an impact with a regression. What I'm confused about is if my p-value is not less than 0.05, then how would I state there isn't a correlation in the first place, since wouldn't the high p-value invalidate the Pearson-correlation I found? – Kevin Apr 10 '22 at 18:19
  • His study also has more covariates, with various other factors like income and gender. If I were to add more covariates to mine too, wouldn't the high p-values under the ANOVA table for just the first spreadsheet with the 10 high schools for the existing two variables of school start times and # years after 2010 still remain high? – Kevin Apr 10 '22 at 18:23
  • This goes well beyond what a comment can offer, but you want to look at the coefficient for the time after 7. Think about what unit it represents (something like: increase in score by 1 for every 1 minute after 7 AM). Look at the confidence interval of the estimate for the coefficient. You can see it's not clear whether a minute after 7 AM predicts an increase or decrease in score (holding all other predictors constant). The reason you need to add other predictors to your model is that you want the coefficient for time to reflect its impact on the score and not any other endogenous factor missing from the model. – Estimate the estimators Apr 10 '22 at 18:30
  • Sorry, I had to take some time to think through this. To clarify, suppose I were given the SAT test scores and school start times above, the p-value after a regression was greater than 0.05, and I had to answer the research question: "To what extent do school start times influence student academic performance in the form of SAT scores?" Would I be able to say there is no apparent relationship between school start times and SAT scores? Or would further analysis be required? – Kevin Apr 10 '22 at 19:11
  • If this is a class analysis, I think you can say that you fail to reject the null hypothesis. https://stats.stackexchange.com/questions/135564/null-hypothesis-for-linear-regression

    In reality, there are a lot of other assumptions to check. For example, a 0.05 cutoff is not always optimal. This is a good read on some techniques to learn: https://stats.stackexchange.com/questions/3200/is-adjusting-p-values-in-a-multiple-regression-for-multiple-comparisons-a-good-i

    But be sure to add an effect for each school in your model, at the very least.

    – Estimate the estimators Apr 10 '22 at 21:33
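The suggestion in the comments to add an effect for each school can be illustrated with the "within" transformation: subtract each school's own mean from its start times and scores, then fit a single slope to what is left. This yields the same start-time coefficient as a regression with one dummy variable per school. The sketch below is only an illustration: the data are synthetic, built so that every school has its own baseline plus 10 points per hour of start time, and `within_slope` and `pooled_slope` are hypothetical helper names.

```python
# Synthetic panel: school -> list of (start time in hours past 7:00 AM, score).
# Each score is baseline + 10 * start time; baselines (600, 560, 520) are invented.
panel = {
    "School A": [(0.50, 605.0), (0.75, 607.5), (0.50, 605.0)],
    "School B": [(1.00, 570.0), (1.25, 572.5), (1.00, 570.0)],
    "School C": [(1.50, 535.0), (1.50, 535.0), (1.75, 537.5)],
}

def pooled_slope(panel):
    """Simple regression slope that ignores which school each row belongs to."""
    rows = [row for school_rows in panel.values() for row in school_rows]
    t_mean = sum(t for t, _ in rows) / len(rows)
    s_mean = sum(s for _, s in rows) / len(rows)
    num = sum((t - t_mean) * (s - s_mean) for t, s in rows)
    den = sum((t - t_mean) ** 2 for t, _ in rows)
    return num / den

def within_slope(panel):
    """Slope after removing each school's own mean (school fixed effects)."""
    num = den = 0.0
    for rows in panel.values():
        t_mean = sum(t for t, _ in rows) / len(rows)
        s_mean = sum(s for _, s in rows) / len(rows)
        num += sum((t - t_mean) * (s - s_mean) for t, s in rows)
        den += sum((t - t_mean) ** 2 for t, _ in rows)
    return num / den

pooled = pooled_slope(panel)
within = within_slope(panel)
print("pooled slope:", pooled)
print("within-school slope:", within)
```

In this constructed example the pooled slope is negative because the later-starting schools happen to have lower baselines, while the within-school slope recovers the 10 points per hour that was built into the data. That gap is the kind of distortion a per-school effect is meant to remove.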
