1

I'm trying to analyze some sleep data from kaggle (this example data does not have correct temperature data but the actual data I will use in the future will have precise temperature) to try to find the temperature value that gives the best sleep quality.

At first I thought of running a linear regression of sleep quality on some of the other variables and checking the coefficient for temperature. However, I don't think this is the right approach, because the relationship between the two variables is not linear (e.g., starting from 0ºC, quality improves as the temperature rises, but it does not keep improving once the temperature goes above, say, 40ºC). One solution I found is to bin the temperature into intervals and run the regression on those categories.

What I'm wondering is whether this is the right way to do it, or whether there are better methods for cases like this, where the relationship between the two variables is not linear.

In summary, I would like to find the temperature value that gives the best sleep quality, either by something like "solving an equation" or by some other method. I would really appreciate any links or references on the topic as well.

pato
  • 11

2 Answers

0

"Binning" a continuous variable like temperature is not a good idea. It fits a step function: fitted sleep quality would stay completely constant over a certain range of temperatures, only to jump suddenly as the temperature increases or decreases into the next bin. This makes no sense. See "What is the benefit of breaking up a continuous predictor variable?"

Better practice is to use splines to model nonlinearities. Frank Harrell's Regression Modeling Strategies has a nice and easily digested introduction to spline modeling.
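To make the contrast concrete, here is a minimal sketch in Python that fits both a binned step function and a cubic regression spline to the same data. Everything in it is an assumption for illustration: the data are synthetic (the Kaggle dataset has no usable temperature), the helper names `spline_basis`, `fit_spline` and `predict_spline` are made up, and the truncated-power basis with knots at the quartiles is a simple stand-in rather than the restricted cubic splines described in Regression Modeling Strategies.

```python
import numpy as np

def spline_basis(x, knots):
    """Cubic truncated-power basis: 1, x, x^2, x^3 and (x - k)_+^3 per knot."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def fit_spline(x, y, knots):
    """Ordinary least squares on the spline basis."""
    coef, *_ = np.linalg.lstsq(spline_basis(x, knots), y, rcond=None)
    return coef

def predict_spline(coef, x, knots):
    return spline_basis(x, knots) @ coef

# Synthetic nights (assumption): quality peaks near 19 ºC, plus noise.
rng = np.random.default_rng(0)
temp = rng.uniform(10, 30, 400)
quality = 80 - 0.3 * (temp - 19) ** 2 + rng.normal(0, 3, 400)

grid = np.linspace(10, 30, 201)

# Binned fit: one mean per 5 ºC interval, i.e. a step function.
bins = np.arange(10, 31, 5)
bin_idx = np.clip(np.digitize(temp, bins) - 1, 0, len(bins) - 2)
bin_means = np.array([quality[bin_idx == b].mean() for b in range(len(bins) - 1)])
grid_idx = np.clip(np.digitize(grid, bins) - 1, 0, len(bins) - 2)
step_fit = bin_means[grid_idx]

# Spline fit: smooth curve with knots at the quartiles of temperature.
knots = np.quantile(temp, [0.25, 0.5, 0.75])
coef = fit_spline(temp, quality, knots)
spline_fit = predict_spline(coef, grid, knots)

# Largest change between neighbouring grid points (0.1 ºC apart).
print("max jump, binned:", np.max(np.abs(np.diff(step_fit))))
print("max jump, spline:", np.max(np.abs(np.diff(spline_fit))))
```

The binned fit changes only at the bin edges, and there it jumps by several quality points at once, while the spline changes gradually across the whole range.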

Stephan Kolassa
  • 123,354
  • Thanks for the answer. I'll check the textbook you mentioned for using splines in the regression. However, do you have any idea how to get the value that gives the best sleep quality? I'm thinking of some kind of inference maybe, but don't know how to continue from here. – pato Nov 22 '23 at 07:43
  • The simplest approach would be to fit a spline model (using something like cross-validation for setting the number of spline knots, and using heuristics as in RMS for the knot locations), and then numerically optimizing on the resulting fit. If you have a decent unimodal curve, that is easy. (If not, that may tell you something about your data.) – Stephan Kolassa Nov 22 '23 at 08:05
  • In addition, I would automate and bootstrap the entire process (knot number determination, location determination, fitting, optimization), to get a feeling for the variability of the optimum temperature. The variability of an optimum can be at least as important as the optimum itself. If that dataset had the temperature, I would be tempted to include this analysis here, it sounds like a fun little project. – Stephan Kolassa Nov 22 '23 at 08:07
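The procedure described in these comments (fit a spline, optimize numerically on the fit, then bootstrap the whole thing) could be sketched roughly as follows. This is an illustration under stated assumptions, not the analysis itself: the data are synthetic, the function names `optimum_temperature` and `bootstrap_optimum` are hypothetical, the number of knots is fixed at three instead of being chosen by cross-validation, the knots sit at evenly spaced quantiles rather than the RMS defaults, and the "numerical optimization" is a plain grid search over the observed temperature range.

```python
import numpy as np

def spline_basis(x, knots):
    """Cubic truncated-power basis: 1, x, x^2, x^3 and (x - k)_+^3 per knot."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def optimum_temperature(temp, quality, n_knots=3, grid_size=401):
    """Fit a cubic regression spline and return the temperature at its peak."""
    # Knot locations: evenly spaced interior quantiles (a simplification).
    probs = np.linspace(0, 1, n_knots + 2)[1:-1]
    knots = np.quantile(temp, probs)
    coef, *_ = np.linalg.lstsq(spline_basis(temp, knots), quality, rcond=None)
    # Grid search for the maximum, restricted to the observed range.
    grid = np.linspace(temp.min(), temp.max(), grid_size)
    fitted = spline_basis(grid, knots) @ coef
    return grid[np.argmax(fitted)]

def bootstrap_optimum(temp, quality, n_boot=200, seed=0):
    """Redo knot placement, fitting and optimization on resampled nights."""
    rng = np.random.default_rng(seed)
    n = len(temp)
    optima = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)  # sample nights with replacement
        optima[b] = optimum_temperature(temp[idx], quality[idx])
    return optima

# Synthetic nights (assumption): quality peaks near 19 ºC, plus noise.
rng = np.random.default_rng(1)
temp = rng.uniform(10, 30, 400)
quality = 80 - 0.3 * (temp - 19) ** 2 + rng.normal(0, 3, 400)

best = optimum_temperature(temp, quality)
optima = bootstrap_optimum(temp, quality)
lo, hi = np.percentile(optima, [2.5, 97.5])
print(f"estimated optimum: {best:.1f} ºC, bootstrap 95% interval: [{lo:.1f}, {hi:.1f}]")
```

A wide bootstrap interval, or bootstrap optima that pile up at the edge of the observed range, would suggest that the curve is flat or not unimodal, which is itself informative.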
0

The research question should be refined. Exploring the statistical dependence between temperature and sleep quality is a valid analysis, but "the temperature value that gives the best sleep quality" implies a kind of one-to-one mapping which I can hardly believe exists. The word "gives" may further give the impression that body temperature value determines sleep quality, another argument which is shaky at best, given the complexity of human physiological regulation.

If your sleep quality measurements are on an interval scale, you can find the mean sleep quality given body temperature, using binning. You can then fit splines or another continuous function to these bins, and you can then find the temperature where that curve peaks. That would be the temperature associated with the highest mean sleep quality in your dataset. In another dataset, where distributions of gender, age, physical activity, room temperature, stress levels, etc. are different from yours, the temperature at which that mean peaks would probably differ.

KishKash
  • 359
  • You are right, I probably didn't express my idea in the best way possible. I'm talking about room temperature and its effect on sleep quality. What I want is to measure both (and some other factors as well) over time for the same person and, using this data, find the temperature at which this person gets the best sleep quality, so I can set the room temperature to that value. The sleep quality is measured on a scale of 1-100. Is the spline method you mentioned still the way to go in this case? – pato Nov 22 '23 at 08:44
  • @pato Yes, it is. Do you have the option to set room temperature, and then use the sleep quality data collected during nights where the person slept in a temperature-controlled environment? If so, there will be some other modeling options to explore once this initial phase is underway. – KishKash Nov 22 '23 at 09:03
  • Yes, that is the idea of the project. I will have data from different people in their own rooms where the temperature can be controlled and want to set each room's temperature according to their sleep data. Could you elaborate on those other modeling options? – pato Nov 22 '23 at 09:13
  • With measurements over time you will have the option of giving more weight to more recent data. This will allow you to adjust to (slow) changes in ambient temperature preferences of the same person over time. – KishKash Nov 22 '23 at 10:46
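One possible way to give more weight to recent nights is weighted least squares with exponentially decaying weights, sketched below for a single person. The details are assumptions for illustration only: the data are synthetic with a preference that drifts slowly upward, `weighted_optimum` is a hypothetical helper, and the 30-night `half_life` is an arbitrary choice that would need tuning on real data.

```python
import numpy as np

def spline_basis(x, knots):
    """Cubic truncated-power basis: 1, x, x^2, x^3 and (x - k)_+^3 per knot."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def weighted_optimum(temp, quality, weights, n_knots=3, grid_size=401):
    """Weighted least-squares spline fit; returns the temperature at its peak."""
    probs = np.linspace(0, 1, n_knots + 2)[1:-1]
    knots = np.quantile(temp, probs)
    # Scaling rows by sqrt(weight) turns ordinary into weighted least squares.
    root_w = np.sqrt(weights)
    design = spline_basis(temp, knots) * root_w[:, None]
    coef, *_ = np.linalg.lstsq(design, quality * root_w, rcond=None)
    grid = np.linspace(temp.min(), temp.max(), grid_size)
    return grid[np.argmax(spline_basis(grid, knots) @ coef)]

# Synthetic nights for one person (assumption): the preferred temperature
# drifts slowly from 16 ºC to 24 ºC over 400 nights.
rng = np.random.default_rng(2)
n_nights = 400
night = np.arange(n_nights)
temp = rng.uniform(10, 30, n_nights)
preferred = np.linspace(16, 24, n_nights)
quality = 80 - 0.3 * (temp - preferred) ** 2 + rng.normal(0, 3, n_nights)

# A night's weight halves for every `half_life` nights of age (hypothetical value).
half_life = 30.0
age = (n_nights - 1) - night
recency_weights = 0.5 ** (age / half_life)

unweighted_opt = weighted_optimum(temp, quality, np.ones(n_nights))
weighted_opt = weighted_optimum(temp, quality, recency_weights)
print(f"all nights equal: {unweighted_opt:.1f} ºC, recent nights favoured: {weighted_opt:.1f} ºC")
```

A shorter half-life tracks changes faster but relies on fewer effective nights, so the estimated optimum becomes noisier.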