I would like to use a large dataset of tissue volume measurements from 500 volunteers over a given age range as a reference for a small dataset of volunteers and patients (20-30 data points).
The tissue volumes correlate well with age, so I fitted a linear regression model to the large dataset. I thought I could also use the standard deviation to describe how the small dataset differs from the large one.
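A minimal sketch of this first idea, assuming volume depends linearly on age and that the residual SD is the same at every age (the assumption questioned below). The reference data are synthetic, and `fit_reference` and `residual_z` are hypothetical helper names. The code fits ordinary least squares with NumPy and expresses each small-dataset point as a number of residual SDs above or below the reference line. Note that it ignores the uncertainty in the fitted line itself.

```python
import numpy as np

def fit_reference(age, volume):
    """Ordinary least squares fit of volume on age; returns (intercept, slope, residual SD)."""
    age = np.asarray(age, dtype=float)
    volume = np.asarray(volume, dtype=float)
    slope, intercept = np.polyfit(age, volume, deg=1)  # highest power first
    residuals = volume - (intercept + slope * age)
    # n - 2 degrees of freedom: two parameters (intercept, slope) were estimated
    sigma = np.sqrt(np.sum(residuals ** 2) / (len(age) - 2))
    return intercept, slope, sigma

def residual_z(age, volume, intercept, slope, sigma):
    """Residual SDs by which each observation lies above (+) or below (-) the reference line."""
    expected = intercept + slope * np.asarray(age, dtype=float)
    return (np.asarray(volume, dtype=float) - expected) / sigma

# Hypothetical synthetic reference data standing in for the 500 volunteers
rng = np.random.default_rng(0)
ref_age = rng.uniform(20, 80, size=500)
ref_volume = 1200.0 - 3.0 * ref_age + rng.normal(0.0, 40.0, size=500)

b0, b1, s = fit_reference(ref_age, ref_volume)
small_age = np.array([35.0, 52.0, 70.0])
small_volume = np.array([1100.0, 1000.0, 900.0])
print(np.round(residual_z(small_age, small_volume, b0, b1, s), 2))
```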
I explored the data and found that age is normally distributed around its mean and that the confidence interval is slightly wider at the ends of the regression line.
- How can I reflect this uncertainty in the standard deviation as well?
- I am not sure how to test whether it is reasonable to assume that the standard deviation does not change with age (which is how it looks by eye).
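One common way to check the constant-SD assumption is a Breusch-Pagan-type test: regress the squared residuals on age and see whether age explains any of their variation. The sketch below implements Koenker's studentized variant by hand with NumPy, so no extra packages are assumed (in R, `lmtest::bptest()` computes this variant by default). It only detects variance that changes roughly linearly with age, the helper name `breusch_pagan_age` is hypothetical, and the data are synthetic. Plotting absolute residuals against age is a useful visual complement.

```python
import math
import numpy as np

def breusch_pagan_age(age, volume):
    """Koenker's studentized Breusch-Pagan test for residual variance changing with age.

    Returns (LM statistic, p-value). Under the null of constant variance,
    LM = n * R^2 of the auxiliary regression is approximately chi-square with 1 df.
    """
    age = np.asarray(age, dtype=float)
    volume = np.asarray(volume, dtype=float)
    slope, intercept = np.polyfit(age, volume, deg=1)
    sq_resid = (volume - (intercept + slope * age)) ** 2
    # Auxiliary regression: squared residuals on age
    a1, a0 = np.polyfit(age, sq_resid, deg=1)
    fitted = a0 + a1 * age
    ss_res = np.sum((sq_resid - fitted) ** 2)
    ss_tot = np.sum((sq_resid - sq_resid.mean()) ** 2)
    lm = max(0.0, len(age) * (1.0 - ss_res / ss_tot))  # guard against tiny negative rounding
    # Chi-square(1) upper tail: P(X > lm) = P(|Z| > sqrt(lm)) = erfc(sqrt(lm / 2))
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

# Hypothetical synthetic data: one set with constant SD, one whose SD grows with age
rng = np.random.default_rng(1)
age = rng.uniform(20, 80, size=500)
constant_sd = 1200.0 - 3.0 * age + rng.normal(0.0, 40.0, size=500)
growing_sd = 1200.0 - 3.0 * age + rng.normal(0.0, 1.0, size=500) * (0.5 * age)

lm_c, p_c = breusch_pagan_age(age, constant_sd)
lm_g, p_g = breusch_pagan_age(age, growing_sd)
print(f"constant SD: LM={lm_c:.2f}, p={p_c:.3g}")
print(f"growing SD:  LM={lm_g:.2f}, p={p_g:.3g}")
```

A small p-value suggests the SD does change with age. A large one does not prove it is constant, but with 500 reference points the test has reasonable power against a linear trend in variance.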
My aim would be (1) to compare the volunteers in the small set with the large pool of volunteers and, ideally, show that they are within a normal range, and (2) to compare individual patients with the larger dataset of "normal" volunteers and describe how far they are from a "typical" volume measurement at their age. This could then perhaps be used to subgroup the patients for further analysis.
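For the second aim, a sketch that scores each patient against the reference line using the standard error of prediction for a new individual. Unlike the plain residual SD, this includes the uncertainty of the fitted line and therefore widens toward the ends of the age range, which is one way to carry the wider confidence band into the "standard deviation". Assumptions: constant residual SD, synthetic data, a hypothetical helper `prediction_z`, and the normal quantile (about 1.960) as an approximation to the t quantile with 498 df (about 1.965). In R, `predict.lm(fit, newdata, interval = "prediction")` gives the corresponding limits.

```python
import numpy as np
from statistics import NormalDist

def prediction_z(ref_age, ref_volume, new_age, new_volume):
    """Standardized distance of new observations from the reference regression line.

    The denominator is the standard error of prediction for a new individual,
    sigma * sqrt(1 + 1/n + (x0 - xbar)^2 / Sxx), which grows away from the mean age.
    """
    x = np.asarray(ref_age, dtype=float)
    y = np.asarray(ref_volume, dtype=float)
    n = len(x)
    slope, intercept = np.polyfit(x, y, deg=1)
    sigma = np.sqrt(np.sum((y - intercept - slope * x) ** 2) / (n - 2))
    sxx = np.sum((x - x.mean()) ** 2)
    x0 = np.asarray(new_age, dtype=float)
    se_pred = sigma * np.sqrt(1.0 + 1.0 / n + (x0 - x.mean()) ** 2 / sxx)
    return (np.asarray(new_volume, dtype=float) - (intercept + slope * x0)) / se_pred

# Hypothetical reference data and patients (synthetic, for illustration only)
rng = np.random.default_rng(2)
ref_age = rng.uniform(20, 80, size=500)
ref_volume = 1200.0 - 3.0 * ref_age + rng.normal(0.0, 40.0, size=500)
patient_age = np.array([30.0, 55.0, 78.0])
patient_volume = np.array([1110.0, 950.0, 820.0])

z = prediction_z(ref_age, ref_volume, patient_age, patient_volume)
# Normal approximation to the two-sided 95% t quantile with n - 2 = 498 df
limit = NormalDist().inv_cdf(0.975)
print(np.round(z, 2), np.abs(z) > limit)
```

The signed values of `z` describe how far, and in which direction, each patient lies from a typical volume at their age, which could serve as a continuous variable for subgrouping instead of a hard in/out cut.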
… volume for a given age and its standard error, so you can see whether an individual is far outside the expected distribution. That includes error both in the regression estimate itself and from the remaining mean-squared error. Standard software can do that, e.g. `predict.lm()` in R. See this Penn State page for formulas. But you should be very cautious about making individual decisions that way: about 5 out of 100 individuals will normally fall outside 95% limits, and you might introduce bias by removing them. – EdM Jan 12 '22 at 19:57
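A worked version of the comment's two points, written in standard textbook notation for simple linear regression (the symbols are ours, not taken from the linked page: $b_0, b_1$ are the fitted intercept and slope, $x_0$ is the individual's age, and $n$ is the reference sample size). The $1$ under the square root corresponds to the remaining mean-squared error; the other two terms come from uncertainty in the fitted line, which is why the limits widen toward the ends of the age range. The last line assumes, hypothetically, 25 independent healthy volunteers in the small set.

```latex
\hat{y}_0 = b_0 + b_1 x_0, \qquad
\hat{\sigma}^2 = \mathrm{MSE} = \frac{1}{n-2}\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2

\mathrm{se}_{\mathrm{pred}}(x_0) = \hat{\sigma}\,\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}

\text{95\% prediction interval: } \hat{y}_0 \pm t_{0.975,\,n-2}\,\mathrm{se}_{\mathrm{pred}}(x_0)

P(\text{at least one of 25 outside}) = 1 - 0.95^{25} \approx 1 - 0.277 = 0.723,
\qquad E[\text{number outside}] = 25 \times 0.05 = 1.25
```

So even if every volunteer in the small set is perfectly "normal", there is roughly a 72% chance that at least one of them falls outside the 95% limits.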