I’ve got a clinical time series dataset containing (hourly) heart frequencies from deceased and survived patients. I need to investigate whether there is a significant difference of means between those two groups (deceased vs. survived). By assessing the ACF the time series clearly turns out to be auto correlated. Also, normal distribution assumption needs to be rejected. So, plain vanilla t-testing is not possible due to the violation of the main conditions.
According to some articles about HAC-robust testing (Newey-West (1987) t-stats and How to efficiently do rank-sum tests on autocorrelated time-series? ) I came along with following idea:
- Estimate a linear regression model globally where the variable ‘heart frequency’ depends on the two outcome groups (deceased & survived).
# Fit lm for heart frequency (HF) globally for Outcome groups (survived & deceased)
HF_lm <- lm(data = HF_data, HF~Outcome)
- Adjusting the model by using HAC-robust standard errors (Newey-West / Andrews)
library(sandwich)
library(lmtest)
Testing the coefficients with coeftest using HAC standard errors
coeftest(HF_lm, vcov. = vcovHAC(HF_lm))
- Assessing the slope coefficient for significance
> coeftest(HF_lm, vcov. = vcovHAC(HF_lm))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 86.8819 2.4534 35.4132 <2e-16 ***
Outcomesurvived -3.8165 2.8435 -1.3422 0.1796
Signif. codes: 0 '*' 0.001 '' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From my perspective, the Outcomesurvived -3.8165 represents the difference of means of both time series. For comparison:
> as.data.frame(HF_data %>% group_by(Outcome) %>% summarise(Group_mean = round(mean(HF),4)))
Outcome Group_mean
1 deceased 86.8819
2 survived 83.0654
The p-Value (in this case 0.1796) then tells me about the significance of the result. (compare also R: Anova and Linear Regression)
My question is whether this is a fair and valid approach for comparing time series means or am I missing something? Would you suggest a different approach?