Linear regression when observations may not be independent

Question

I conducted an experiment in which each participant played 5 rounds. For each round and each participant, I collected the dependent variable $y$ and the independent variables $X$.

I want to use linear regression to understand how $X$ drives $y$, and I was wondering whether either of the two approaches below is considered better from a methodological or statistical standpoint:

1. Mean approach: for each participant, calculate the mean of $y$ and $X$ and run a regression on the reduced dataset (reduced by a factor of 5, i.e. the number of rounds).
2. Rounds approach: use the raw data and include the round number as a control variable.

Given that approach 2 has more observations and doesn't average anything out, I would prefer it. However, the observations are not independent under this approach, so I was wondering whether I can do this.

I would be grateful for some insight on:

a. whether both approaches are correct and can be applied, or whether they have any flaws.

b. what other "typical" methods there are for analysing this kind of data (ANOVA, panel regression?)
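
For concreteness, the two approaches could be sketched roughly as follows. This runs on simulated data, not the actual experiment: the sample size, the single predictor `x`, the coefficient values, and the helper `ols` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_participants, n_rounds = 40, 5  # assumed sizes, not from the experiment

# Hypothetical data: one predictor x, a participant-specific offset, and noise.
# Rows are ordered participant by participant, 5 rounds each.
participant = np.repeat(np.arange(n_participants), n_rounds)
round_no = np.tile(np.arange(1, n_rounds + 1), n_participants)
offset = rng.normal(0.0, 1.0, n_participants)[participant]
x = rng.normal(0.0, 1.0, participant.size)
y = 2.0 + 0.5 * x + offset + rng.normal(0.0, 1.0, participant.size)


def ols(design, response):
    """Least-squares coefficients for response ~ design."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


# Approach 1: average y and x within each participant, then regress.
x_mean = x.reshape(n_participants, n_rounds).mean(axis=1)
y_mean = y.reshape(n_participants, n_rounds).mean(axis=1)
coef_mean = ols(np.column_stack([np.ones(n_participants), x_mean]), y_mean)

# Approach 2: keep every row and add the round number as a control.
coef_rounds = ols(np.column_stack([np.ones(y.size), x, round_no]), y)

print("mean approach slope:  ", coef_mean[1])
print("rounds approach slope:", coef_rounds[1])
```

Plain least squares on the raw rows does return a slope, but its usual standard errors assume independent rows, which is exactly the concern with approach 2.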

mkt · Accepted Answer · 2023-06-23T14:01:34.347


Hidden option C is best: use a mixed model to model all your data at the same time, accounting for the non-independence with a random effect for participant.

Here are some threads explaining how these work:

What is the difference between fixed effect, random effect and mixed effect models?

What is a difference between random effects-, fixed effects- and marginal model?

Concepts behind fixed/random effects models

Are mixed models useful as predictive models?
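
As a rough illustration of the mixed model recommended in this answer, here is a minimal sketch using `statsmodels` on simulated data. The data, the column names (`participant`, `round_no`, `x`, `y`), and the choice of a random intercept only are assumptions for the example; the real analysis may need a different fixed-effects formula or random slopes.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_participants, n_rounds = 40, 5  # assumed sizes, not from the experiment

# Hypothetical long-format data: one row per participant and round.
participant = np.repeat(np.arange(n_participants), n_rounds)
df = pd.DataFrame({
    "participant": participant,
    "round_no": np.tile(np.arange(1, n_rounds + 1), n_participants),
    "x": rng.normal(size=participant.size),
})
# Each participant gets their own intercept shift (the source of non-independence).
intercepts = rng.normal(0.0, 1.0, n_participants)
df["y"] = 2.0 + 0.5 * df["x"] + intercepts[participant] + rng.normal(size=len(df))

# Fixed effects for x and round; random intercept for each participant.
model = smf.mixedlm("y ~ x + round_no", data=df, groups=df["participant"])
result = model.fit()

print(result.fe_params)  # fixed-effect estimates
print(result.cov_re)     # estimated variance of the participant intercepts
```

All rows are used, as in the rounds approach, while the `groups` argument tells the model which rows belong to the same participant.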

edited Jun 23 '23 at 14:01

answered Jun 22 '23 at 14:44



Great, thanks for the swift support and the useful links! – FredMaster Jun 22 '23 at 14:59
Worth noting that this is similar to including participant ID (not round number) as a control variable in OP's approach #2. – Eoin Jun 22 '23 at 15:03
Yes, you are right. I should have included "person", not "round". But is this then actually similar to the mixed models above? Or is it something completely different (a less good method)? – FredMaster Jun 22 '23 at 15:05
@FredMaster Happy to help. To your last comment, mixed models take advantage of 'partial pooling' and generally perform better than simply including person as a fixed effect (the normal kind) in a standard regression. See the linked threads to understand partial pooling better. Also this one, which is less detailed but perhaps a bit easier to follow: https://stats.stackexchange.com/questions/250277/are-mixed-models-useful-as-predictive-models – mkt Jun 22 '23 at 15:07
@FredMaster in case you're completely new to the world of mixed-effects modelling, I recommend McElreath's gentle introduction. – Durden Jun 23 '23 at 17:13
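
The "partial pooling" mentioned in the comments can be illustrated with a deliberately simplified sketch: an intercept-only setting with simulated data and method-of-moments variance estimates. Mixed-model software typically estimates these quantities by (restricted) maximum likelihood, so this is an assumption-laden toy rather than what such software actually computes; all names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_participants, n_rounds = 40, 5  # assumed sizes, not from the experiment

# Hypothetical data: each participant has a true mean, observed with noise in 5 rounds.
true_means = rng.normal(2.0, 1.0, n_participants)
y = true_means[:, None] + rng.normal(0.0, 2.0, (n_participants, n_rounds))

# No pooling: a separate estimate per participant (person as a fixed effect).
no_pool = y.mean(axis=1)
grand_mean = no_pool.mean()

# Method-of-moments variance components for the balanced design.
sigma2 = y.var(axis=1, ddof=1).mean()                     # within-participant variance
tau2 = max(no_pool.var(ddof=1) - sigma2 / n_rounds, 0.0)  # between-participant variance

# Partial pooling: shrink each participant mean toward the grand mean.
weight = tau2 / (tau2 + sigma2 / n_rounds)
partial_pool = grand_mean + weight * (no_pool - grand_mean)

print("shrinkage weight:       ", weight)
print("spread, no pooling:     ", no_pool.std(ddof=1))
print("spread, partial pooling:", partial_pool.std(ddof=1))
```

The noisier the per-participant means are relative to the variation between participants, the smaller the weight and the stronger the pull toward the grand mean.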
