
I conducted an experiment in which each participant played 5 rounds. For each round and each participant, I collected the dependent variable $y$ and the independent variables $X$.

I am using linear regression to understand how $X$ drives $y$, and I was wondering whether either of the two approaches below is considered better from a methodological or statistical standpoint:

  1. Mean approach: for each participant, calculate the mean of $y$ and $X$ across rounds and run a regression on the reduced dataset (reduced by a factor of 5, i.e. the number of rounds).

  2. Rounds approach: use the raw data and include the round number as a control variable.

Given that approach 2 has more observations and does not average anything out, I would prefer approach 2. However, the observations are not independent under this approach, so I was wondering whether it is valid.

I would be grateful for some insight on

a. whether both approaches are correct and can be applied, or whether either has flaws.

b. what other "typical" methods exist for analysing this kind of data (ANOVA, panel regression?)

seanv507

1 Answer


Hidden option C is best: use a mixed model to model all your data at the same time, accounting for the non-independence with a random effect for participant.
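To make that concrete, a random-intercept model of this kind could be written as below. The notation is an assumption for illustration, not something from the question: $i$ indexes participants, $j = 1, \dots, 5$ indexes rounds, $u_i$ is the participant-level random effect, and the round term is optional.

$$
y_{ij} = \beta_0 + \boldsymbol{\beta}^\top X_{ij} + \gamma \,\mathrm{round}_j + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \tau^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
$$

The shared $u_i$ is what induces correlation among the five observations from the same participant, so the model uses all the raw data without treating the rounds as independent.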

Here are some threads explaining how these work:

What is the difference between fixed effect, random effect and mixed effect models?

What is a difference between random effects-, fixed effects- and marginal model?

Concepts behind fixed/random effects models

Are mixed models useful as predictive models?

mkt
  • Great, thanks for the swift support and the useful links! – FredMaster Jun 22 '23 at 14:59
  • Worth noting that this is similar to including participant ID (not round number) as a control variable in OP's approach #2. – Eoin Jun 22 '23 at 15:03
  • Yes, you are right. I should have included "person", not "round". But is this then actually similar to the mixed models above? Or is it something completely different (a worse method)? – FredMaster Jun 22 '23 at 15:05
  • @FredMaster Happy to help. To your last comment, mixed models take advantage of 'partial pooling' and generally perform better than simply including person as a fixed effect (the normal kind) in a standard regression. See the linked threads to understand partial pooling better. Also this one, which is less detailed but perhaps a bit easier to follow: https://stats.stackexchange.com/questions/250277/are-mixed-models-useful-as-predictive-models – mkt Jun 22 '23 at 15:07
  • @FredMaster in case you're completely new to the world of mixed-effects modelling, I recommend McElreath's gentle introduction. – Durden Jun 23 '23 at 17:13