I have the following situation:

- ~300 participants; for each of them I have ~30 participant-specific variables (from a questionnaire).
- For each participant I have ~200 data points, each consisting of 1 independent variable (reaction time) and 1 dependent/predicted variable (attention).
The goal: given a new participant with their ~200 data points, make a prediction for each of those points. So I clearly want case-by-case prediction, but one that takes participant-specific traits into account.
I'm used to working with data "case-wise", where each case has 1 dependent (predicted) variable and many independent variables (predictors). But here I need to predict variation within a participant while also taking the participant into account.
- I have tried normalizing all reaction times within participants and then training a model on each case, ignoring variation between participants. No good.
- Another option I considered: just adding all participant-related data to every point within a participant, making the data redundant but flat. I haven't tried it yet, but it seems more reasonable.
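To make the first option concrete, here is a minimal sketch of within-participant normalization (shown in Python with made-up data; I plan to do the real thing in R, but the idea is the same — each participant's reaction times are z-scored relative to their own mean and SD, which discards between-participant differences):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Made-up stand-in data: 3 participants, 4 trials each.
df = pd.DataFrame({
    "participant": np.repeat([1, 2, 3], 4),
    "reaction_time": rng.normal(500, 50, size=12),
})

# Within-participant z-scoring: after this, every participant's
# reaction times have mean 0 and SD 1.
df["rt_z"] = (
    df.groupby("participant")["reaction_time"]
      .transform(lambda x: (x - x.mean()) / x.std())
)
```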
But is there a good or conventional way to treat this kind of data? I believe it is a common situation for some people. I mean, how do I explain to the model that "these 200 data points belong to one participant", not just "these 200 data points happen to have the same values on 30 traits"?
Tool-wise: I plan to try a Random Forest and a Neural Network to see which does better. I'm doing it all in R.
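For reference, the second option above (the "redundant but flat" table) is just a join of the per-trial data with the per-participant traits. A minimal Python sketch with made-up data (the column names `trait_a`, `trait_b` are placeholders for the ~30 questionnaire variables):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Made-up stand-in data: 3 participants, 4 trials each, 2 traits.
trials = pd.DataFrame({
    "participant": np.repeat([1, 2, 3], 4),
    "reaction_time": rng.normal(500, 50, size=12),
    "attention": rng.normal(0, 1, size=12),
})
traits = pd.DataFrame({
    "participant": [1, 2, 3],
    "trait_a": [0.1, 0.5, 0.9],
    "trait_b": [1, 0, 1],
})

# "Flat" design: every trial row also carries its participant's traits,
# so the same trait values repeat across that participant's 200 rows.
flat = trials.merge(traits, on="participant", how="left")

X = flat[["reaction_time", "trait_a", "trait_b"]]
y = flat["attention"]
print(flat.shape)  # (12, 5)
```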
Could you elaborate more on "predicting a vector"? I have never predicted anything but a single value, but the idea makes sense. Can you do this with the same methods (say regression, random forest, or a neural network), or do I need to look for models specifically adapted to vector predictions? The main downside I can imagine is that a vector puts a lot of weight on order, and the model might, for example, consider one participant's "4th trial" to be more relevant to another participant's "4th trial" than to "any trial". Is that so?
– Igor Sokolov Oct 31 '19 at 03:28

The tricky part is your loss function. Suppose your "irreducible error" is very high for everybody's 100th trial, and very low for everybody's 1st trial. If you optimize MSE, then a neural net might pay too much attention to 100th trials, since that's what's contributing the most to the overall loss.
– goopy Oct 31 '19 at 18:04
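For what it's worth, "predicting a vector" can work with off-the-shelf tools: scikit-learn's `RandomForestRegressor`, for instance, accepts a 2-D target (one column per trial position) without any special setup. A minimal sketch on synthetic data (the dimensions mirror the question; the data itself is random noise, just to show the shapes):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

n_participants, n_trials, n_traits = 50, 200, 30

# One row per participant: traits as inputs, the whole
# 200-trial sequence as a vector-valued target.
X = rng.normal(size=(n_participants, n_traits))
Y = rng.normal(size=(n_participants, n_trials))

# Multi-output regression: y may be 2-D; the forest predicts
# all 200 outputs jointly.
model = RandomForestRegressor(n_estimators=20, random_state=0)
model.fit(X, Y)

pred = model.predict(X[:1])
print(pred.shape)  # (1, 200)
```

Note that this setup does treat the output index as meaningful (column 4 is "everybody's 4th trial"), which is exactly the order-sensitivity concern raised in the comment above.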