I'm looking at the data from Individual Aesthetic Preferences for Faces Are Shaped Mostly by Environments, Not Genes by Germine et al., where members of twin pairs were shown faces in sequence and asked to rate them. The data has rows corresponding to raters and columns corresponding to faces. The subset that I focus on below is available here.
I'm attempting dimensionality reduction on the data to determine the dimensions along which the faces vary as far as ratings are concerned. After normalizing for the average attractiveness rating given to each face, I find that most of the covariance comes from correlations between ratings of faces that were presented near one another, as in the matrix below, where there are positive correlations near the diagonal and negative correlations farther from the diagonal:
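For concreteness, the normalization step I mean is just column demeaning. Here `X` is a hypothetical toy stand-in for the raters-by-faces matrix, not the actual data:

```r
# Toy sketch of the normalization: subtract each face's (column's) mean
# rating before looking at the face-by-face covariance.
X  <- matrix(c(5, 3, 4,
               4, 2, 5,
               6, 4, 3), nrow = 3, byrow = TRUE)  # hypothetical ratings
Xc <- scale(X, center = TRUE, scale = FALSE)      # demean each column (face)
C  <- cov(Xc)                                     # face-by-face covariance
```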
It seems that each rater was anchoring on an implicit moving average. When I do principal component analysis, the first principal component seems to be picking up primarily on this effect. I would like to correct for the effect.
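The PCA step itself is standard; a minimal sketch on toy data (the matrix here is random and only stands in for the demeaned ratings) looks like this, where the symptom is a smooth, order-dependent pattern in the first component's loadings:

```r
# Hypothetical illustration: PCA on demeaned ratings, then inspect the
# first principal component's loadings across faces (columns).
set.seed(1)
Xc <- scale(matrix(rnorm(200), nrow = 20, ncol = 10),
            center = TRUE, scale = FALSE)  # toy demeaned ratings
pc <- prcomp(Xc, center = FALSE)           # data already centered
head(pc$rotation[, 1])                     # first PC's loadings by face
```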
Autoregressive integrated moving average time series forecasting seems relevant here, but I'm unclear about how to implement it appropriately in this context. I tried what Zach suggested in response to the question Estimating same model over multiple time series, namely to convert the data to a single time series and use auto.arima from the "forecast" library:
library(readr)
library(forecast)
library(corrplot)
library(Matrix)
mdf = read_csv("~/Dropbox/mdf.csv")
# flatten the raters-by-faces matrix into one long series, row by row
s = unlist(lapply(1:nrow(mdf), function(i){as.numeric(mdf[i,])}))
len = ncol(mdf)
# fit an ARIMA model on the first 30 raters' worth of ratings
fit = auto.arima(s[1:(30*len)])
# apply the fitted model to the full series without re-estimating
a = Arima(s, model = fit)
# the series was built row by row, so reshape residuals with byrow = TRUE
mat = Matrix(a$residuals, nrow = nrow(mdf), ncol = ncol(mdf), byrow = TRUE)
corrplot(cor(as.matrix(mat)))
but I found that doing this purged the data of nearly all correlation, whereas I only want to strip out the correlations due to order effects.
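One direction I have considered (I am not sure it is principled) is to subtract each rater's centered moving average directly, so that only the local order trend is removed rather than all serial structure. A sketch on toy data, where `X` stands in for the raters-by-faces matrix and the window width `k` is an arbitrary choice:

```r
# Sketch, not a settled solution: remove each rater's local running mean
# so only the order-effect trend is stripped out.
set.seed(1)
X <- matrix(rnorm(200), nrow = 10, ncol = 20)  # hypothetical ratings
k <- 5                                         # arbitrary window width
w <- rep(1 / k, k)
detrend_row <- function(r) {
  ma <- stats::filter(r, w, sides = 2)  # centered moving average
  ma[is.na(ma)] <- mean(r)              # crude padding at the ends
  as.numeric(r - ma)
}
Xd <- t(apply(X, 1, detrend_row))       # same shape as X, local trend removed
```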
Any suggestions would be much appreciated!
