Model selection for $4\times 2\times 2$ factorial design in R

Question

I have read so many books, forums, and sites that I have completely confused myself.

Originally, my design was a simple 4x2 factorial with two categorical predictors, but it has become more complex with the addition of another categorical predictor, two continuous covariates, and a second continuous dependent variable. Does anyone have suggestions for selecting a statistical analysis for a 4x2x2 design with 2 continuous covariates and 2 continuous dependent variables? I have since looked into MANOVA and mixed models, but I can't settle on a rationale for choosing a specific approach.

Covariates are growth data on the individuals, and the dependent variables are not repeated measures.

Please edit the question to give more information about the details of your study. You seem to have two continuous outcome variables and two categorical predictors (one 4-level and one 2-level/binary, for the original 4x2 factorial design), but what type is the "another independent variable" and the "two covariates"? Are there repeated measurements on the same individuals? This seems like it could fit under the general form of a linear model, but it's hard to say without such details. Please provide that information by editing the question, as comments are easy to overlook and can be deleted. — EdM, Jun 04 '22 at 02:06
@EdM Thank you for the feedback. I hope I have provided some clarification. — E10, Jun 04 '22 at 02:25

score 0 · Answer 1 · answered Jun 04 '22 at 03:18

What you describe seems to fit a two-outcome (multivariate) multiple regression model. Each of your categorical and continuous predictors/covariates is entered into the model, something like this in R:

lm(cbind(outcome1,outcome2) ~ factor1*factor2 + extraCategorical + covariate1 + covariate2)

That includes the initial factorial design and additive terms for the other categorical predictor and the covariates. If you think some interactions among the other predictors are important and you have enough data, you would include those also.

These notes by Fox and Weisberg show how to perform such a multivariate analysis. Associations between the two outcomes for each individual are handled by the covariance structure; the point estimates of regression coefficients for each outcome are the same as you would get with separate regressions. I suppose this technically would be called MANCOVA (multivariate analysis of covariance), but such terminology often gets in the way more than it helps. Focus on what the outcomes are, what the predictors are, and how you think they might be related. Then specify a linear model to match. That way you can extend the rigid structures of those classic acronyms to give you just what you want.
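As a minimal sketch of the model above, the following simulates data (the data frame `dat`, the factor levels, and all effect sizes are hypothetical, chosen only for illustration), fits the two-outcome model, and checks that its coefficients match those from separate per-outcome regressions. It uses base R only; the multivariate tests come from `anova()` on the multivariate fit, which is sequential and may differ from what the Fox and Weisberg notes produce with other tools.

```r
# Simulated data; all variable names, levels, and effect sizes are hypothetical.
set.seed(1)
n <- 160
dat <- data.frame(
  factor1          = factor(sample(c("a", "b", "c", "d"), n, replace = TRUE)),
  factor2          = factor(sample(c("lo", "hi"), n, replace = TRUE)),
  extraCategorical = factor(sample(c("x", "y"), n, replace = TRUE)),
  covariate1       = rnorm(n),
  covariate2       = rnorm(n)
)

# A shared error component makes the two outcomes' error terms correlated.
shared <- rnorm(n)
dat$outcome1 <- 1 + 0.5 * dat$covariate1 + shared + rnorm(n)
dat$outcome2 <- 2 - 0.3 * dat$covariate2 + shared + rnorm(n)

# Two-outcome (multivariate) linear model.
fit_mv <- lm(cbind(outcome1, outcome2) ~ factor1 * factor2 + extraCategorical +
               covariate1 + covariate2, data = dat)

# Separate univariate fits with the same right-hand side.
fit_1 <- lm(outcome1 ~ factor1 * factor2 + extraCategorical +
              covariate1 + covariate2, data = dat)
fit_2 <- lm(outcome2 ~ factor1 * factor2 + extraCategorical +
              covariate1 + covariate2, data = dat)

# Point estimates agree column by column.
same_1 <- isTRUE(all.equal(unname(coef(fit_mv)[, "outcome1"]), unname(coef(fit_1))))
same_2 <- isTRUE(all.equal(unname(coef(fit_mv)[, "outcome2"]), unname(coef(fit_2))))
print(c(same_1, same_2))

# Sequential multivariate tests (Pillai's trace by default).
print(anova(fit_mv))
```

The multivariate fit adds joint tests across both outcomes; the coefficient estimates themselves are unchanged.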

answered Jun 04 '22 at 03:18

EdM

Thank you! I think this will work, especially since I'm interested in the interactions of the predictors. I've tried to figure out the technical classifications to make sure I meet all the assumptions before getting too far along. – E10 Jun 04 '22 at 13:52
@Dakota when you put all those classical classes of models into the single schema of linear models, then the critical assumptions end up being the same for all of them. See this page for a good introduction. – EdM Jun 05 '22 at 17:06
I'm trying one more thing with the initial 4 treatment levels of one categorical predictor and 1 continuous response variable measured over 6 sampling dates. Initially I thought I would use a repeated-measures ANOVA, but am I overlooking a reason to use a MANOVA or a mixed-effects GLM instead? – E10 Jun 07 '22 at 01:21
@Dakota if you must model a response variable in terms of counts, then the simple lm() approach to correlations of the bivariate outcomes within individuals might not work. If the counts are high enough you might have a work-around by treating the counts as effectively continuous, perhaps with a square-root or log transformation, and proceeding with lm(). If those solutions aren't appropriate for your data, I'd suggest searching the site or asking a new question about how to handle multivariate outcomes of different types (continuous, count, binary, ...). That's outside my expertise. – EdM Jun 07 '22 at 14:57
@Dakota percentages are often better handled by modeling the actual outcome and using things like the number of trials or observation duration as a predictor. For binary outcomes, a logistic regression with glm() handles that directly when you specify the outcome as a 2-column matrix of successes and failures for each case. For count or continuous outcomes, an offset term could be appropriate. When you ask another question, be sure to provide information about the experimental design and the original measurements that were made, rather than just jumping to their reduction into percentages. – EdM Jun 07 '22 at 16:14
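To illustrate the two suggestions in the comment above, here is a rough sketch on simulated data. The variable names (`treatment`, `trials`, `duration`) and all effect sizes are hypothetical. It shows a binomial `glm()` with a two-column successes/failures response, and a Poisson `glm()` with observation duration entered as an offset; whether either fits a real study depends on how the measurements were made.

```r
# Simulated data; all names and effect sizes are hypothetical.
set.seed(2)
n <- 80
treatment <- factor(rep(c("ctrl", "trt"), each = n / 2))

# Binomial case: model successes and failures directly, not the percentage.
trials    <- sample(10:30, n, replace = TRUE)
p         <- plogis(-0.5 + 0.8 * (treatment == "trt"))
successes <- rbinom(n, size = trials, prob = p)
failures  <- trials - successes
fit_binom <- glm(cbind(successes, failures) ~ treatment, family = binomial)

# Count case: observation duration enters as an offset on the log scale,
# so the model describes a rate per unit of duration.
duration <- runif(n, 1, 5)
counts   <- rpois(n, lambda = duration * exp(0.2 + 0.5 * (treatment == "trt")))
fit_pois <- glm(counts ~ treatment + offset(log(duration)), family = poisson)

print(coef(fit_binom))
print(coef(fit_pois))
```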
I've worked through my analysis and determined that I did not need both covariates, since they were correlated. My response variables are not correlated, so I am running them through the model separately. I've used an aov approach for an "ANCOVA", aovout <- aov(response ~ covariate.centered + factor1 * factor2 * factor3, data), and the aov() and lm() functions give me the same output even though the SS should be different (Type 1 vs Type 3). I've also read that glm() should be used instead of a 3-way ANCOVA with aov(). Do you have any insight? – E10 Jun 29 '22 at 05:16
@E10 the glm() function is for generalized linear models that allow nonlinear mapping between the linear predictor and the outcome and other error distributions around predictions--like for binary outcomes. The aov() and lm() functions are identical in terms of modeling; they just present results in different ways. The Anova() function in the car package applied to an lm() model might be a better choice, particularly if your design is unbalanced. Read its help page. – EdM Jun 29 '22 at 13:15
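A small sketch of the point about aov() and lm(), on simulated unbalanced data (the data frame `dat` and its values are hypothetical): both functions fit the same model, and both report sequential (Type I) sums of squares through `summary()` and `anova()` respectively. The Type III table at the end is optional and assumes the car package is installed; sum-to-zero contrasts are set there because Type III tests are usually only interpretable with such contrasts.

```r
# Simulated, deliberately unbalanced data; all names and values are hypothetical.
set.seed(3)
dat <- expand.grid(factor1 = c("a", "b", "c", "d"),
                   factor2 = c("lo", "hi"),
                   factor3 = c("x", "y"),
                   rep     = 1:6)
dat <- dat[-c(1, 2, 3, 20, 45), ]            # drop a few rows to break balance
covariate <- rnorm(nrow(dat), mean = 10, sd = 2)
dat$covariate.centered <- covariate - mean(covariate)
dat$response <- 5 + 0.4 * dat$covariate.centered + rnorm(nrow(dat))

form    <- response ~ covariate.centered + factor1 * factor2 * factor3
fit_aov <- aov(form, data = dat)
fit_lm  <- lm(form, data = dat)

# Same fitted model: identical coefficients and identical sequential SS.
same_coef <- isTRUE(all.equal(coef(fit_aov), coef(fit_lm)))
same_ss   <- isTRUE(all.equal(summary(fit_aov)[[1]][["Sum Sq"]],
                              anova(fit_lm)[["Sum Sq"]]))
print(c(same_coef, same_ss))

# Optional Type III table, only if the car package is available.
if (requireNamespace("car", quietly = TRUE)) {
  fit_sum <- lm(form, data = dat,
                contrasts = list(factor1 = "contr.sum",
                                 factor2 = "contr.sum",
                                 factor3 = "contr.sum"))
  print(car::Anova(fit_sum, type = 3))
}
```

This is why the two outputs agree: neither aov() nor anova() on an lm() fit computes Type III sums of squares by default.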
@E10 for the dual responses, the issue is not so much whether the responses themselves are correlated as whether the error terms are correlated. If you'd like to pursue a bivariate outcome model, the Anova() function in the car package allows for that, as explained by Fox and Weisberg in the reference I linked in the answer. – EdM Jun 29 '22 at 13:21
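The distinction between correlated responses and correlated error terms can be sketched with simulated data. Everything here is hypothetical and constructed for illustration: a group effect pushes the two responses in opposite directions, which roughly cancels the positive correlation induced by a shared error component. The raw responses then look nearly uncorrelated while the residuals from the bivariate model are clearly correlated.

```r
# Simulated data; names and effect sizes are hypothetical.
set.seed(4)
n      <- 400
group  <- factor(rep(c("a", "b"), each = n / 2))
shift  <- ifelse(group == "b", 2, 0)
shared <- rnorm(n)                     # error component common to both outcomes

y1 <-  shift + shared + rnorm(n)       # group effect raises y1 ...
y2 <- -shift + shared + rnorm(n)       # ... and lowers y2

# Marginal correlation of the raw responses.
raw_cor <- cor(y1, y2)

# Correlation of the error terms, estimated from the model residuals.
fit     <- lm(cbind(y1, y2) ~ group)
res_cor <- cor(residuals(fit))[1, 2]

print(c(raw = raw_cor, residual = res_cor))
```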
