We ran a complicated experiment and are struggling to build a linear model that estimates everything we're interested in. We showed each person a description of a product (for illustration, let's say a car) and varied three things:
- Whether the description was displayed in colorful, playful lettering or as black, normal text.
- Which types of information were included in the description (e.g., cheap price, high safety rating)
- Whether 0, 5, or 10 types of information were included in the description
and we want to estimate:
- the effect of each type of information (e.g., cheap price vs nothing)
- the interaction effect between each type of information and colorful vs black lettering
- the overall effect of colorful vs black lettering (among people who saw any description)
- the interaction effect between each type of information and whether 0, 5, or 10 types of information were included
- the overall effect of 5 vs 10 types of information (among people who saw any description)
- the overall effect of seeing any description at all vs not seeing one
The issue is that if we include all of those variables, we very quickly run into redundancies (perfect multicollinearity). For example, if you know someone saw 9 types of information, you know they also saw the 10th. If you know the indicator for the interaction between colorful lettering and cheap price is 1 for someone, then you know they saw colorful lettering.
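To make the redundancy concrete, here is a minimal sketch of the design matrix on simulated assignments. It rests on several assumptions that are not spelled out above: there are exactly 10 types of information in total, the 5-type condition shows a random subset of them, lettering is coded 0 for people who saw no description, and the sample size and all variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_types = 3000, 10  # hypothetical sample size; assumes 10 information types in total

# Assumed design: each person is randomized to 0, 5, or 10 types
# and to colorful (1) or black (0) lettering.
count = rng.choice([0, 5, 10], size=n)
colorful = rng.integers(0, 2, size=n)
colorful[count == 0] = 0  # no description shown, so lettering is coded 0 (assumption)

# info[i, j] = 1 if person i saw information type j.
info = np.zeros((n, n_types), dtype=int)
for i in range(n):
    shown = rng.choice(n_types, size=count[i], replace=False)
    info[i, shown] = 1

any_desc = (count > 0).astype(int)  # saw any description vs none
ten = (count == 10).astype(int)     # 10 types vs 5 types

# Every term from the wish list, entered as a column.
X = np.column_stack([
    np.ones(n, dtype=int),      # intercept
    any_desc,
    ten,
    colorful,
    info,                       # 10 main effects of information type
    info * colorful[:, None],   # 10 type x lettering interactions
    info * ten[:, None],        # 10 type x (10 vs 5) interactions
])

rank = np.linalg.matrix_rank(X)
print(f"columns: {X.shape[1]}, rank: {rank}")

# Two exact linear dependencies that hold by construction in this design:
# the type indicators sum to 5 or 10 whenever a description is shown ...
print(np.array_equal(info.sum(axis=1), 5 * any_desc + 5 * ten))
# ... and each type x ten interaction is identical to the ten indicator itself.
print(np.array_equal(info[:, 0] * ten, ten))
```

The rank comes out smaller than the number of columns, so ordinary least squares has no unique solution for this full set of terms under the assumed coding.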
Does that make sense? How does one build a model that allows all of these effects to be estimated? It seems like we would have to exclude most main effects and include only interactions, then combine mean and variance estimates across a huge number of cases (e.g., all combinations of 5 types of information). Alternatively, maybe we could fit several models, each including as many predictors as it can, so that every estimate we want appears in at least one of the models. For example, we could test for the effect of colorful lettering by ignoring the interaction effects and fitting a model with the other variables plus colorful vs black.
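As a rough sketch of that last idea (one reduced model per estimate), the following fits the lettering effect on simulated data. Everything here is hypothetical: the outcome, the effect sizes, the sample size, and the assumption of 10 information types with a random 5 shown in the 5-type condition. The 10 vs 5 indicator is left out because, under that assumed design, it is already implied by the type indicators.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_types = 4000, 10  # hypothetical sample size; assumes 10 information types in total

# Assumed design, same coding as described in the question.
count = rng.choice([0, 5, 10], size=n)
colorful = rng.integers(0, 2, size=n)
colorful[count == 0] = 0  # lettering coded 0 when no description is shown (assumption)
info = np.zeros((n, n_types))
for i in range(n):
    info[i, rng.choice(n_types, size=count[i], replace=False)] = 1.0

# Simulated outcome, for illustration only: a made-up lettering effect of 0.5,
# made-up per-type effects, and no interactions.
true_colorful_effect = 0.5
type_effects = rng.normal(0.0, 0.3, size=n_types)
noise = rng.normal(0.0, 0.5, size=n)
y = true_colorful_effect * colorful + info @ type_effects + noise

# Reduced model: only people who saw a description, no interaction terms.
saw = count > 0
X = np.column_stack([
    np.ones(saw.sum()),  # intercept
    colorful[saw],       # colorful vs black lettering
    info[saw],           # 10 information-type indicators
])

# This reduced design has full column rank, so the coefficients are unique.
print("full rank:", np.linalg.matrix_rank(X) == X.shape[1])

coef, *_ = np.linalg.lstsq(X, y[saw], rcond=None)
print(f"estimated colorful effect: {coef[1]:.3f}")
```

Because the simulated outcome has no interactions, the coefficient on lettering should land near the made-up value of 0.5. Whether this kind of marginal estimate is the quantity we want when interactions are really present is part of what we are asking.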