I have a response variable $y$, a predictor of interest $x_0$, and data for seven additional variables $x_n$, $n = 1, \dots, 7$. I would like to control for a confounder in my GLM; let's call it "size". $x_1$, $x_2$ and $x_3$ all measure "size" in some way, e.g. number of incidents, local volume of incidents, global volume of incidents. How should I treat these three variables: do I include all of them, or just one? Should I include interaction terms and, if so, just the three-way term $x_1*x_2*x_3$, or also the two-way terms $x_1*x_2 + x_1*x_3 + x_2*x_3$? I am wary of constructing an overly complicated model. Should I first check the relationship between each pair of variables in isolation, i.e. whether $x_2$ actually increases with $x_3$?
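To make the pairwise check concrete, here is a minimal sketch in Python with NumPy. The data are synthetic stand-ins (a made-up latent "size" plus noise), not my real variables, so the numbers only illustrate the kind of output I would look at.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical data: three noisy measurements of one latent "size".
latent_size = rng.normal(size=n)
x1 = latent_size + 0.3 * rng.normal(size=n)
x2 = latent_size + 0.3 * rng.normal(size=n)
x3 = latent_size + 0.3 * rng.normal(size=n)

# Pairwise Pearson correlations; columns are variables, rows are observations.
X = np.column_stack([x1, x2, x3])
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))
```

If the off-diagonal entries are all close to 1, the three variables probably carry much the same information, which is what makes me doubt that all of them belong in the model.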
I also have additional confounders, e.g. age, that are expected to vary with size. Does this necessitate further interaction terms between age and size? I am not sure that ending up with a 20-term GLM is a good plan... maybe I have misunderstood something. Thanks.
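To show why the term count worries me, here is a rough sketch that enumerates main effects and interactions up to a given order. The helper `interaction_terms` and the `:` notation for interactions are only for illustration and do not come from any particular library.

```python
from itertools import combinations

def interaction_terms(names, max_order):
    """List main effects (order 1) and all interactions up to max_order."""
    terms = []
    for order in range(1, max_order + 1):
        for combo in combinations(names, order):
            terms.append(":".join(combo))
    return terms

size_vars = ["x1", "x2", "x3"]

# Main effects plus every two-way and the three-way interaction.
full = interaction_terms(size_vars, 3)
# Main effects plus two-way interactions only.
pairwise = interaction_terms(size_vars, 2)
# Adding age (hypothetical name) and keeping interactions up to two-way.
with_age = interaction_terms(size_vars + ["age"], 2)

print(len(full), len(pairwise), len(with_age))
```

Even restricted to two-way interactions, adding a single extra confounder noticeably grows the list, and this is before counting $x_0$ and the remaining variables.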
`cor(x1, x2)` if you use R. – mkt Sep 23 '22 at 10:26