Consider a general problem where we model an output variable $Y$ with several independent variables $X_1$, $X_2$, $X_3$, etc., each binary or continuous. From a previous study, we know that the values of the continuous variable $X_1$ are affected by a binary variable $Z$, but $Z$ has no effect on the output. How should I model this in R? I see three options:
1. `Y ~ X1:Z + X2 + X3`
2. `Y ~ X1:Z + X1 + X2 + X3`
3. `Y ~ X1:Z + Z + X1 + X2 + X3`
Here is my concrete example, as it might help: $X_1$, $X_2$, $X_3$ are features extracted from medical imaging data, such as, for each patient, the mean or the maximum of the values in a region of interest. $Y$ could be either a binary output describing whether the tumor is aggressive or not, or survival data such as overall survival. $Z$ is a binary variable that describes whether the patient received premedication before the image acquisition. We know from a previous study that if premedication is given to a patient, we will observe higher $X_1$ values than in the absence of premedication.
My instinct tells me to use option 1 because $Z$ has no impact on $Y$. Whether premedication was given depends only on the date of the image acquisition (the protocol changed over time), so in my opinion we can treat $Z$ as random. But from what I read in an older post, omitting the main effect term does not sound like a good idea.
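For reference, here is a minimal sketch of how the three candidate formulas could be fit in R for the binary-outcome case. The data are simulated: the sample size, effect sizes, and the 0/1 numeric coding of `Z` are all assumptions made for illustration, not values from my study.

```r
set.seed(1)
n <- 500

# Hypothetical data; every number below is an illustrative assumption.
Z  <- rbinom(n, 1, 0.5)              # premedication given (1) or not (0)
X1 <- rnorm(n, mean = 1 + 0.8 * Z)   # X1 is shifted upward when Z = 1
X2 <- rnorm(n)                       # a continuous feature
X3 <- rbinom(n, 1, 0.3)              # a binary feature

# Y depends on X1, X2 and X3 only; Z does not enter the linear predictor.
eta <- -0.5 + 0.7 * X1 - 0.4 * X2 + 0.5 * X3
Y   <- rbinom(n, 1, plogis(eta))

dat <- data.frame(Y, X1, X2, X3, Z)

# Option 1: interaction only, no main effects for X1 or Z.
fit1 <- glm(Y ~ X1:Z + X2 + X3, family = binomial, data = dat)

# Option 2: interaction plus the X1 main effect.
fit2 <- glm(Y ~ X1:Z + X1 + X2 + X3, family = binomial, data = dat)

# Option 3: interaction plus both main effects (same model as X1 * Z + X2 + X3).
fit3 <- glm(Y ~ X1:Z + Z + X1 + X2 + X3, family = binomial, data = dat)

# Inspect which coefficients each formula actually estimates.
lapply(list(option1 = fit1, option2 = fit2, option3 = fit3),
       function(f) names(coef(f)))

# Compare the three fits on the same data.
AIC(fit1, fit2, fit3)
```

With `Z` coded as numeric 0/1, the term `X1:Z` is simply the product of `X1` and `Z`, so option 1 forces the slope of `X1` to be zero for patients without premedication. If `Z` were a factor instead, R would expand `X1:Z` differently when the `X1` main effect is absent, so the coding matters when reading the coefficients.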