I want to use a multiple logistic regression to model the relationship between two experimental groups (test and control) and accuracy of a procedure, controlling for the experience (in years) of the participants.
outcome ~ group + experience
The design I am using is paired in the sense that every participant is tested twice, so there are no differences in baseline characteristics between groups (since they are the same individuals). If I were only testing for differences in time, a paired t-test would suffice, but I need to control for experience, hence the regression model.
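For reference, this is the paired comparison I have in mind, as a minimal sketch in plain Python. The completion times are made up for illustration, and the variable names and the `paired_t` helper are my own, not part of the study.

```python
import math
import statistics

# Hypothetical completion times (seconds) for the same five participants,
# once under each condition. These numbers are invented for illustration.
control = [410.0, 380.0, 455.0, 390.0, 420.0]
test = [365.0, 350.0, 400.0, 372.0, 391.0]

def paired_t(x, y):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]      # within-participant differences
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)             # sample SD (n - 1 denominator)
    return mean_d / (sd_d / math.sqrt(n)), n - 1

t_stat, df = paired_t(control, test)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The statistic uses only the within-participant differences, which is why it has no place for a covariate such as experience.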
Time is measured in seconds until the procedure is completed, and accuracy is defined as completing it within a pre-specified threshold (the outcome is 1 if time is less than or equal to 6 minutes and 0 otherwise). Time and experience are expected to be negatively correlated; that is, experienced practitioners are expected to take less time to complete the procedure.
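To make the outcome definition concrete, here is a minimal sketch in plain Python. The times are made up, and `dichotomize` and `THRESHOLD_SECONDS` are illustrative names of my own.

```python
# 6 minutes expressed in seconds, the pre-specified threshold.
THRESHOLD_SECONDS = 6 * 60

def dichotomize(time_seconds, threshold=THRESHOLD_SECONDS):
    """Return 1 if the procedure finished within the threshold, else 0."""
    return 1 if time_seconds <= threshold else 0

# Hypothetical completion times in seconds (invented for illustration).
times = [95, 359, 360, 361, 900]
outcomes = [dichotomize(t) for t in times]
print(outcomes)  # [1, 1, 1, 0, 0]
```

Note that 95 s and 360 s receive the same outcome, as do 361 s and 900 s, while 360 s and 361 s are treated as different.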
I would like to test for interactions in this model, but I don't think it makes much sense to interact the group with experience.
outcome ~ group * experience
I am considering including time in the model and testing for an interaction with experience.
outcome ~ group + experience*time
Since time is used in the definition of the response of the logistic model, I expect it to be significant even with a small sample size. However, it seems to me that including time in this model would be circular reasoning.
outcome ~ group*time + experience
Q1: Is this a correct interpretation?
Q2: If I try interactions between time and the group instead, would that tell me that time is modifying the effect attributed to the group?
Q3: Does it make sense to test for interactions between experience and group in this setting?
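To illustrate the circularity concern behind Q1, here is a minimal sketch in plain Python of what I suspect happens when time is a predictor of an outcome defined by time. It is not a fitted model. The times are made up, the linear predictor `slope * (CUTPOINT - time)` is my own simplification, and the cutpoint of 360.5 assumes time is recorded in whole seconds.

```python
import math

THRESHOLD_SECONDS = 6 * 60   # outcome is 1 if time <= 360 seconds
CUTPOINT = 360.5             # assumed midpoint between 360 and 361 (whole seconds)

# Hypothetical completion times in seconds (invented for illustration).
times = [95, 359, 360, 361, 900]
outcomes = [1 if t <= THRESHOLD_SECONDS else 0 for t in times]

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def log_likelihood(slope):
    """Bernoulli log-likelihood with logit P(outcome = 1) = slope * (CUTPOINT - time)."""
    total = 0.0
    for t, y in zip(times, outcomes):
        z = slope * (CUTPOINT - t)
        total += log_sigmoid(z) if y == 1 else log_sigmoid(-z)
    return total

slopes = [0.01, 0.1, 1.0, 10.0]
log_liks = [log_likelihood(b) for b in slopes]
for b, ll in zip(slopes, log_liks):
    print(f"slope = {b:>5}: log-likelihood = {ll:.4f}")
```

Because time classifies the outcome perfectly, the log-likelihood keeps rising toward 0 as the slope grows, so there is no finite maximum likelihood estimate for the time coefficient (complete separation).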
EDIT: I understand Douglas Altman's point that unnecessary dichotomization of a continuous variable, while prevalent in medical research, leads to a loss of precision in the estimates (at the very least). I was able to make the case for a linear model of time ~ group + experience for this experiment as a secondary endpoint, but the main goal needs to remain accuracy, which is why the outcome is a dichotomization of time. This practice is prevalent for a reason :)
time and outcome in a little more detail? Also, the question in your title is perhaps too much of an oversimplification. It can be very useful to include predictors that are known to be related to the response variable in some situations, and less so in others, depending on the causal pathways. – mkt May 11 '23 at 11:48

outcome is defined completely by time, it doesn't make sense to use time as a predictor for outcome (or vice versa, for that matter). – mkt May 11 '23 at 11:58

summary() depend on predictor coding. For evaluating a predictor, use a measure that includes all terms involving it. The Anova() function in the R car package does that in a way that (unlike the basic anova() function) doesn't depend on the order of entry of variables in the model. Use post-modeling tools like those in the emmeans package to illustrate specific scenarios properly. – EdM May 11 '23 at 14:58