As a simplified example, assume I have a model with two linear terms and an interaction between them:
y = b0 + (b1 * x1) + (b2 * x2) + (b3 * x1*x2)
x1 and x2 have very different ranges and variances, so I scale and centre them before fitting the model. This has a convenient side effect when generating predictions: the mean of each centred predictor is 0, so any term held at its mean simply drops out. For example, when examining the effect of x1 in isolation I use:
y = b0 + (b1 * x1_new_data) + (b2 * 0) + (b3 * 0)
where x1_new_data is a vector of 500 values spanning the minimum and maximum of the original x1, scaled and centred using the same mean and standard deviation as the original x1.
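In code terms, this is roughly what I do (a minimal sketch; `dat`, `y`, `x1`, and `x2` are placeholder names):

```r
# Store the scaling constants so new data can be transformed identically
x1_mean <- mean(dat$x1); x1_sd <- sd(dat$x1)
x2_mean <- mean(dat$x2); x2_sd <- sd(dat$x2)
dat$x1_s <- (dat$x1 - x1_mean) / x1_sd   # centred and scaled predictors
dat$x2_s <- (dat$x2 - x2_mean) / x2_sd
fit <- lm(y ~ x1_s * x2_s, data = dat)   # expands to x1_s + x2_s + x1_s:x2_s

# 500 values spanning the original x1, transformed with the SAME mean/sd
x1_new <- seq(min(dat$x1), max(dat$x1), length.out = 500)
newdata <- data.frame(x1_s = (x1_new - x1_mean) / x1_sd,
                      x2_s = 0)          # x2 held at its mean
pred_x1 <- predict(fit, newdata = newdata)
```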
My question concerns the case when predicting the interaction term x1*x2: should I also provide new data for x1 and x2 rather than cancelling them out with 0? I can think of four options here:
1. y = b0 + (b1 * 0) + (b2 * 0) + (b3 * x1*x2_new_data)
2. y = b0 + (b1 * x1_new_data) + (b2 * x2_new_data) + (b3)
3. y = b0 + (b1 * x1_new_data) + (b2 * x2_new_data) + (b3 * x1*x2_new_data)
4. y = b0 + (b1 * x1_new_data) + (b2 * x2_new_data) + (b3 * (x1_new_data * x2_new_data))

Here x1*x2_new_data denotes new data generated directly for the product term, as opposed to the product of the two separately generated vectors in option 4.
For option 1, I am only predicting the interaction effect while the constituent linear terms are held at their mean values, which doesn't seem logical. I think option 2 is probably junk. Option 3 is complicated by how the new data for the product term must be generated, as shown in @Dave's answer. Option 4 seems the most straightforward, provided it is correct to include the linear terms as well as the interaction when generating predictions.
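For concreteness, here is option 4 written out (continuing the sketch above, with the same placeholder names), together with a by-hand check: `predict()` builds the interaction column as the product of the scaled new values, which is exactly option 4.

```r
# Option 4: vary x1 and x2 together over their ranges
x2_new <- seq(min(dat$x2), max(dat$x2), length.out = 500)
newdata4 <- data.frame(x1_s = (x1_new - x1_mean) / x1_sd,
                       x2_s = (x2_new - x2_mean) / x2_sd)
pred_int <- predict(fit, newdata = newdata4)

# The same prediction written out by hand with b = coef(fit)
b <- coef(fit)   # (Intercept), x1_s, x2_s, x1_s:x2_s
pred_by_hand <- b[1] + b[2] * newdata4$x1_s + b[3] * newdata4$x2_s +
  b[4] * newdata4$x1_s * newdata4$x2_s
all.equal(unname(pred_by_hand), unname(pred_int))   # TRUE
```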
I'm prompted to ask this because I have a case where y has a positive linear relationship with both x1 and x2, but the fitted x1*x2 interaction is strongly negative over part of the range of values. This produces seemingly unrealistic predictions: y responds positively to x1 and x2, both in isolation and together, yet the x1*x2 product term pushes the predictions sharply downward.
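To illustrate with made-up coefficients (a toy sketch, not my fitted values): when both main effects are positive but the interaction coefficient is negative, the product term grows quadratically along the x1 = x2 diagonal and eventually dominates.

```r
# Toy numbers (made up): positive main effects, negative interaction
b0 <- 0; b1 <- 1; b2 <- 1; b3 <- -0.8
x <- c(1, 2, 3)                   # x1 = x2 = x on the scaled scale
b0 + b1 * x + b2 * x + b3 * x^2   # 1.2  0.8 -1.2: the product term wins
```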