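That is simply untrue; it's more a rule of thumb than anything else. For example, in R, simulate a response that depends only on the product `x1 * x2`. The main-effects-only model finds nothing significant at the 5% level, while the model that includes the interaction picks it up clearly (p = 3.35e-12) and raises R² from about 0.05 to 0.43: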
    > x1 <- rnorm(100)
    > x2 <- rnorm(100)
    > y <- x1 * x2 + rnorm(100)
    >
    >
    > summary(lm(y ~ x1 + x2))

    Call:
    lm(formula = y ~ x1 + x2)

    Residuals:
        Min      1Q  Median      3Q     Max
    -5.4396 -0.9537  0.0591  1.0530  3.1617

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  0.06657    0.13804   0.482   0.6307
    x1           0.23257    0.13433   1.731   0.0866 .
    x2          -0.21193    0.13720  -1.545   0.1257
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 1.37 on 97 degrees of freedom
    Multiple R-squared:  0.05351,  Adjusted R-squared:  0.034
    F-statistic: 2.742 on 2 and 97 DF,  p-value: 0.06943

    > summary(lm(y ~ x1 + x2 + x1 * x2))

    Call:
    lm(formula = y ~ x1 + x2 + x1 * x2)

    Residuals:
         Min       1Q   Median       3Q      Max
    -2.79339 -0.72311  0.01982  0.78272  2.39362

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  0.05560    0.10767   0.516    0.607
    x1           0.05168    0.10720   0.482    0.631
    x2           0.04472    0.11176   0.400    0.690
    x1:x2        0.98052    0.12309   7.966 3.35e-12 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 1.069 on 96 degrees of freedom
    Multiple R-squared:  0.4302,  Adjusted R-squared:  0.4124
    F-statistic: 24.16 on 3 and 96 DF,  p-value: 9.881e-12
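If you want to rerun this yourself, here is a minimal reproducible sketch of the same experiment. Two assumptions to flag: the `set.seed(1)` value is arbitrary, so your numbers will differ from the transcript above, and the `anova()` comparison of the two nested fits is an addition of mine rather than part of the original run. Note also that in an R formula `x1 * x2` already expands to `x1 + x2 + x1:x2`, so `y ~ x1 * x2` fits the same model as `y ~ x1 + x2 + x1 * x2`.

```r
# Arbitrary seed so the simulation is reproducible (numbers will differ from the transcript)
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
# True model: the response depends only on the interaction, with no main effects
y  <- x1 * x2 + rnorm(n)

# Main-effects-only model
fit_main <- lm(y ~ x1 + x2)
# x1 * x2 expands to x1 + x2 + x1:x2, i.e. the same model as y ~ x1 + x2 + x1 * x2
fit_int  <- lm(y ~ x1 * x2)

# Coefficient tables (Estimate, Std. Error, t value, Pr(>|t|)) for both fits
summary(fit_main)$coefficients
summary(fit_int)$coefficients

# F test of the nested models: does adding x1:x2 significantly improve the fit?
anova(fit_main, fit_int)
```

Because the main-effect estimates are just noise around zero in this setup, an individual draw can occasionally show one of them as "significant" by chance; the interaction term is the one whose signal is real.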