To fit a linear regression model relating y (a continuous variable) to sexe and age, use the lm() function like so:
model1 <- lm(y ~ sexe + age, data = data)
summary(model1)
The above model assumes that the effect of age on y is the same for both values of sexe. To fit a model which allows for the effect of age to be different across the two values of sexe, you can use this syntax:
model2 <- lm(y ~ sexe*age, data = data)
summary(model2)
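Written out as equations, the two models look like this. This is a sketch that assumes sexe is a two-level factor, which R turns into a 0/1 indicator by default; the β symbols are my own notation rather than anything R prints.

```latex
\begin{aligned}
\text{model1:}\quad & y_i = \beta_0 + \beta_1\,\text{sexe}_i + \beta_2\,\text{age}_i + \varepsilon_i \\
\text{model2:}\quad & y_i = \beta_0 + \beta_1\,\text{sexe}_i + \beta_2\,\text{age}_i
                      + \beta_3\,(\text{sexe}_i \times \text{age}_i) + \varepsilon_i
\end{aligned}
```

In model2 the slope of age is β2 when sexe = 0 and β2 + β3 when sexe = 1, so asking whether the effect of age differs by sexe amounts to asking whether β3 is zero.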
To determine which of the two models is supported by your data, you can perform an ANOVA F-test:
anova(model1, model2)
If the p-value for this test is smaller than your pre-selected significance level alpha (e.g., alpha = 0.05), then the data provide evidence that the effect of age differs across values of sexe.
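If you want to try these steps before running them on your own data, here is a minimal self-contained sketch. The data are simulated: the sample size, the coefficients and the names sim_data, fit_add and fit_int are made up for illustration and have nothing to do with the data in the question.

```r
# Simulated data for illustration only (all numbers are assumptions)
set.seed(123)
n <- 200
sim_data <- data.frame(
  sexe = factor(sample(c("F", "M"), n, replace = TRUE)),
  age  = runif(n, min = 20, max = 70)
)

# True model: the slope of age differs between the two levels of sexe
is_m <- as.numeric(sim_data$sexe == "M")
sim_data$y <- 10 + 2 * is_m + 0.5 * sim_data$age +
  0.3 * is_m * sim_data$age + rnorm(n, sd = 5)

# Additive model: same age slope for both levels of sexe
fit_add <- lm(y ~ sexe + age, data = sim_data)

# Interaction model: age slope allowed to differ by sexe
fit_int <- lm(y ~ sexe * age, data = sim_data)

# F-test comparing the two nested models
cmp <- anova(fit_add, fit_int)
print(cmp)

# p-value of the F-test (second row of the ANOVA table)
p_value <- cmp[["Pr(>F)"]][2]
```

Because the simulated data were generated with a different age slope for each level of sexe, the F-test should favour the interaction model here.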
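The glm() function is better suited for models where the outcome variable y is a count variable or a binary variable with values 0 and 1. For a categorical outcome with more than two levels (nominal or ordinal), glm() is not enough on its own and you would need something like nnet::multinom() or MASS::polr().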
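As a rough illustration of where glm() comes in, the sketch below fits a logistic regression for a 0/1 outcome and a Poisson regression for a count outcome. The data are simulated, and the names sim_glm, y_bin and y_count are placeholders of mine.

```r
# Simulated data for illustration only (all numbers are assumptions)
set.seed(42)
n <- 300
sim_glm <- data.frame(
  sexe = factor(sample(c("F", "M"), n, replace = TRUE)),
  age  = runif(n, min = 20, max = 70)
)

# Binary outcome generated from a logistic model
lin_pred <- -2 + 0.04 * sim_glm$age + 0.5 * (sim_glm$sexe == "M")
sim_glm$y_bin <- rbinom(n, size = 1, prob = plogis(lin_pred))

# Count outcome generated from a Poisson model with a log link
sim_glm$y_count <- rpois(n, lambda = exp(0.2 + 0.02 * sim_glm$age))

# Logistic regression for the 0/1 outcome
fit_logit <- glm(y_bin ~ sexe + age, family = binomial, data = sim_glm)

# Poisson regression for the count outcome
fit_pois <- glm(y_count ~ sexe + age, family = poisson, data = sim_glm)

summary(fit_logit)
summary(fit_pois)
```

The formula syntax is the same as for lm(); only the family argument changes with the type of outcome.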
Addendum:
After fitting the two models described above, model1 and model2, it is a good idea to check the variance inflation factor (VIF) for each term in each model.
install.packages("car")
library(car)
vif(model1)
vif(model2)
When you do so, here is what you get for model1:
> vif(model1)
    sexe      age 
1.024189 1.024189 
and for model2:
> vif(model2)
    sexe      age sexe:age 
4.611570 2.799449 7.138292 
Warning message:
In summary.lm(object) : essentially perfect fit: summary may be unreliable
Notice the warning issued by R, which suggests that the summary reported for model2 may be unreliable, and also the large VIF for the interaction term sexe:age in model2. You might have to discard model2 and stick with model1 for these data, even though the ANOVA F-test comparing the two models is statistically significant.
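One thing worth checking before dropping model2: an interaction term is the product of its components, so it tends to be correlated with them, and a large VIF on sexe:age can partly be an artifact of age not being centered. The sketch below uses simulated data and made-up names (sim_vif, age_c, vif_by_hand). It computes each VIF by hand as 1 / (1 - R²), where R² comes from regressing one column of the model matrix on the others, first with raw age and then with centered age. As far as I know this matches what car::vif() reports when every term has one degree of freedom. It does not address the "essentially perfect fit" warning, which is a separate matter.

```r
# Simulated predictors for illustration only (all numbers are assumptions)
set.seed(1)
n <- 200
sim_vif <- data.frame(
  sexe = factor(sample(c("F", "M"), n, replace = TRUE)),
  age  = runif(n, min = 20, max = 70)
)
sim_vif$age_c <- sim_vif$age - mean(sim_vif$age)  # centered age

# VIF for each non-intercept column of the model matrix:
# regress that column on all the others and take 1 / (1 - R^2)
vif_by_hand <- function(formula, data) {
  X <- model.matrix(formula, data)[, -1, drop = FALSE]  # drop the intercept
  out <- sapply(seq_len(ncol(X)), function(j) {
    r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared
    1 / (1 - r2)
  })
  names(out) <- colnames(X)
  out
}

vif_raw      <- vif_by_hand(~ sexe * age,   sim_vif)
vif_centered <- vif_by_hand(~ sexe * age_c, sim_vif)

print(round(vif_raw, 2))
print(round(vif_centered, 2))
```

If the VIFs drop sharply after centering, the inflation was mostly due to how age was coded rather than to a real redundancy among the predictors. Centering changes the interpretation of the sexe coefficient (it becomes the difference at the mean age) but not the fit of the model.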
Ignoring the issues with model2 for now, here's a quick way to get the plot produced by Sal in his answer:
install.packages("sjPlot")
install.packages("sjmisc")
library(sjPlot)
library(sjmisc)
sjp.int(model2, type = "eff")
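If sjPlot is not available, roughly the same picture can be drawn with base R by predicting y over a grid of ages for each level of sexe. This is only a sketch: it builds a stand-in interaction model from simulated data so that it runs on its own, and the names sim_plot, fit_int_sim and grid are mine.

```r
# Stand-in data and model for illustration only (all numbers are assumptions)
set.seed(7)
n <- 150
sim_plot <- data.frame(
  sexe = factor(sample(c("F", "M"), n, replace = TRUE)),
  age  = runif(n, min = 20, max = 70)
)
sim_plot$y <- 5 + 0.4 * sim_plot$age +
  0.3 * (sim_plot$sexe == "M") * sim_plot$age + rnorm(n, sd = 4)
fit_int_sim <- lm(y ~ sexe * age, data = sim_plot)

# Grid of ages crossed with both levels of sexe
grid <- expand.grid(
  sexe = levels(sim_plot$sexe),
  age  = seq(min(sim_plot$age), max(sim_plot$age), length.out = 50)
)
grid$pred <- predict(fit_int_sim, newdata = grid)

# Scatter plot of the data with one fitted line per level of sexe
plot(y ~ age, data = sim_plot, col = "grey60", pch = 16)
line_types <- c(F = 1, M = 2)
for (lev in levels(sim_plot$sexe)) {
  sub <- grid[grid$sexe == lev, ]
  lines(sub$age, sub$pred, lwd = 2, lty = line_types[[lev]])
}
legend("topleft", legend = names(line_types), lty = line_types, lwd = 2,
       title = "sexe")
```

Non-parallel lines in this plot are the visual counterpart of the interaction term in the model.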
You can also get better formatted output for your models using these commands:
sjt.lm(model1)
sjt.lm(model2)
sjt.lm(model1, model2,
       depvar.labels = c("y", "y"))
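A note on versions: as far as I know, sjp.int() and sjt.lm() were deprecated and later removed from sjPlot, with plot_model() and tab_model() as their replacements. If the calls above fail with a "could not find function" error, something along these lines may work instead. The models m1 and m2 below are stand-ins fitted to simulated data so that the snippet runs on its own; with your data you would pass model1 and model2.

```r
# Stand-in models on simulated data (all numbers are assumptions)
set.seed(99)
d <- data.frame(
  sexe = factor(rep(c("F", "M"), each = 50)),
  age  = runif(100, min = 20, max = 70)
)
d$y <- 5 + 0.4 * d$age + 0.3 * (d$sexe == "M") * d$age + rnorm(100, sd = 4)
m1 <- lm(y ~ sexe + age, data = d)
m2 <- lm(y ~ sexe * age, data = d)

# Only run the sjPlot calls if the package is installed
if (requireNamespace("sjPlot", quietly = TRUE)) {
  # Interaction plot, in place of sjp.int()
  p <- sjPlot::plot_model(m2, type = "int")
  print(p)

  # Side-by-side regression table, in place of sjt.lm()
  tab <- sjPlot::tab_model(m1, m2, dv.labels = c("y", "y"))
}
```

Printing tab (or calling tab_model() without assigning the result) shows the table in the viewer pane when run interactively.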
y ~ sexe*age*var1*var2 or does it make sense to test y ~ sexe*age*(var1+var2)? Also, do you know if we can use other algorithms, randomForest, GBM? – John Smith Jun 03 '18 at 08:52