In theory, tree-based models like gradient-boosted decision trees (GBDTs; XGBoost is one example) can capture feature interactions on their own: within a single tree, a split on one variable can be followed by a split on another, and because the leaf values of all trees are summed to make a prediction, different combinations of such splits across trees can, with enough trees, approximate most functions.
However, if you suspect or believe that particular transformations of features (e.g. interactions, or more complex functions of several features) are important, providing them directly as new features makes it much easier for the model to exploit them. The process of coming up with good new features is called "feature engineering", and in some cases it makes a huge difference.
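As a minimal illustrative sketch (not part of the original answer), the snippet below compares a GBDT trained on two raw features with the same model given their interaction explicitly as a third column, on synthetic data whose target is exactly that interaction. The data, hyperparameters, and the xgboost scikit-learn wrapper are just one way to set this up; the exact error numbers will vary.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
n = 500                                                  # deliberately small sample
X = rng.uniform(-1.0, 1.0, size=(n, 2))                  # two raw features
y = X[:, 0] * X[:, 1] + rng.normal(scale=0.05, size=n)   # target is their interaction

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Raw features only: the trees have to discover the interaction via splits.
raw = XGBRegressor(n_estimators=200, max_depth=3, random_state=0).fit(X_tr, y_tr)

# Same model, but with the interaction provided explicitly as a third column.
X_tr_fe = np.column_stack([X_tr, X_tr[:, 0] * X_tr[:, 1]])
X_te_fe = np.column_stack([X_te, X_te[:, 0] * X_te[:, 1]])
fe = XGBRegressor(n_estimators=200, max_depth=3, random_state=0).fit(X_tr_fe, y_tr)

print("MSE, raw features:      ", mean_squared_error(y_te, raw.predict(X_te)))
print("MSE, engineered feature:", mean_squared_error(y_te, fe.predict(X_te_fe)))
```

On this kind of data, the version with the engineered column typically reaches a lower test error with the same number of trees, which is the whole point of handing the model the transformation instead of making it rediscover it.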
When will it make the biggest difference?
- When there is little data, so that it is hard for the model to "figure out" the right splits to approximate the transformed feature without overfitting, given the huge space of possible multi-feature interactions it could try.
- When the transformation is complex and not well approximated by step functions, especially if splits on several variables within the same tree would be needed for a good approximation (i.e. strong high-dimensional interactions that a step function in just one or two variables gets badly wrong).
- When we have strong theoretical reasons (e.g. the geometry or physics of the situation) or human intuition for a particular feature. For example, when predicting whether someone visiting a website will buy something, given the customer's previous sales and previous visits as features, it is very intuitive for a human to form a feature like sales per visit (see the sketch after this list).
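To make the last bullet concrete, here is a small sketch of that kind of intuition-driven feature, assuming a pandas DataFrame with hypothetical columns `prev_sales` and `prev_visits`; mapping the division-by-zero cases to 0 is just one common choice for customers with no previous visits.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; column names are made up for illustration.
df = pd.DataFrame({
    "prev_sales":  [3, 0, 7, 1],
    "prev_visits": [10, 0, 25, 2],
})

# "Sales per visit" ratio; pandas turns division by zero into inf/NaN,
# which we map to 0 so first-time visitors still get a well-defined value.
ratio = df["prev_sales"] / df["prev_visits"]
df["sales_per_visit"] = ratio.replace([np.inf, -np.inf], np.nan).fillna(0.0)

print(df)
```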