How does a machine learning model estimate unevenly correlated features?

Question

Aside from additional data and feature engineering, how can I make sure my ML model picks up on patterns that only apply to certain thresholds or ranges? For example, assume I use data about users and movie ratings. I want to predict parameters for user ratings and watch time to predict total revenue from the movie. The rating of a user who watched for twenty minutes might deserve a 0.1 weight relative to other watchers, a user who watched forty-five minutes 0.5, and a user who watched two hours 1. In other words, the ability of each user's rating to predict movie profits varies unevenly depending on user watch time.

All I can think of is to engineer a new feature like user rating weighted by watch time. The alternative is to gather more data and hope that the relationship happens to emerge in my training set.

The model can only learn what is observed in the data, if this pattern does not show in the data or is weak, the model is probably not going to learn it. If you have domain knowledge and "know" that such a pattern exists, you have to help the model and put this in somehow. — user2974951, Jul 20 '23 at 05:49

score 0 · Accepted Answer · answered Jul 20 '23 at 07:28

There is a number of models you could use for such a problem.

You could simply use linear regression with interaction terms, where different features would interact with watching time. This is a model that can detect a relation that differs for specific subgroups. Notice that with such a model, you don't need to discretize the data into intervals, you could use the watching time as-is.
Stagewise regression would lead you to build a regression model where you would have distinct regression lines conditionally on some features. This would be useful if you have some prior knowledge about the data and want to define a model that explicitly states how to split the data for separate regression lines.
If you don't know how the data is clustered, but you know that there are several clusters where each has its own regression line, you could use latent-class regression, a model that combines cluster analysis and regression.
A regression tree would split the data into many groups and predict means per each group. If you want to take into account multiple interactions between the features and use a non-parametric model, this may be one possibility.
Finally, you could use something like a neural network that will by itself figure out the interactions in your data.

When you say a regression tree splits data into many groups, do you mean the tree will take different subsets of input, not just different subsets of features? — Joachim Rives, Jul 21 '23 at 08:44
@JoachimRives of features, as it branches over the features. — Tim, Jul 21 '23 at 08:51

How does a machine learning model estimate unevenly correlated features?

1 Answers1