Is it possible to introduce cluster probabilities into a regression?
Consider the Old Faithful Geyser data set. Most clustering algorithms find 2 clusters when analysing eruption times and waiting times. We can fit a separate regression to the data in each cluster to estimate waiting time as a function of eruption time. However, with probabilistic clustering (e.g. Gaussian mixture models) one also obtains, for every observation, an estimate of the probability that it belongs to each cluster.
How can we incorporate these probabilities into the regressions? If the probabilistic clustering model were to predict that an observation has a 40% probability of being in cluster A, then I would like to use this information in the regression for cluster A, rather than discard it and assign the observation entirely to the other cluster.
I have thought about two approaches: sampling observations according to their cluster probabilities, and weighting the observed values by their probabilities. However, I think the latter will just dampen the regression line towards 0. Any suggestions or advice would be welcome.
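To make the weighting idea concrete, here is a minimal sketch on made-up numbers (not the geyser data). The helper `weighted_linear_fit` and the probabilities `p_a` are hypothetical and only for illustration. It contrasts two readings of "weighting by probability": multiplying each observed response by its probability, versus keeping the response as it is and using the probability as an observation weight in a weighted least-squares fit.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b).

    Hypothetical helper: each observation i contributes w[i] times its
    squared residual to the fitting criterion.
    """
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


# Made-up data lying exactly on the line y = 30 + 10*x.
x = [1.8, 2.0, 2.2, 3.0, 4.0, 4.3, 4.6]
y = [30 + 10 * xi for xi in x]

# Assumed probabilities of belonging to cluster A (illustrative values only).
p_a = [0.99, 0.97, 0.95, 0.40, 0.05, 0.02, 0.01]

ones = [1.0] * len(x)

# Reading 1: multiply each response by its probability, then fit unweighted.
scaled_y = [pi * yi for pi, yi in zip(p_a, y)]
scaled_fit = weighted_linear_fit(x, scaled_y, ones)

# Reading 2: leave the responses alone and use the probabilities as weights.
weighted_fit = weighted_linear_fit(x, y, p_a)

print("responses scaled by probability:", scaled_fit)
print("probabilities used as weights:  ", weighted_fit)
```

In this toy case the second fit recovers the original line, because the weights only change how much each point counts, whereas the first fit is distorted because the responses themselves have been shrunk. Whether observation weights of this kind are the right way to combine clustering and regression is exactly what I am unsure about.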