There is a sizeable body of literature on the issue of multiple maximizers in maximum likelihood estimation, such as
Is anyone aware of any datasets (and choices of likelihood function) that exemplify this behavior? One possibility is to do a linear regression using a Cauchy loss function rather than squaring the residuals, but it feels contrived. Gaussian mixture models are another example, but I am unaware of any well-studied datasets in which the issue of multiple solutions comes into play.
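To make the Cauchy case concrete, here is a minimal sketch of the intercept-only special case (a Cauchy location model with the scale assumed known and fixed at 1). The data values and the helper names `cauchy_loglik` and `local_maxima_on_grid` are made up for illustration; this is not a real dataset, only a demonstration that the log-likelihood can have more than one local maximizer.

```python
import math


def cauchy_loglik(theta, data, scale=1.0):
    """Log-likelihood of a Cauchy location model with known scale."""
    return sum(
        -math.log(math.pi * scale * (1.0 + ((x - theta) / scale) ** 2))
        for x in data
    )


def local_maxima_on_grid(data, lo=-10.0, hi=10.0, n=2001):
    """Return grid points whose log-likelihood exceeds both neighbours."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    ll = [cauchy_loglik(t, data) for t in grid]
    return [
        grid[i]
        for i in range(1, n - 1)
        if ll[i] > ll[i - 1] and ll[i] > ll[i + 1]
    ]


# Made-up data: two well-separated clusters (illustration only).
data = [-5.0, -4.0, 4.5, 5.5]
modes = local_maxima_on_grid(data)
print(modes)  # approximate locations of the local maximizers
```

With clusters this far apart relative to the scale, the grid search finds a local maximizer near each cluster, so a gradient-based optimizer could converge to either one depending on its starting point.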