3

We have developed a model for some real data, and we use the EM algorithm to optimize the model parameters. As a first step, we generate synthetic data from the model (with known parameters) to validate that our inference code is bug-free, and the results are (perhaps) strange. We split the synthetic data into train and test parts. In the first iterations of the EM algorithm, both the negative log-likelihood of the test data (the EM objective) and the parameter error (mean squared error against the known parameters, etc.) decrease as expected. But in later iterations of EM, we see that while the negative log-likelihood on the test data keeps decreasing slowly, the parameter error starts to diverge. I drew a schematic figure of the situation below:

[schematic figure: test negative log-likelihood keeps decreasing slowly while parameter error turns upward in later EM iterations]

I should underline that the negative log-likelihood on the test set never increases; its decrease just slows down dramatically.
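To make the setup concrete, here is a minimal sketch of the kind of validation loop I mean. A hypothetical 1-D two-component Gaussian mixture stands in for our real model; all names and numbers here are illustrative:

```python
# Validation loop sketch: generate data from known parameters, run EM,
# and track both the test NLL and the parameter MSE per iteration.
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth parameters used to generate the synthetic data.
true_w = np.array([0.4, 0.6])
true_mu = np.array([-2.0, 2.0])
true_sigma = np.array([1.0, 1.5])

def sample(n):
    z = rng.choice(2, size=n, p=true_w)
    return rng.normal(true_mu[z], true_sigma[z])

x_train, x_test = sample(2000), sample(500)

def neg_log_lik(x, w, mu, sigma):
    # Mixture density summed over components, then negative log summed.
    dens = (w / (sigma * np.sqrt(2 * np.pi)) *
            np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2)).sum(axis=1)
    return -np.log(dens).sum()

# Deliberately rough initialization.
w, mu, sigma = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for it in range(200):
    # E-step: posterior responsibilities of each component for each point.
    p = (w / (sigma * np.sqrt(2 * np.pi)) *
         np.exp(-0.5 * ((x_train[:, None] - mu) / sigma) ** 2))
    r = p / p.sum(axis=1, keepdims=True)
    # M-step: weighted maximum-likelihood updates.
    nk = r.sum(axis=0)
    w = nk / len(x_train)
    mu = (r * x_train[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((r * (x_train[:, None] - mu) ** 2).sum(axis=0) / nk)
    # Track both curves: test NLL (the EM objective) and parameter MSE.
    if it % 20 == 0:
        nll = neg_log_lik(x_test, w, mu, sigma)
        mse = np.mean((np.concatenate([w, mu, sigma]) -
                       np.concatenate([true_w, true_mu, true_sigma])) ** 2)
        print(f"iter {it:3d}  test NLL {nll:9.2f}  param MSE {mse:.4f}")
```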

I am wondering why this happens. Can we consider it over-fitting? (And if yes, why?)

  • If you use a model to generate data, how can the resulting data possibly be used to validate the model that was used to generate it? Am I misunderstanding? Is the correct interpretation that you are testing your code by creating data from a model with known parameters, and making sure your code recovers those parameters? – Matthew Drury Feb 11 '16 at 20:16
  • We do this to make sure that our inference code is bug-free; it is a common practice in machine learning papers. – CoderInNetwork Feb 11 '16 at 20:56
  • Yeah, I figured that must be what you meant, and I agree that it is good practice for sure. That said, it was a little unclear from your question that this is what you were doing; I think it's worth editing for clarity. – Matthew Drury Feb 11 '16 at 21:00
  • Is the question missing some details? I.e. in what sense could there be overfitting? Are additional predictor variables being added? – Björn Feb 16 '16 at 08:06
  • @Björn Because in later iterations of EM, the MSE of the model parameters increases. – CoderInNetwork Feb 16 '16 at 14:11
  • It could be that you have a problem with unidentifiability of parameters, which you may call overfitting. Why don't you just fix a few of your parameters and check whether the problem persists? – Florian Hartig Feb 17 '16 at 08:50

3 Answers

1

EM is actually quite stable and should work well on your problem if you initialize the algorithm well.

The short answer: I assume your model is trained by minimizing the negative log-likelihood. In that scenario, it is not unheard of for the measure being optimized (the negative log-likelihood) to keep converging while other measures do not. Yes, this could be over-fitting, but most probably not over-fitting with respect to the data; rather, over-fitting with respect to the underlying optimization problem.
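As a toy illustration of "over-fitting w.r.t. the optimization problem" (hypothetical numbers, not your model): when mixture components overlap, clearly different parameter vectors can be almost indistinguishable in likelihood, so the objective keeps creeping down while the parameters wander:

```python
# Two clearly different parameter vectors of a two-component 1-D Gaussian
# mixture that are nearly indistinguishable in likelihood (toy numbers).
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.1, 1.005, size=10_000)  # stand-in test data

def nll(x, w, mu, sigma):
    dens = (w / (sigma * np.sqrt(2 * np.pi)) *
            np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2)).sum(axis=1)
    return -np.log(dens).mean()

theta_a = (np.array([0.5, 0.5]), np.array([0.0, 0.2]), np.array([1.0, 1.0]))
theta_b = (np.array([0.5, 0.5]), np.array([0.1, 0.1]), np.array([1.005, 1.005]))

print("NLL gap :", abs(nll(x, *theta_a) - nll(x, *theta_b)))  # near zero
print("param L2:", sum(np.sum((a - b) ** 2)                   # much larger
                       for a, b in zip(theta_a, theta_b)))
```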

Another possibility is that something is simply wrong in your pipeline. What exactly is your generative process, and does the mean squared error actually make sense for it? How is your data distributed? Given that a generative model is providing the data, this should be an easy question to answer.

pAt84
1

I have a few remarks:

  • Overfitting means that the error on your test set starts to increase after some number of training iterations while the error on your training set still decreases. That should be easy to test. However, overfitting is not directly related to the parameters of your model. If it is overfitting, the error of your model on your test set looks like your right-hand plot.
  • Are you sure that the parameters of your model diverge from your ground-truth parameters, or are you seeing a different effect? For example, the model EM finds might have its parameters in a different order (e.g. permuted components in a mixture model, the classic label-switching problem), which makes comparing them to the ground truth more difficult. In other words, can you trust the parameter error you are plotting? (See the sketch below.)

If it is overfitting, you should use a simpler model or add more data.
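If parameter ordering turns out to be the issue, a common fix is to align the estimated components with the ground truth before computing the error, e.g. by minimizing over permutations. A minimal sketch (the arrays are placeholders, not your actual parameters):

```python
# Compute parameter MSE only after matching estimated components to the
# ground-truth components over all permutations (placeholder arrays).
import itertools
import numpy as np

true_mu = np.array([-2.0, 0.0, 2.0])   # hypothetical ground truth
est_mu = np.array([2.1, -1.9, 0.05])   # hypothetical EM estimate

def aligned_mse(est, true):
    # Brute force over permutations; fine for a handful of components.
    return min(np.mean((est[list(p)] - true) ** 2)
               for p in itertools.permutations(range(len(true))))

print(aligned_mse(est_mu, true_mu))  # small once components are matched
```

For many components, matching via scipy.optimize.linear_sum_assignment scales better than brute force.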

0

EM is not guaranteed to find the global minimum of the negative log-likelihood. It can get stuck in a local minimum or even at a saddle point and produce weird results. You should run the EM algorithm from many different random starting points and check whether you always get the same results.
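For example, a minimal sketch of such a restart check, with scikit-learn's GaussianMixture standing in for whatever model you are actually fitting:

```python
# Run EM from several random starting points and compare the final
# log-likelihoods; disagreement means some runs hit worse local optima.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 1, 500),
                    rng.normal(2, 1.5, 500)]).reshape(-1, 1)

for seed in range(5):
    gm = GaussianMixture(n_components=2, init_params="random",
                         random_state=seed).fit(X)
    print(f"seed {seed}: final log-lik per sample = {gm.score(X):.4f}")
```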

Chris