The Google Prediction API is a cloud service where users can submit training data to train some mysterious classifier, and later ask it to classify incoming data, for instance to implement spam filters or predict user preferences.

But what is behind the scenes?

  • I suspect they're hoping to keep that commercially confidential! – onestop Jan 16 '11 at 14:11
  • This may be true, yet the video (from summer 2010) suggests that they were still experimenting at that time, so I posted this question hoping that some details had leaked since then. –  Jan 17 '11 at 11:29
  • There are "several" algorithms that the Prediction API can choose from when training/predicting your data. The engine chooses the one that it decides is best.

    Some users have requested a little more control over that selection, http://goo.gl/mod/5EoA, even if the algorithm is unknown.

    Redditors have speculated on the guts here, http://www.reddit.com/r/MachineLearning/comments/evdxb/what_are_your_thoughts_on_google_prediction_api/, but the stat-speak is lost on me.

    – hyperslug Jan 23 '11 at 16:40
  • @hyperslug Post it as an answer; it is quite useful, so I'd like to accept it. –  Jan 23 '11 at 16:48

1 Answer

Google uses several machine learning techniques and algorithms for training and prediction. Strategies for large-scale supervised learning include:

  1. Sub-sampling
  2. Embarrassingly parallel training of some algorithms
  3. Distributed gradient descent
  4. Majority vote
  5. Parameter mixture
  6. Iterative parameter mixture
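To make the voting and mixture strategies concrete, here is a minimal single-process sketch of majority vote, parameter mixture and iterative parameter mixture. It assumes a plain perceptron as the base learner, a toy linearly separable data set, and five in-memory lists standing in for machines; none of this is known to be what the Prediction API actually does, and all function names are hypothetical.

```python
import random

def perceptron_epoch(w, data):
    """One perceptron pass over `data`; returns new weights, leaving `w` untouched."""
    w = list(w)
    for x, y in data:  # labels y are -1 or +1
        score = sum(wi * xi for wi, xi in zip(w, x))
        if y * score <= 0:  # misclassified (or on the boundary): move towards y * x
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

def average(weight_vectors):
    """Coordinate-wise mean of several weight vectors."""
    n = len(weight_vectors)
    return [sum(ws) / n for ws in zip(*weight_vectors)]

def train_local(shards, dim, epochs=10):
    """Train one independent perceptron per shard, with no communication."""
    models = []
    for shard in shards:
        w = [0.0] * dim
        for _ in range(epochs):
            w = perceptron_epoch(w, shard)
        models.append(w)
    return models

def parameter_mixture(shards, dim, epochs=10):
    """Average the independently trained shard models once, at the end."""
    return average(train_local(shards, dim, epochs))

def iterative_parameter_mixture(shards, dim, epochs=10):
    """After every epoch, average the shard models and restart each shard from the mix."""
    w = [0.0] * dim
    for _ in range(epochs):
        w = average([perceptron_epoch(w, shard) for shard in shards])
    return w

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

def majority_vote(models, x):
    """Let each independently trained model vote; ties go to -1."""
    return 1 if sum(predict(w, x) for w in models) > 0 else -1

def accuracy(classify, data):
    return sum(classify(x) == y for x, y in data) / len(data)

# Toy data: x = (bias, a, b), label +1 when a + b > 0; points too close to
# the boundary are skipped so the problem is linearly separable with a margin.
rng = random.Random(0)
data = []
while len(data) < 200:
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    if abs(a + b) >= 0.5:
        data.append(((1.0, a, b), 1 if a + b > 0 else -1))
shards = [data[i::5] for i in range(5)]  # five simulated machines

local = train_local(shards, dim=3)
pm = parameter_mixture(shards, dim=3)
ipm = iterative_parameter_mixture(shards, dim=3)
print("parameter mixture:", accuracy(lambda x: predict(pm, x), data))
print("iterative parameter mixture:", accuracy(lambda x: predict(ipm, x), data))
print("majority vote:", accuracy(lambda x: majority_vote(local, x), data))
```

The only difference between the two mixture functions is when the averaging happens: parameter mixture averages once at the end, while iterative parameter mixture sends the averaged weights back to every shard after each epoch, so the shards keep learning from each other's updates.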

Presumably they train models with several of these techniques and then use an algorithm to decide which model, and therefore which prediction, to return.
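A minimal sketch of that selection step, assuming the simplest possible policy: train each candidate on part of the data, score it on a held-out split, and refit the winner on everything. The three candidate learners, the 25% hold-out and all names are hypothetical choices for illustration; how the Prediction API actually scores its candidates is not documented.

```python
import random
from collections import Counter

def train_majority(train):
    """Baseline: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def train_nearest_centroid(train):
    """Predict the label whose training-set mean is closest in Euclidean distance."""
    sums, counts = {}, Counter()
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, xi in enumerate(x):
            acc[i] += xi
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def classify(x):
        return min(centroids, key=lambda y: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[y])))
    return classify

def train_perceptron(train, epochs=10):
    """Binary perceptron with a bias term; labels must be -1 or +1."""
    w = [0.0] * (len(train[0][0]) + 1)  # last weight is the bias
    for _ in range(epochs):
        for x, y in train:
            xb = list(x) + [1.0]
            if y * sum(wi * xi for wi, xi in zip(w, xb)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, xb)]
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) > 0 else -1

# Hypothetical candidate pool; the real service's list of algorithms is unknown.
CANDIDATES = {
    "majority": train_majority,
    "nearest_centroid": train_nearest_centroid,
    "perceptron": train_perceptron,
}

def select_model(data, holdout_fraction=0.25, seed=0):
    """Score every candidate on a held-out split, then refit the best one on all data."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - holdout_fraction))
    train, holdout = rows[:cut], rows[cut:]
    scores = {}
    for name, fit in CANDIDATES.items():
        model = fit(train)
        scores[name] = sum(model(x) == y for x, y in holdout) / len(holdout)
    best = max(scores, key=scores.get)
    return best, CANDIDATES[best](rows), scores

# Toy usage: label +1 when a + b > 0, with a margin around the boundary.
rng = random.Random(1)
data = []
while len(data) < 200:
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    if abs(a + b) >= 0.5:
        data.append(((a, b), 1 if a + b > 0 else -1))

best, model, scores = select_model(data)
print(scores, "->", best)
```

Refitting the winner on all the data after selection is a common convention, not something the service is known to do; a real system might also weigh training cost or prediction latency, not just held-out accuracy.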

  1. Sub-sampling gives inferior performance.
  2. Parameter mixture improves on that, but is not as good as training on all the data.
  3. Distributed algorithms return better classifiers more quickly.
  4. Iterative parameter mixture performs as well as training on all the data.
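Distributed gradient descent (item 3 in the strategy list) can be sketched as synchronous full-batch gradient descent in which each shard computes a partial gradient and a reduce step sums them before one shared update. Logistic regression, the learning rate, the step count and the function names are assumptions for illustration, and the shards are processed sequentially here rather than on separate machines.

```python
import math
import random

def logistic_grad(w, shard):
    """Sum (not mean) of logistic-loss gradients over one shard; labels are 0 or 1."""
    g = [0.0] * len(w)
    for x, y in shard:
        z = sum(wi * xi for wi, xi in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of label 1
        for i, xi in enumerate(x):
            g[i] += (p - y) * xi
    return g

def distributed_gd(shards, dim, lr=0.5, steps=100):
    """Synchronous gradient descent: one partial gradient per shard, reduced by summing."""
    n = sum(len(s) for s in shards)
    w = [0.0] * dim
    for _ in range(steps):
        partials = [logistic_grad(w, s) for s in shards]  # would run on separate workers
        total = [sum(gs) for gs in zip(*partials)]          # reduce step
        w = [wi - lr * gi / n for wi, gi in zip(w, total)]  # shared update, broadcast back
    return w

# Toy data: x = (bias, a, b), label 1 when a + b > 0, with a margin around the boundary.
rng = random.Random(2)
data = []
while len(data) < 200:
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    if abs(a + b) >= 0.5:
        data.append(((1.0, a, b), 1 if a + b > 0 else 0))
shards = [data[i::4] for i in range(4)]

w_dist = distributed_gd(shards, dim=3)
w_single = distributed_gd([data], dim=3)  # a single "shard" is ordinary full-batch GD
print(w_dist, w_single)
```

Because the partial gradients are summed before each update, the result matches single-machine full-batch gradient descent up to floating-point rounding; in a real deployment the speed-up would come from computing the partial gradients in parallel.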

But of course, none of this is spelled out in the API documentation.