Google uses various machine learning techniques and algorithms for training and prediction. The usual strategies for large-scale supervised learning are:
1. Sub-sampling
2. Embarrassingly parallel algorithms
3. Distributed gradient descent
4. Majority vote
5. Parameter mixture (sketched below)
6. Iterative parameter mixture
Presumably they train models with several of these techniques and use some selection algorithm to decide which model's prediction to return.
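To make the mixture strategies concrete, here's a minimal sketch of plain parameter mixture. This is not Google's actual code: it assumes a binary perceptron as the base learner, numpy arrays as the data shards, and the names (`train_perceptron`, `parameter_mixture`, `shards`) are all hypothetical.

```python
import numpy as np

def train_perceptron(X, y, epochs=10):
    # Plain binary perceptron; stands in for any linear learner.
    # Assumes labels in {-1, +1}.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:  # misclassified -> update
                w += yi * xi
    return w

def parameter_mixture(shards):
    # Train each shard independently (this part is embarrassingly
    # parallel), then average the weight vectors once at the end.
    weights = [train_perceptron(X, y) for X, y in shards]
    return np.mean(weights, axis=0)
```

The appeal is that the shards never communicate until the final averaging step. The rough trade-offs reported in the distributed-training literature: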
- Sub-sampling gives inferior performance
- Parameter mixture improves on sub-sampling, but is still not as good as training on all the data
- Distributed algorithms return better classifiers more quickly
- Iterative parameter mixture performs as well as training on all the data (see the sketch after this list)
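And here's the iterative variant, under the same hypothetical assumptions as the sketch above: instead of averaging once at the end, the shards average after every epoch and restart from the mixed weights, which is what lets it match training on all the data.

```python
import numpy as np

def iterative_parameter_mixture(shards, epochs=10):
    # Same perceptron update as above, but weights are averaged and
    # redistributed after every epoch instead of once at the end.
    w = np.zeros(shards[0][0].shape[1])
    for _ in range(epochs):
        local = []
        for X, y in shards:
            w_i = w.copy()  # each shard restarts from the current mix
            for xi, yi in zip(X, y):
                if yi * (w_i @ xi) <= 0:
                    w_i += yi * xi
            local.append(w_i)
        w = np.mean(local, axis=0)  # mix, then redistribute
    return w
```

The extra cost is one round of communication per epoch, which is the price paid for recovering full-data accuracy.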
But of course none of this is really clear from the API documentation.
Some users have requested a little more control over that selection, http://goo.gl/mod/5EoA, even if the underlying algorithm remains unknown.
Redditors have speculated on the guts here, http://www.reddit.com/r/MachineLearning/comments/evdxb/what_are_your_thoughts_on_google_prediction_api/, but the stat-speak is lost on me.
– hyperslug Jan 23 '11 at 16:40