Goldman Sachs decided to apply their brains to predicting the winner of the soccer World Cup. They built a statistical model analyzing past winners.

I read about it here,

but the original version is in here.

But I wonder: why would the results of past World Cups be relevant at all?

It is "a study of all mandatory international football matches since 1960 taking into account goals scored and conceded."

I wonder whether including past results going back to 1960 is just a joke or a blunder (similar to their other blunders). Otherwise, how can you analyze this statistically?

Andre Silva
  • This might be a little bit too vague/broad of a question. Also, you should consider quoting relevant excerpts instead of merely copy-pasting a link. – Patrick Coulombe May 31 '14 at 20:02
  • @PatrickCoulombe: OK, I'd include the details that I find fishy. – Jose Maria Converto May 31 '14 at 20:23
  • Of course you can analyse this statistically. Using a Poisson model is a popular approach for soccer right now in the sports analytics community.

    Those models tend to have inferior discrimination and calibration compared to the predictions implied by the bookmaker odds (unsurprisingly, the latter use more information).

    – CloseToC May 31 '14 at 23:47

1 Answer

On page 3 the statistical approach is spelled out.

They do a Poisson autoregression of goals scored for each team, with 6 explanatory variables. The autoregressive portion only goes back 10 games for the team in question and 5 for the opposing team. Perhaps you are not familiar with autoregression.

Autoregression means that a regression at time $t$ is performed using previous values (in this case goals) going back some number of timesteps $p$. A simple linear autoregression would look like this:

$$ X_t = c + \sum^p_{i=1}\beta_i X_{t-i} + \epsilon_t $$

What is commonly referred to as Poisson regression is usually log-linear, so in that case the $X_t$ on the left-hand side would be replaced with $\log E[X_t]$, the log of the expected count. The Poisson regression used isn't stated, however, and may be non-linear. Regardless, the principle of autoregression holds no matter what model they used: the values from previous timesteps are used as predictive variables for the current timestep.
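As a minimal sketch of the log-linear case, assuming a log link and made-up coefficients (the names `poisson_ar_mean`, `poisson_pmf`, `c`, and `betas` are illustrative and are not taken from the report), the expected goal count would be the exponential of a linear combination of the previous $p$ goal counts:

```python
import math

def poisson_ar_mean(history, c, betas):
    """Expected goals at time t under a log-linear Poisson autoregression.

    history: goal counts of earlier games, oldest first.
    betas:   hypothetical lag coefficients; betas[0] multiplies X_{t-1}.
    """
    p = len(betas)
    if len(history) < p:
        raise ValueError("need at least p previous games")
    # Most recent game first, so lags[i] is X_{t-(i+1)}.
    lags = history[::-1][:p]
    linear_predictor = c + sum(b * x for b, x in zip(betas, lags))
    return math.exp(linear_predictor)

def poisson_pmf(k, mu):
    """Probability of exactly k goals when the expected count is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

# Illustrative values only, not estimates from any real data.
recent_goals = [2, 0, 1, 3, 1]
mu = poisson_ar_mean(recent_goals, c=0.1, betas=[0.05, 0.03, 0.01])
probabilities = [poisson_pmf(k, mu) for k in range(6)]
```

Here only the last three entries of `recent_goals` enter the prediction, because three lag coefficients were supplied. A real fit would estimate `c` and `betas` by maximum likelihood and would also include the other explanatory variables.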

For this model, $p=10$ for the team of interest and $p=5$ for the opposing team. The fact that the data go back to 1960 only means that there is some $k$ for which the $(k-10)$th game happened in 1960, and in addition there is a $(k+10)$th game that was not played in 1960 (probably in 1961).
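To make the window concrete, here is a small sketch of how the lagged rows for one team could be assembled. The function name `lagged_rows` and the goal sequence are invented for illustration, and the actual data preparation in the report may well differ:

```python
def lagged_rows(goals, p):
    """Pair each game's goal count with the p counts that preceded it."""
    rows = []
    for t in range(p, len(goals)):
        predictors = goals[t - p:t]  # games t-p .. t-1 only
        rows.append((predictors, goals[t]))
    return rows

# Made-up sequence; index 0 stands for the earliest game in the data set.
goals = [1, 0, 2, 1, 3, 0, 1, 2, 2, 1, 0, 4, 1]
rows = lagged_rows(goals, p=10)
```

With 13 games and $p=10$ this yields three rows. The earliest game is never a response itself, and it appears as a predictor only in the first row, so an old result leaves the window after 10 further games.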

The model is not claiming that an event that happened 54 years ago will affect the outcome of the World Cup this year. Rather, it offers the premise that there was a pattern in 1960 involving at most the 10 previous games, and that this pattern still holds today.

Whether or not this premise is fishy is perhaps a more sophisticated question.

Meadowlark Bradsher