11

This may well go down as the silliest question ever asked on this forum, but having received sound and meaningful answers to a previous question, I thought I would stretch my luck again.

I have been confused for some time about the importance of statistical distributions, especially as they relate to asset returns and, more specifically, to asset allocation.

My question, to be specific, is this: assume I have 20 years of S&P 500 monthly returns data. Why should I need to assume a certain kind of distribution (e.g. normal, Johnson, Lévy flight, etc.) for my asset allocation decision when I can simply make my asset allocation decisions based on the historical data I have with me?

Bloodline
  • 133
  • 3
Remember that if you found answers to your previous question helpful, you can mark one as 'accepted' by clicking on the checkmark next to the answer. This lets others know your question is solved. – Jeff Oct 17 '12 at 20:21
  • 2
There actually is a recent post from J.D. Cook on that subject. To outline its relevance to your question, I will quote from the first paragraph: "When statisticians analyze data, they don't just look at the data you bring to them. They also consider hypothetical data that you could have brought. In other words, they consider what could have happened as well as what actually did happen." – user603 Oct 17 '12 at 20:40
  • I believe Taleb had something cogent to say about the problems with making decisions solely from historical data :-). (Historical data usually do not directly reveal the rare but possibly fatal "black swan" events until it's too late.) – whuber Oct 17 '12 at 21:12
  • 2
    ... as most turkeys will come to realize in a couple of weeks. – Ryogi Oct 17 '12 at 21:20
To expand on @user603's point: you want to make inferences outside your sample. In particular, the point of your asset allocation relates to future behavior, not past behavior. This includes, for example, how things behave in the tail, where you have few observations. You can bring in additional knowledge/understanding/biases about the process via distributional assumptions. If these assumptions are somewhere close to right, you can add a lot of information. – Glen_b Oct 18 '12 at 01:48

2 Answers

5

Using an assumed distribution (i.e., parametric analysis) will reduce the computational cost of your method. I am assuming that you would like to perform a regression or classification task, which means that at some point you are going to estimate the distribution of some data. Nonparametric methods are useful when the data do not conform to a well-studied distribution, but they typically take either more time to compute or more memory to store.
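To make the memory/compute point concrete, here is a minimal sketch (using simulated returns in place of real S&P 500 data, with made-up mean and volatility): the nonparametric "model" is the entire sample, while the parametric model under a normal assumption is just two numbers.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical monthly returns: 240 draws standing in for 20 years of data.
returns = rng.normal(loc=0.007, scale=0.04, size=240)

# Nonparametric: the "model" is the full sample itself (240 numbers).
threshold = -0.10
p_empirical = np.mean(returns < threshold)

# Parametric (normal assumption): the model is just (mu, sigma).
mu, sigma = returns.mean(), returns.std(ddof=1)
p_normal = 0.5 * (1 + math.erf((threshold - mu) / (sigma * math.sqrt(2))))

print(f"empirical tail estimate:  {p_empirical:.4f}")
print(f"parametric tail estimate: {p_normal:.4f}")
```

Note how the empirical estimate of a far-tail probability can be exactly zero simply because no such month appears in the sample, which is precisely the "black swan" problem raised in the comments; the parametric estimate extrapolates beyond the observed data (for better or worse, depending on the assumption).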

Also, if the data are generated by a process that conforms to a known distribution (for example, if each observation is an average of several uniformly distributed quantities), then using that distribution makes more sense. When averaging a set of uniform variables, the central limit theorem points to the Gaussian distribution.
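The averaging claim is easy to check by simulation. A quick sketch: take the mean of n Uniform(0, 1) draws many times and compare the spread of those means to what the central limit theorem predicts, namely a standard deviation of sqrt(1/(12n)).

```python
import numpy as np

rng = np.random.default_rng(42)
n, trials = 30, 10_000
# Each row: n independent Uniform(0, 1) draws; take the row mean.
means = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)

# CLT prediction: means cluster around 0.5 with std sqrt(1/(12*n)).
predicted_std = (1 / (12 * n)) ** 0.5
print(f"sample mean of means: {means.mean():.4f}   (CLT: 0.5)")
print(f"sample std of means:  {means.std(ddof=1):.4f}   (CLT: {predicted_std:.4f})")
```

A histogram of `means` would look bell-shaped even though each underlying draw is flat, which is why a Gaussian is a natural parametric choice for data generated this way.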

James
  • 141
0

Complementing James's answer: parametric models also (usually) require fewer samples to achieve a good fit. This can increase their generalization power; that is, they may predict new data better, even when the assumed distribution is somewhat wrong. Of course, this depends on the situation, the models, and the sample sizes.

madness
  • 117