
As opposed to

[image of the classical mean-variance (Markowitz) optimization problem; not shown]

Robust Bayesian Allocation solves

[image of the robust Bayesian allocation problem, which optimizes against the worst-case mean and covariance within the uncertainty sets; not shown]

see https://hudson-and-thames-portfoliolab.readthedocs-hosted.com/en/latest/bayesian/robust_bayesian_allocation.html.

I don't get it. Why is this a worthwhile problem to solve? It seems to be the same problem as the first one, but under the assumption of maximal risk and minimal return, which is nonsensical. Why would anyone allocate their portfolio based on what is optimal in the worst-case scenario? That says nothing about how the portfolio may perform in the other scenarios, and since those other scenarios may occur, they must be taken into account.

saei
  • The problem this addresses is that $\mu,\Sigma$ are not known in real life. A more reasonable assumption therefore might be that we only know that $\mu \in \theta_\mu$ and $\Sigma \in \theta_{\Sigma}$. In other words, we don't have point estimates for the parameters $\mu,\Sigma$; we only know they belong to a set. – nbbo2 Mar 13 '21 at 14:00
  • Right, but then you build your portfolio based on them being equal to the worst possible values out of that set. If all points in the set are equally likely, why would you build your portfolio based on only one of those points? It would make more sense if the max-risk and min-return assumptions were replaced by averages. – saei Mar 13 '21 at 14:01
  • This is the philosophy of Robust Control (taken from the mathematical field of Control Theory): you should build a system so that it works well under the least favorable scenario. It clearly makes sense in the design of aircraft or nuclear power plants. Whether it makes sense for Investments, is up to you to decide. It is debatable. – nbbo2 Mar 13 '21 at 14:05
  • No, it is not debatable. At all. There's not a single investment professional walking the planet who allocates his or her capital based on "hurr durr this is the best portfolio in case the market crashes". – saei Mar 13 '21 at 14:12
  • You have identified the key problem: how to specify the parameter sets $\theta$ in a reasonable manner; otherwise we end up with a trivial solution: all in cash. Is this even possible? (BTW I agree with you that as an investment professional I do not find this approach convincing, so far). – nbbo2 Mar 13 '21 at 14:54

2 Answers


I think the optimization problem you quote was clumsily assembled, but there is a sensible idea here. I can explain a variant, based on minimax Frequentist 'risk'. (The 'risk' here is the risk of making a bad decision, not portfolio risk.) Suppose you will observe a matrix of historical returns, $X$, then create a portfolio based on that data; call it $w(X)$. The decision-theoretic 'risk' associated with such a portfolio rule could be the negative Sharpe of that portfolio. (We want to minimize 'risk'.) Define
$$ r(w) = - \frac{w(X)'\mu}{\sqrt{w(X)'\Sigma w(X)}}. $$
The minimax portfolio rule would be the one that solves the following optimization problem:
$$ \min_{w \in \mathcal{W}} \, \max_{\left(\mu,\Sigma\right) \in \mathcal{B}} E_{X}\left[r(w)\right]. $$
That is, among a set of allowable portfolio rules $\mathcal{W}$, you seek the rule that maximizes the expected (under replications of $X$) true Sharpe ratio of the portfolio under the least favorable of the allowable true means and covariances $\mathcal{B}$.
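To make the worst-case idea concrete, here is a minimal numpy sketch under invented assumptions: a two-asset universe, a small discrete uncertainty set for $(\mu,\Sigma)$, and a coarse grid of candidate long-only weights standing in for the set of rules (the expectation over replications of $X$ is dropped for simplicity). It picks the weights whose worst-case negative Sharpe over the uncertainty set is smallest.

```python
import numpy as np
from itertools import product

# Invented two-asset example: a small discrete uncertainty set for (mu, Sigma).
mus = [np.array([0.05, 0.07]), np.array([0.03, 0.10])]
sigmas = [np.diag([0.04, 0.09]),
          np.array([[0.04, 0.02],
                    [0.02, 0.09]])]

def neg_sharpe(w, mu, sigma):
    """Decision-theoretic 'risk': the negative Sharpe ratio of weights w under (mu, sigma)."""
    return -(w @ mu) / np.sqrt(w @ sigma @ w)

# Coarse grid of candidate long-only, fully invested portfolios.
candidates = [np.array([a, 1.0 - a]) for a in np.linspace(0.0, 1.0, 101)]

def worst_case_risk(w):
    """Largest (worst) negative Sharpe over the whole uncertainty set."""
    return max(neg_sharpe(w, mu, sigma) for mu, sigma in product(mus, sigmas))

# Minimax rule: the candidate whose worst case is least bad.
w_minimax = min(candidates, key=worst_case_risk)
print("minimax weights:", w_minimax)
print("worst-case negative Sharpe:", worst_case_risk(w_minimax))
```

Worst-case formulations of this kind typically push the chosen weights toward something more diversified or conservative than optimizing under any single $(\mu,\Sigma)$ pair would.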

The Bayesian form is somewhat similar, but I think this formulation illustrates the idea better: you are not tailoring your portfolio to a worst-case scenario; you are tailoring your portfolio rule to a worst case.

BTW, it turns out that when the number of days of data dominates the number of assets, Markowitz is generally the optimal portfolio rule.

steveo'america

So let me begin with why both of these models concern me.

In the first model, the Frequentist Markowitzian model, Ito's method assumes that all parameters are known. No estimators are needed.

That is a big deal because White in 1958 proved that models with $\tilde{w}=Rw+\varepsilon, R>1$ have no solution if $R$ has to be estimated in the Frequentist paradigm. Although one could use methods like Theil's regression, it would be inconsistent with the economic theory. Ito's method is valid only if you know the true parameters for Apple Computer. I don't.

In the Bayesian framework, it is the parameters that are random, not the data. It can take one of two forms: the objective and the subjective.

In the objective form, $\theta=k$ is a fixed constant, but it is unobservable. A prior probability distribution is created by the observer regarding the location of $\theta$. In game-theoretic terms, $\theta$ is chosen by nature at time zero. $\theta$ is a random variable if randomness is thought of as uncertainty rather than chance.

In the subjective form, $\theta\in{K}$ and nature draws $\theta$ at the beginning of each experiment. Here $\theta$ is not a constant; it is a true random variable.

As the website you provided shows, Bayesian methods do not provide a single point answer the way null hypothesis methods do. There is no equivalent of $\bar{x}$ or $s^2$. Instead, there is a distribution of possible values for $\mu$.

One can only arrive at single points by imposing a utility function, which is what they have done.
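As a small illustration of that point (all numbers invented, with Monte Carlo draws standing in for a real posterior): the posterior is a whole distribution for $\mu$, and different utility choices collapse it to different single numbers, for example the posterior mean versus a deep lower quantile in the worst-case spirit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented posterior draws for the mean return of a single asset
# (a Monte Carlo stand-in for a real posterior distribution).
posterior_mu = rng.normal(loc=0.06, scale=0.02, size=100_000)

# The posterior is a distribution, not a point. A point only appears
# once some utility / decision rule is imposed on it:
posterior_mean = posterior_mu.mean()               # "average case" point
worst_case_mu = np.quantile(posterior_mu, 0.001)   # pessimistic, dense-but-improbable point

print(f"posterior mean of mu : {posterior_mean:.4f}")
print(f"0.1% quantile of mu  : {worst_case_mu:.4f}")
```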

It is the interpretation of this that makes the solution problematic.

Under the subjective interpretation, you are anticipating that nature will make a very bad draw in the next time period. Indeed, you are anticipating, in some sense, the worst draw ever to have happened to be next.

Note that this is about the parameters and NOT the market realizations. It does not imply a crash. For example, suppose $\mu_{\min}=1.01$; that does not preclude the realized value of $S_{t+1}$ from being a 95% increase. Because this is a systemic choice, what you are really doing, by implication, is assuming the very worst future set of economic conditions over the next period.

In the objective interpretation, every point in the posterior is a valid possible solution. Obviously, some points in the posterior are so improbable that a person could rule them out. The solution they provide is in the dense but improbable region.

In other words, it could be the real value of the parameters, but it is unlikely, and it is unlikely in the downward direction. Should you have a bad historical sample to build your posterior from, that downward bias becomes, in the worst case, systematic. By choosing the worst case, you are likely to be perpetually surprised.

The rough Frequentist equivalent would be to use the parameter estimates from the bottom of the 99.9% confidence intervals.

Every point in a confidence interval is equiprobable. Confidence intervals are a uniform distribution. If you had the correct utility function, then choosing the bottom of every confidence interval would be an equally valid solution.

As far as I can tell, there is nothing objectionable about plugging in the bottom value of a confidence interval instead of using the MVUE. While it still isn't a valid estimator, as per White, nothing makes it "wrong" to ignore the MVUE in favor of some point in an interval.
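For concreteness, here is a minimal sketch of that "bottom of the confidence interval" idea under invented data and a plain normal-theory interval (not anything from the quoted model): compute the usual sample mean, then the lower endpoint of a 99.9% interval; plugging the latter into a Markowitz-style allocation is the rough Frequentist analogue of the worst-case Bayesian point.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
returns = rng.normal(loc=0.05, scale=0.20, size=250)   # invented return sample

xbar = returns.mean()                                  # usual point estimate (MVUE under normality)
se = returns.std(ddof=1) / np.sqrt(len(returns))       # standard error of the mean

# Lower endpoint of a two-sided 99.9% confidence interval for the mean.
alpha = 0.001
t_crit = stats.t.ppf(1 - alpha / 2, df=len(returns) - 1)
mu_lower = xbar - t_crit * se

print(f"sample mean             : {xbar:.4f}")
print(f"99.9% CI lower endpoint : {mu_lower:.4f}")
```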

I have one other concern, but it would take a lot of work to determine its validity. By a lot of work, I mean I might solve it in an hour or it could take me weeks. I don't care that much, so I am not going to work on it, but I am providing it as a disclosure.

Axiomatically, there is only one difference between Kolmogorov's axioms and de Finetti's Dutch Book Theorem. It has to do with the counting of sets. It turns out, it is a big deal.

Under de Finetti's axiomatization of probability, which guarantees that market makers cannot be forced to lose money, sets have to be finitely additive. Under Kolmogorov's, sets must be countably additive.

For example, one of the hidden assumptions deep in the background mathematics of, say, the minimum variance unbiased estimator of the population mean is that the distribution of a continuous random variable can be cut into as many pieces as there are integers.

Formally, a set function $\mu$ possesses countable additivity if, given any countable disjoint collection of sets $\{E_k\}_{k=1}^{\infty}$ on which $\mu$ is defined, $$\mu\left(\bigcup_{k=1}^\infty{E_k}\right)=\sum_{k=1}^\infty\mu(E_k).$$

An informal way to think about it is to imagine that you cut the normal distribution into segments one unit in size in both directions. That would be an infinite number of disjoint sets over a continuous distribution.

Bayesian methods are not countably additive. Instead, you may only cut such a set $n$ ways. You may not take $n$ to the limit at infinity.
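For contrast, finite additivity, which is all that de Finetti's axioms require, only guarantees the corresponding identity for finitely many pieces: $$\mu\left(\bigcup_{k=1}^{n}E_k\right)=\sum_{k=1}^{n}\mu(E_k)\quad\text{for every finite } n,$$ with no requirement that the identity survive the passage to the limit $n\to\infty$.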

As counterintuitive as it may sound, when placing money at risk, the difference is gigantic.

I believe that Leonard Jimmie Savage created the following analogy.

Imagine that you had an urn with $n$ lottery tickets in it. As a bookie or market maker, you could make rational decisions about how to price each ticket in a sensible manner.

Now imagine an urn with all of the integers in it. How could you sensibly price the risk for any ticket?
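To see the difficulty concretely: if every ticket $k\in\{1,2,3,\dots\}$ is priced at the same probability $p$, countable additivity forces $$\sum_{k=1}^{\infty}p=1,$$ which is impossible, since the left-hand side is $0$ if $p=0$ and $\infty$ if $p>0$. A merely finitely additive bookie can escape this, for instance by assigning each individual ticket probability zero while still giving the urn as a whole probability one.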

Frequentist methods produce solutions that allow someone who is clever to string together a convex combination of contracts such that they cannot lose money. Indeed, such contracts are often self-funding or pay out amounts greater than the cost of funds.

If your counterparty uses a Frequentist method for things such as asset allocation, and you know how to do it, you can construct a riskless portfolio of contracts. It is the mathematical equivalent of color blindness. The Frequentist is effectively color-blinded by the assumption of countable additivity.
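A toy numeric version of such a book (the prices are invented, and this is only the generic Dutch-book mechanic, not a demonstration that any particular Frequentist procedure produces these quotes): suppose a counterparty will sell, for \$0.40 each, a contract paying \$1 if event $A$ occurs and a contract paying \$1 if $A$ does not occur. Buying both costs \$0.80 and pays \$1 in every state of the world, a riskless \$0.20. The sketch below just checks that arithmetic.

```python
# Toy Dutch book against incoherent quotes (the prices are invented for illustration).
price_A = 0.40       # price of a contract paying 1 if A occurs
price_not_A = 0.40   # price of a contract paying 1 if A does not occur
# Implied probabilities sum to 0.80 < 1, so the quotes are incoherent.

cost = price_A + price_not_A       # 0.80 to buy both sides
payoff = 1.00                      # exactly one of the two contracts pays in every state
sure_profit = payoff - cost        # 0.20 no matter whether A occurs

print(f"cost: {cost:.2f}, payoff in every state: {payoff:.2f}, sure profit: {sure_profit:.2f}")
```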

To use an analogy, imagine there was a device that guaranteed a payout when a light was blue and never paid when green. You are color blind so you believe that chance is involved. The other party is not. They think you are insane or, at least, individually irrational.

However, using a Bayesian method is not a sufficient condition to produce a coherent result, that is, a result that cannot be gamed by a clever counterparty.

Some prior distributions and some utility functions can create incoherent pricing.

You are safe if you use your real subjective proper priors from information outside the data set and if you use your personal utility function. It is when you start building artificial utility functions or priors that the research shows you can be forced into taking losses.

My concern is that this is a minimax solution, and minimax solutions are generally not coherent. At the minimum, I could probably create a statistical arbitrage case against you. On the other hand, it is unlikely that you would care. The use of this utility function implies that you are very conservative and afraid. You would willingly give up return for safety.

The open question is whether or not I could find a way to force you to lose money.

Dave Harris
  • "Every point in a confidence interval is equiprobable." Is that so? Well, each point has probability zero, so yes. But this does not seem to hold for subintervals instead of points. Take a 90% confidence interval and a 95% confidence interval for the same unknown parameter. The length ratio will not be 18:19. Thus subintervals of equal length within a $(1-\alpha)\%$ confidence interval are generally not equiprobable. (I have written this a bit hastily, but I hope my point is more or less clear.) – Richard Hardy May 07 '21 at 19:35
  • @RichardHardy If the null hypothesis is true, then the probabilities in an interval are uniformly distributed. It is counter-intuitive. You are correct, the length ratio will not be 18:19. See Morey, R.D., Hoekstra, R., Rouder, J.N. et al. The fallacy of placing confidence in confidence intervals. Psychon Bull Rev 23, 103–123 (2016). – Dave Harris May 07 '21 at 22:48
  • @RichardHardy there is an error in my comment. The uniformity of the interval is independent of the truth of the null. Maybe a better way to think about it rather than in terms of a probability, since a 95% confidence interval does not imply there is a 95% chance the parameter is in the interval, it may be better to say that you have equal confidence in every point in the interval as to it being the parameter. – Dave Harris May 07 '21 at 23:04
  • I read Morey et al. (2016) a while ago and found it illuminating. Perhaps it is time to read it again. Perhaps I am mixing the frequentist interpretation with the fiducial one. I like confidence distributions and the works of N.L.Hjort, T.Schweder and co, e.g. "Confidence distributions and related themes" (2017) and their textbook "Confidence, Likelihood, Probability" (2016). – Richard Hardy May 08 '21 at 10:00
  • My main criticism of Morey et al. (2016) is that they appear to use the term "probability" in both frequentist and Bayesian senses throughout the paper, switching between the two notions whenever it is convenient for them, but without being explicit about it. I find this a little misleading (for lack of a milder term). – Richard Hardy May 08 '21 at 10:03