
A potential issue with automated trading systems based on Machine Learning (ML) and/or Artificial Intelligence (AI) is the difficulty of assessing the risk of a trade. An ML/AI algorithm may analyze thousands of parameters in order to come up with a trading decision, and applying standard risk management practices might interfere by overriding the algorithm's decision.

What are some basic methodologies for applying risk management to ML/AI-based automated trading systems without hampering the decision of the underlying algorithm(s)?

Update:
An example system would be: a Genetic Programming (GP) algorithm that evolves a population of trading agents. The most profitable agent in the population is used to produce a long/short signal (usually without a confidence interval).
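For concreteness, here is a minimal, hypothetical sketch of such a system. For brevity the "evolution" is reduced to random sampling of agents; the features, population size, and synthetic price series are all illustrative assumptions, not a description of any real system.

```python
# Minimal, hypothetical sketch: sample a population of simple trading
# 'agents' and use the most profitable one as a long/short signal.
# (A real GP system would evolve expression trees via crossover/mutation.)
import numpy as np

rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500)))  # synthetic prices
returns = np.diff(prices) / prices[:-1]

def make_agent():
    """Random agent: a weighted rule over two moving-average features."""
    fast, slow = int(rng.integers(2, 10)), int(rng.integers(10, 50))
    w = rng.normal(size=2)
    def signal(t):
        if t < slow:
            return 0.0
        f = prices[t - fast:t].mean()          # fast moving average
        s = prices[t - slow:t].mean()          # slow moving average
        return np.sign(w[0] * (f - s) + w[1] * (prices[t - 1] - s))
    return signal

def fitness(agent):
    """In-sample profit of trading the agent's +1/0/-1 signal."""
    sig = np.array([agent(t) for t in range(len(returns))])
    return float((sig * returns).sum())

population = [make_agent() for _ in range(50)]
best = max(population, key=fitness)            # the 'most profitable agent'
print("best agent's in-sample profit:", round(fitness(best), 4))
```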

Kiril
  • As it stands, I think this question cannot be answered. Please give more detail about the kind of "ML/AI-based trading system" that you're envisioning. Does it just say whether to go long/short, does it give a confidence interval, etc.? What kind of model? Is it really black-box (because many ML models can be interpreted)? – Shane Feb 01 '11 at 00:22
  • @Shane, I've updated the question... I think that a Genetic Programming model would be black-box, since the resulting "trading agents" are often difficult to understand (i.e. they contain "junk DNA", which usually occurs with evolution). – Kiril Feb 01 '11 at 00:49

4 Answers


ML/AI systems are susceptible to a number of risks not traditionally discussed in risk management:

  1. What I call 'backtest arbitrage'. In the process of automated model generation and testing, your machine learner may discover, exploit, and concentrate on irregularities in your backtesting system which do not exist in the real world. If, for example, your fill simulation is erroneous, you have not accounted for borrow costs, have forgotten to deal with dividends properly, etc., sufficiently powerful search techniques will find strategies which capture these nonexistent 'arbs'.
  2. If you sequentially generate, test, and refine many trading models, you run into the problem of 'datamining bias'. Here one has used the same data both to select the best model and to estimate its performance via backtest. The estimate will be positively biased, and the size of the bias can be difficult to estimate if one has not kept careful track of all the strategies tested (the sketch after this list illustrates the effect).
  3. Blackbox models are often subject to non-stationarities of the 'Grue and Bleen' variety. That is, they may behave radically differently out of sample due to non-stationarities in their input data and discontinuities in how they process that data. An example would be an AI strategy which first checks whether VIX is above 60, then trades one substrategy, and otherwise trades a different one. Your backtest period may contain little data in the 'over 60' regime, and you may later find yourself in such a regime.
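As a minimal sketch of the datamining bias in point 2: generate many strategies with zero true edge, select the one that looks best on a data set, and note how the selected in-sample performance fails to carry over. All numbers here are illustrative.

```python
# Minimal sketch of 'datamining bias' (selection bias): pick the best of
# many zero-edge random strategies on one data set, then compare its
# in-sample vs. out-of-sample Sharpe ratio.
import numpy as np

rng = np.random.default_rng(42)
n_strategies, n_days = 200, 252

# Daily returns of strategies with zero true edge.
in_sample = rng.normal(0.0, 0.01, size=(n_strategies, n_days))
out_sample = rng.normal(0.0, 0.01, size=(n_strategies, n_days))

def sharpe(returns):
    """Annualized Sharpe ratio of a daily return series."""
    return np.sqrt(252) * returns.mean() / returns.std()

# Select the 'best' strategy using the same data we evaluate on.
best = max(range(n_strategies), key=lambda i: sharpe(in_sample[i]))

print(f"In-sample Sharpe of selected strategy:  {sharpe(in_sample[best]):.2f}")
print(f"Out-of-sample Sharpe of same strategy: {sharpe(out_sample[best]):.2f}")
# The in-sample figure is strongly positive by construction (selection),
# while the out-of-sample figure is near zero: the bias described above.
```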

Regrettably many of these issues exist at the human level, and there is little one can do statistically to detect them or correct for them. They require great attention to process.

shabbychef
  • @shabbychef, I was actually asking for risk management of the trades generated by the ML/AI systems, not the risks involved with developing such systems. – Kiril Feb 01 '11 at 06:10
  • @Lirik these are very serious risks. Beyond that, I am not sure one will have much success e.g. trying to reverse-engineer a black box ML/AI system in order to detect when it has gone haywire. If you are just receiving the trades out of the thing, I am afraid there is not much one can do beyond checking concentration limits and leverage constraints. – shabbychef Feb 01 '11 at 23:12
  • @shabbychef I think online machine learning can mitigate those risks for the most part. – Kiril Feb 02 '11 at 00:47
  • @Lirik I doubt that could be the case. Backtest arb can only be mitigated by higher fidelity simulation and good coding. Grue and Bleen can perhaps be tackled by choice of algorithm. The datamining bias, however, remains. Whenever you use the same data to select your strategy and evaluate its performance, you are subject to this bias. If your online ML algo is entirely without knobs and it works the first time you run it, more power to you; otherwise, there will be a sequential process of fiddling with it until it 'looks good' at which point you have your bias. – shabbychef Feb 02 '11 at 03:31
  • @shabbychef, online machine learning eliminates the use of back-testing. The point of online machine learning is that you never stop learning: your machine learner constantly generates viable candidates (i.e. strategies) as you continue to feed it the latest market data. This also nearly eliminates datamining bias, as you don't have a fixed data set on which you can overfit your ML; the data set is constantly changing. – Kiril Feb 02 '11 at 05:34
  • @Lirik, I doubt anyone is going to trade a magical black box without some estimate of its performance going forward. You do this via backtesting. As I said earlier, either 1) all your code works great the first time you try it, and the backtest looks acceptable (gold star for you) or 2) you keep sequentially refining it and eventually trade on the one that 'performed the best'. You now have datamining bias. Yes, one can train the models in-sample, then trade them out of sample, with rolling retrain, etc, but my statement stands when looking at the system as a whole: get it right 1st time or... – shabbychef Feb 02 '11 at 06:02
  • @shabbychef, like I said: there is no backtesting! If you hire a trader you don't backtest the trader to see how they're going to do, instead you let them trade with a paper trading account and once you see that they're doing good you actually let them trade with real money. You take the same approach with ML: you train your ML with a rolling training set, once you reach sufficient performance you start paper trading and that is effectively your "backtest." The training does not stop even if you start trading with real money, that's why you need a separate measure for risk. – Kiril Feb 02 '11 at 18:33
  • @Lirik: actually what you really do is paper trade a whole cadre of such agents, then on some date you pick the best one and 'switch it on' with real money. At which point you have datamining bias. Unless, as I have said previously, your system is completely without knobs, has the right data, does not require 'featurization' of said data, and works the first time, you spawn a whole stable of these things, or sequentially fiddle with them until they 'look good'. It doesn't matter what your testing looks like, whenever you use the same data to select and evaluate, you have this bias. period. – shabbychef Feb 02 '11 at 22:02
  • @shabbychef, OK, you're thinking of a completely different architecture for an ML system... – Kiril Feb 02 '11 at 23:58
  • @Lirik: no, I am thinking about what happens after 6 months of paper trading with mediocre results. Does one give up on finance and become a plumber? Or does one fiddle around with the algorithm, the data, etc.? You always have only the data you have today when you are deciding what to trade tomorrow. If you have any choice and the historical data guides that choice, you have datamining bias. – shabbychef Feb 03 '11 at 05:23
  • @shabbychef: If it was easy, everybody would be doing it... some people do give up. The assumption is that your trader (or algorithm, in this case) is pretty good, but you need proper risk management to establish correct lot sizes, track exposure, and control losses. The risk management you're talking about does not address those things. – Kiril Feb 03 '11 at 06:13
  • I think you are both right; the difference is one of degree. Anything you do involving any historical data does introduce bias... even a model you decide not to use because it performs horribly on out-of-sample backtests has introduced a bias by removing one bad model from your search space. However, the assumption underlying an ML-based strategy is that you can become biased toward a better model. Lirik's question is about controlling the risk this introduces, rather than eliminating it totally. – Dan Nov 30 '11 at 19:26
  • Just my opinion, but I think Shabby is right here and giving you very sound advice. If you think on-line learning is some magic armor or something that makes it so you don't have to worry about these biases, then you will likely get a bad surprise when your technique doesn't work very well. Focus on getting a system that works under very rigorous and careful back-testing, and you won't need to worry so much about "risk management". But the concept that you can somehow throw a model into the world and have it just work without introducing any of these biases strikes me as pure fantasy. – Doodles Mar 13 '12 at 17:40
  • @shabbychef is bootstrapping the data or adding noise to the data (as a regularization) a good way to prevent datamining bias? – wh0 Feb 26 '13 at 14:33
  • @shabbychef Moreover, if I use an ML method, as long as I have a way to represent the strategy (as in GP), I would select the one with the shortest representation length that has acceptable backtest profit. – wh0 Feb 26 '13 at 14:36
  • @wonghang At the current point in time you have to decide how to deploy your money over the next time delta. You can use all the data available to you to both select the best model and estimate its performance. If you do so, your estimate of performance is biased upwards 'by selection.' You can instead partition the data into two sets, one for selection, the other for estimation. This increases the chance of making a selection error and increases the standard error on your performance measure. Representation length can easily be confounded by introns, BTW... – shabbychef Feb 26 '13 at 17:43

The risk involved in trading is always and everywhere multifaceted: it includes the volatility of the selected asset, the leverage and concentration of the portfolio, whether there is a stop loss, a hedge, etc. Also, risk management is frequently not tied to the "alpha model" directly (e.g. VaR, expected shortfall, and scenario testing).

For instance, one well-known way of sizing a position is the Kelly formula:

$f^{*} = \frac{bp - q}{b}$

where $p$ is the probability of winning, $q = 1 - p$ is the probability of losing, and $b$ is the net odds received (e.g. the average win divided by the average loss).

This makes no assumptions about the directional model used to enter the position. You can infer the inputs (e.g. the probability of winning) from a historical simulation, regardless of whether the model is black-box, grey-box, or white-box.
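As a minimal sketch of inferring those inputs from a historical simulation: the trade list below is hypothetical, and the half-Kelly adjustment is a common practical convention rather than part of the formula itself.

```python
# Minimal sketch: estimate Kelly inputs (p, b) from a list of historical
# per-trade returns, then compute the Kelly fraction. The trade list is
# hypothetical and stands in for a historical simulation.
import numpy as np

trades = np.array([0.04, -0.02, 0.03, -0.01, 0.05, -0.03, 0.02])  # per-trade P&L

wins, losses = trades[trades > 0], -trades[trades < 0]
p = len(wins) / len(trades)          # probability of winning
q = 1.0 - p                          # probability of losing
b = wins.mean() / losses.mean()      # net odds: average win / average loss

f_star = (b * p - q) / b             # Kelly fraction
print(f"p={p:.2f}, b={b:.2f}, full Kelly f*={f_star:.2%}")

# Many practitioners size at a fraction of Kelly (e.g. half-Kelly) to
# reduce drawdown sensitivity to estimation error in p and b.
print(f"half Kelly: {0.5 * f_star:.2%}")
```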

Shane
  • 9,225
  • 4
  • 51
  • 56

It depends on what the strategy does.

For a long/short signal on an equity symbol, one way is to look at the options prices / implied volatility for that symbol. Your system should give an expected timeframe and profitability, so the risk involved can be quantified as the cost of buying options to insure yourself against losses, compared with your expected returns.

For a more complicated symbol, you can attempt to approximate using a basket of options.
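As a minimal sketch of the insurance idea above: price a protective put with Black-Scholes over the strategy's expected holding period and compare the cost with the expected gain. All the inputs (spot, strike, volatility, rate, horizon) are hypothetical.

```python
# Minimal sketch: bound the downside of a long position by the cost of a
# protective put over the expected holding period (Black-Scholes pricing).
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_put(spot, strike, vol, rate, t):
    """Black-Scholes price of a European put."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol**2) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    return strike * exp(-rate * t) * norm_cdf(-d2) - spot * norm_cdf(-d1)

spot, strike = 100.0, 95.0         # insure against drops below 95
vol = 0.30                         # implied volatility for the symbol
rate, t = 0.02, 21 / 252           # risk-free rate, ~1 month horizon

insurance = bs_put(spot, strike, vol, rate, t)
expected_gain = 2.0                # hypothetical expected profit per share

print(f"Cost of insuring the trade: {insurance:.2f} per share")
print(f"Expected gain: {expected_gain:.2f} -> the trade is worth taking "
      f"only if the edge comfortably exceeds the insurance cost")
```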

For a short-term microstructure trade (which is IMHO the arena in which ML/AI-type strategies are most useful, mainly because proper quant analysis often has little to say), there is very little you can do in terms of principled risk estimates. Rather, you must rely on simulation and backtesting. I especially recommend adding in simulations of totally disastrous fictional scenarios. For this kind of trade, any estimate is rough at best, so use a healthy dose of pessimism and superstition.
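In that spirit, a minimal sketch of injecting "disastrous fictional scenarios" into a simulated P&L stream; the shock sizes and the P&L series are purely illustrative.

```python
# Minimal sketch: replay a simulated daily P&L stream with hypothetical
# disaster shocks injected, and record the worst cumulative P&L under
# each scenario. All numbers here are made up.
import numpy as np

rng = np.random.default_rng(7)
daily_pnl = rng.normal(0.001, 0.01, 252)     # stand-in for strategy P&L

def worst_cumulative(pnl):
    """Most negative point of the cumulative P&L path."""
    return pnl.cumsum().min()

scenarios = {
    "baseline": daily_pnl,
    "flash crash (-10% on one day)": np.concatenate(
        [daily_pnl[:100], [-0.10], daily_pnl[101:]]),
    "vol regime shift (3x swings)": daily_pnl * 3.0,
}

for name, pnl in scenarios.items():
    print(f"{name:32s} worst cumulative P&L: "
          f"{worst_cumulative(np.asarray(pnl)):+.2%}")
```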

Dan

The risk is not linked to the decision process but to your inventory, independently of the signals that triggered the buys and sells: you can monitor the inventory as usual.

If you mean taking into account the fact that you change your inventory more often because you use computer-based signals, it is more complex. You need to control the dynamics of your trading algorithm to be sure that it will not take positions in one millisecond that dramatically increase your risk.
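As a minimal sketch of monitoring inventory independently of the signal source: the wrapper below caps both the absolute position and its per-tick rate of change, whatever the underlying ML/AI signal says. The class and its limits are hypothetical.

```python
# Minimal, hypothetical sketch: cap absolute inventory and its per-tick
# rate of change, independently of the signal that generated the order.
class InventoryGuard:
    def __init__(self, max_position, max_delta_per_tick):
        self.max_position = max_position          # absolute inventory cap
        self.max_delta = max_delta_per_tick       # per-tick change cap
        self.position = 0

    def clip_order(self, desired_qty):
        """Reduce a signal-driven order so inventory stays within limits."""
        # Cap how fast inventory may change in a single tick.
        qty = max(-self.max_delta, min(self.max_delta, desired_qty))
        # Cap the resulting absolute inventory.
        target = max(-self.max_position,
                     min(self.max_position, self.position + qty))
        qty = target - self.position
        self.position = target
        return qty

guard = InventoryGuard(max_position=1000, max_delta_per_tick=100)
print(guard.clip_order(500))   # -> 100 (rate-limited)
print(guard.clip_order(-50))   # -> -50 (within limits)
```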

lehalle