
Say I want to "visualize" a Random Forest in some way (or make it implementable). All of my points rest on the idea of fixing the seeds.

Let $z_1$ be the seed used to create the bootstrapped training sets, and $z_2$ be the seed used to select the feature subsets (for simplicity, I only list these two kinds of seeds here).

  1. From $z_1$, $m$ bootstrapped training sets are created: $D_1(z_1)$, $D_2(z_1)$, $D_3(z_1)$, ..., $D_m(z_1)$.
  2. From those training sets, $m$ corresponding decision trees are created and tuned via cross-validation: $T_1(z_1,z_2)$, $T_2(z_1,z_2)$, $T_3(z_1,z_2)$, ..., $T_m(z_1,z_2)$.
  3. Denote the prediction of the $j^\text{th}$ tree $(j = 1, 2, \dots, m)$ for an individual $x_i$ (from the training or test set, it doesn't matter) as $\hat{f}^j(x_i)$, with $i \le n$ and $j \le m$. The final prediction of the ensemble is then: $$\hat{F}(x_i) = \frac{1}{m}\sum\limits_{j=1}^m \hat{f}^j(x_i)$$
  4. Once the model is validated and stable (meaning $\hat{F}(x_i)$ does not depend strongly on the pair $(z_1,z_2)$), I create every possible combination of my features, which gives me a very large set $\{x'_i\}$.
  5. Applying my forest to each $x'_i$ gives me the corresponding predictions: $$x'_1 \rightarrow \hat{F}(x'_1) \text{ - which is fixed thanks to $(z_1, z_2)$}$$ $$x'_2 \rightarrow \hat{F}(x'_2) \text{ - which is fixed thanks to $(z_1, z_2)$}$$ $$x'_3 \rightarrow \hat{F}(x'_3) \text{ - which is fixed thanks to $(z_1, z_2)$}$$ $$x'_4 \rightarrow \hat{F}(x'_4) \text{ - which is fixed thanks to $(z_1, z_2)$}$$ $$....$$
  6. The latter can easily be represented in the form of a single (huge) tree. For example, $x'_1$ = (Age = 18, sex = M, ...), $x'_2$ = (Age = 18, sex = F, ...), ... could be grouped together to form a leaf. (A rough code sketch of these steps follows this list.)
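Concretely, here is a rough sketch in R of steps 1-5. The randomForest package is real; the data frame `train`, the target `y`, and the grid of feature values are only placeholders for illustration:

```r
library(randomForest)

# Fixing the seed: in randomForest both the bootstrap resampling (z1) and the
# feature-subset selection (z2) are driven by R's RNG, so one set.seed() call
# before training fixes the pair (z1, z2).
set.seed(42)

# Steps 1-2: grow m trees on bootstrapped training sets.
# `train` is a placeholder data frame of features with a numeric target `y`.
m  <- 500
rf <- randomForest(y ~ ., data = train, ntree = m)

# Step 3: the ensemble prediction F_hat(x_i) is the average over the m trees.
F_hat <- predict(rf, newdata = train)

# Step 4: enumerate every combination of (discretised) feature values.
# Continuous features would first have to be cut into a finite grid.
x_prime <- expand.grid(
  Age = 18:90,
  Sex = factor(c("M", "F"))
  # ... one entry per feature
)

# Step 5: because the forest is fixed by the seed, each combination x'_i
# gets one fixed prediction F_hat(x'_i).
pred_prime <- predict(rf, newdata = x_prime)
```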

This also works for any ensemble method based on an aggregation of trees.

It will be computationally expensive, but is there anything wrong with this approach?

Metariat
    I don't understand what's happening in steps 4,5,6. Can you explain this in more detail? For example, what does it mean for predictions to be "the same for each case"? Do you mean that decisions must be fixed because fixing the seed makes the model deterministic? – Sycorax Jan 21 '16 at 16:10
  • @user777: yes, I edited the question! – Metariat Jan 21 '16 at 18:43
  • "not so random" forest? – Aksakal Jan 21 '16 at 18:48
    Your problem is in step 5. Age = 18, sex = M can be price = 18 for one tree, price = 19 for another tree, and price = 20 for another tree. The ensemble will collect these different values in some way, like an average, but all you're doing by fixing the seed is making the forest reproducible. – Sycorax Jan 21 '16 at 18:48
    On a general note, if your decision depends on the seed, you have an issue. Ultimately the model is built to make a decision. Fixing the seed hides the problem. – Aksakal Jan 21 '16 at 18:54
  • @user777 the price listed there is the average! – Metariat Jan 21 '16 at 22:18
  • @aksakal: I don't see any problem with fixing the seed; the model is still random – Metariat Jan 21 '16 at 22:20
  • @Metallica That's incorrect. Given knowledge of the seed and the PRNG method used, the model is entirely deterministic! – Sycorax Jan 21 '16 at 22:29
  • So why do you want to fix the seed then? What's the problem with changing the seed? – Aksakal Jan 21 '16 at 22:45
  • @user777 aksakal: you can say that the model is deterministic, but the predictions are not wrong. And once your model is stable, you can fix the seed; the predictions are unchanged whatever the value of the seed is. I ask this question because fixing the seed enables us to visualise and implement a random forest in real life, not only on paper – Metariat Jan 21 '16 at 23:00

1 Answer


In general, boiling a forest down to a single tree works really well for low-noise, step-function-shaped data structures... For most practical problems, the ones that led you to train a forest model in the first place, there will be a cost of increased model bias and/or variance. If you're lucky, you end up with one or a few trees that are small enough (and few enough) to comprehend and that still fit the data adequately. But many times the gap just cannot be bridged. Here's an example: Learning accurate and interpretable models based on regularized random forests regression
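As a rough illustration of that gap, one thing you could try is fitting a single deep tree to the forest's own predictions and checking how much of the forest it recovers. This is only a sketch on made-up data; rpart and randomForest are real packages, everything else is a placeholder:

```r
library(randomForest)
library(rpart)

set.seed(1)
# Made-up low-noise, step-function-shaped data; replace with your own.
n     <- 500
train <- data.frame(x1 = runif(n), x2 = runif(n))
train$y <- 2 * (train$x1 > 0.5) + train$x2 + rnorm(n, sd = 0.1)

rf <- randomForest(y ~ x1 + x2, data = train, ntree = 500)

# "Boil down": fit one deep CART tree to the forest's own predictions.
train$y_rf  <- predict(rf, train)
single_tree <- rpart(y_rf ~ x1 + x2, data = train,
                     control = rpart.control(cp = 0.001, minsplit = 5))

# How far the single tree is from the forest (in-sample, just to illustrate):
mean((predict(single_tree, train) - train$y_rf)^2)
```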

I'm not sure whether your seeding trick will work; I must confess I don't entirely get it :) But prove the world wrong by posting a prototype! Maybe you could write a randomForest wrapper that controls the seeding.
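Something along these lines, perhaps. This is only a minimal sketch of such a wrapper; randomForest is real, the wrapper name `seededForest` is made up here:

```r
library(randomForest)

# Hypothetical wrapper: fix the RNG state before growing the forest so that
# the bootstrap samples and feature subsets are reproducible.
seededForest <- function(formula, data, seed = 1, ...) {
  set.seed(seed)
  randomForest(formula, data = data, ...)
}

# Same seed -> identical forest and identical predictions;
# varying the seed shows how much the predictions actually move.
rf_a <- seededForest(Species ~ ., data = iris, seed = 7, ntree = 300)
rf_b <- seededForest(Species ~ ., data = iris, seed = 7, ntree = 300)
all.equal(predict(rf_a, iris), predict(rf_b, iris))  # TRUE
```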

Anyway, keep in mind that the decision trees and the forest are just one representation of the model structure. Yes, it is the specific representation that matches how the model was built, but that does not mean trees are the best representation for conveying the overall model structure. You could try to invent an entirely new representation that is both true to the model structure and easy to comprehend. Thinking outside the black box!... well, maybe :)

You can view your trained model structure as a mapping function connecting your feature space to your target. In the simple case of regression, the target space is just a 1D numeric scale. This regression mapping function has a geometrical shape and can be visualized, e.g. with partial dependence plots from iceBOX, rMiner, or randomForest::partialPlot. Sorry for only mentioning R packages (what is used in Python?). I wrote forestFloor, which also covers probabilistic classification, latent interaction detection, and quantification of how well a given low-dimensional 2D/3D visualization represents the true high-dimensional model structure.
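For instance, with randomForest::partialPlot alone, a minimal sketch on a built-in data set looks like this (the other packages give richer variants of the same idea):

```r
library(randomForest)

set.seed(1)
aq <- na.omit(airquality)  # built-in data set, Ozone as numeric target

# Regression forest: Ozone as a function of the remaining weather variables.
rf <- randomForest(Ozone ~ ., data = aq, ntree = 500)

# Partial dependence: the 1D shape of the mapping function along Temp,
# averaging the forest's predictions over the other features.
partialPlot(rf, aq, Temp)
```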