Questions tagged [cart]

'Classification And Regression Trees', also sometimes called 'decision trees'. CART is a popular machine learning technique, and it forms the basis for techniques like random forests and common implementations of gradient boosting machines.

CART stands for Classification And Regression Trees. This is a technique for developing a tree model (T) to predict categories (C) and/or continuous values (R) by recursive partitioning. It does not make restrictive parametric assumptions.

(Note that "CART" is a synecdoche for the general data mining technique of using decision trees to predict outcomes. Strictly speaking, "CART" refers to a specific algorithm for forming trees that was popularized by the work of Leo Breiman. However, CART is commonly used to refer to any predictive tree algorithm, and the tag may be used similarly on Cross Validated.)

1273 questions
40 votes · 3 answers

Why are Decision Trees not computationally expensive?

In An Introduction to Statistical Learning with Applications in R, the authors write that fitting a decision tree is very fast, but this doesn't make sense to me. The algorithm has to go through every feature and partition it in every way possible…
DataOrc • 451
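A sketch of why greedy fitting is cheap, not drawn from the book or the question: for a single numeric feature the best split can be found with one sort plus one linear pass over cumulative class counts, so a node costs roughly O(p·n log n) rather than requiring every possible partition to be enumerated. A minimal illustration for binary labels (the function and data below are made up for the example):

    import numpy as np

    def best_split_one_feature(x, y):
        """Best Gini split on one numeric feature for 0/1 labels:
        one sort + one linear scan, not an enumeration of all partitions."""
        order = np.argsort(x)
        x_sorted, y_sorted = x[order], y[order]
        n = len(y_sorted)
        ones_left = np.cumsum(y_sorted)            # ones in the left child after i+1 points
        total_ones = ones_left[-1]

        best_gini, best_threshold = np.inf, None
        for i in range(n - 1):                     # at most n-1 candidate thresholds
            if x_sorted[i] == x_sorted[i + 1]:     # cannot split between equal values
                continue
            n_left, n_right = i + 1, n - i - 1
            p_left = ones_left[i] / n_left
            p_right = (total_ones - ones_left[i]) / n_right
            gini = (n_left * 2 * p_left * (1 - p_left)
                    + n_right * 2 * p_right * (1 - p_right)) / n
            if gini < best_gini:
                best_gini = gini
                best_threshold = (x_sorted[i] + x_sorted[i + 1]) / 2
        return best_threshold, best_gini

    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)
    y = (x > 0.3).astype(int)
    print(best_split_one_feature(x, y))   # threshold near 0.3, Gini near 0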
9 votes · 1 answer

Decision Tree with continuous input variable

It is known that when constructing a decision tree, we split the input variable exhaustively and find the 'best' split by a statistical-test approach or an impurity-function approach. My question is: when we use a continuous variable as the input variable…
pe-perry • 842
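Purely as an illustration of the usual handling of a continuous input (details vary by implementation): the values are sorted and the candidate thresholds are the midpoints between consecutive distinct values, so at most n−1 splits need to be evaluated.

    import numpy as np

    # Hypothetical values of one continuous predictor at a node.
    x = np.array([2.7, 1.0, 3.5, 1.0, 2.7, 4.2])

    # Candidate thresholds: midpoints between consecutive distinct sorted values.
    distinct = np.unique(x)                        # [1.0, 2.7, 3.5, 4.2]
    candidates = (distinct[:-1] + distinct[1:]) / 2
    print(candidates)                              # [1.85, 3.1, 3.85]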
6 votes · 1 answer

Can two or more splits in a binary decision tree be made on the same variable?

My question is about a binary decision tree (binary to integer). Is there any problem if the conditions are defined on the same variable, e.g. x1? I mean, when I define the variables for my tree, can I choose: if (x1 > 3) then if (x1 > 4) then ... …
Michelle • 101
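Standard implementations do reuse a variable at several depths; for example, a scikit-learn tree grown on a single feature necessarily splits on it repeatedly (an illustration with made-up data, not taken from the question):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(0)
    x1 = rng.uniform(0, 10, size=500).reshape(-1, 1)
    y = np.digitize(x1.ravel(), bins=[3, 4, 7])     # class depends only on x1

    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(x1, y)
    print(export_text(tree, feature_names=["x1"]))  # x1 appears at several depths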
6 votes · 1 answer

Decision trees and backward pruning

By building the complete tree and pruning it afterward we are adopting a strategy of postpruning (or backward pruning) rather than prepruning (or forward pruning). Prepruning would involve trying to decide during the tree-building process when to…
andreister • 3,357
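For concreteness, a sketch of one post-pruning procedure, cost-complexity pruning as exposed by scikit-learn (the question itself is about the general pre- versus post-pruning trade-off, not this particular API):

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # Post-pruning: grow the full tree, compute the cost-complexity pruning path,
    # then choose the penalty alpha by cross-validation.
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
    scores = [
        cross_val_score(DecisionTreeClassifier(random_state=0, ccp_alpha=a), X, y).mean()
        for a in path.ccp_alphas
    ]
    best_alpha = path.ccp_alphas[int(np.argmax(scores))]
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X, y)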
5 votes · 3 answers

Regression Tree with nested factors

I am working on a prediction model in which I have several factor variables that have many levels. These factor variables have a nested structure, in the form of a Category, a Sub-Category, and a Sub-Sub-Category. For example, suppose that I had one…
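One common workaround, shown here only as a sketch with hypothetical column names, is to encode the hierarchy as combined labels so that a split on a sub-category automatically respects its parent category:

    import pandas as pd

    # Hypothetical nested factor: Category > Sub-Category.
    df = pd.DataFrame({
        "category":     ["Food",  "Food",  "Tools", "Tools"],
        "sub_category": ["Fruit", "Dairy", "Power", "Hand"],
    })

    # Combined label: a split on this column cannot cross category boundaries.
    df["cat_sub"] = df["category"] + "/" + df["sub_category"]
    X = pd.get_dummies(df[["category", "cat_sub"]])  # one-hot for libraries needing numeric input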
5 votes · 2 answers

Derivation of Gini Impurity Formula

There's a step in the Wikipedia article regarding the formulation of the Gini impurity that I can't understand. They state that $1-\sum_{i=1}^Jf_i^2 = \sum_{i\ne k}f_if_k$, and I follow everything up until this point. There is a related thread that…
ZachTurn • 195
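The step follows from $\sum_{i=1}^{J} f_i = 1$; a worked version of the identity (not quoted from the thread):

$$
1-\sum_{i=1}^{J}f_i^{2}
=\Big(\sum_{i=1}^{J}f_i\Big)^{2}-\sum_{i=1}^{J}f_i^{2}
=\sum_{i=1}^{J}\sum_{k=1}^{J}f_i f_k-\sum_{i=1}^{J}f_i^{2}
=\sum_{i\ne k}f_i f_k.
$$

Squaring $\sum_i f_i = 1$ produces every product $f_i f_k$; subtracting the diagonal terms $f_i^2$ leaves exactly the cross terms with $i\ne k$.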
5 votes · 2 answers

Which algorithm can learn exactly a tree structure without noise?

Let's say I have a response variable $Y$ satisfying a complicated tree structure: $$Y=f(X_1,X_2,X_3,\ldots,X_p) + \varepsilon$$ where $\varepsilon = 0$ and $f$ is a known deep tree-structured function (not necessarily binary). That is to say we can…
Metariat • 2,526
4 votes · 1 answer

Log transformation in CART analysis

I'm working as a liaison between a researcher and a stats team at a university. I'm a database admin who is working on using business intelligence tools to offer the option of (as of right now) CART analyses in .PDF form over the internet. …
4 votes · 1 answer

Optimal decision tree is NP-hard

I'm reading Elements of Statistical Learning, and it says that decision trees are often constructed using greedy algorithms because it is computationally infeasible to create an optimal decision tree. There is a proof here, but it relies upon several…
allstar • 459
4 votes · 1 answer

What if a decision tree does not result in leaves with one class each?

A decision tree can result in leaf nodes that have samples from multiple classes. Is the algorithm at that point to simply vote on the class?
allstar • 459
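In the usual CART setup, yes: an impure leaf predicts its majority class, and the leaf's class proportions serve as predicted probabilities. A small sketch using scikit-learn as one concrete implementation:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

    # predict() returns the majority class of the leaf a sample falls into;
    # predict_proba() returns that leaf's training-class proportions.
    print(tree.predict(X[:1]), tree.predict_proba(X[:1]))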
4 votes · 1 answer

CART analysis with multiple dependent variables

I want to do a CART analysis with multiple dependent variables. Which program is able to do that?
M C • 61
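One concrete option (offered only as an example, not necessarily what the poster needs): scikit-learn's decision trees accept a two-dimensional response and fit a single multi-output tree.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    # Two dependent variables predicted by one tree (multi-output CART).
    Y = np.column_stack([X[:, 0] > 0, X[:, 1] + X[:, 2]])

    tree = DecisionTreeRegressor(max_depth=3).fit(X, Y)
    print(tree.predict(X[:2]).shape)   # (2, 2): one prediction per response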
4 votes · 2 answers

Why is CHAID (decision tree) analysis used in direct marketing? What makes it more suitable than other types of trees?

According to Wikipedia, CHAID is popular for modeling responses in direct marketing (and I have seen it come up several times in this context). Does anyone know what it is that makes it suitable/preferred for this type of analysis? What benefit does…
L Xandor • 1,229
4 votes · 1 answer

When is classification error rate preferable when pruning decision trees?

I'm going through Chapter 8 of "Introduction to Statistical Learning", which introduces decision trees. My question is specific to the three approaches to pruning a decision tree (i.e., classification error rate, Gini index, and cross-entropy). With…
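For reference (background to the question, not an answer to it), the three node-level measures that chapter compares, written for node $m$ with class proportions $\hat{p}_{mk}$:

$$
E_m = 1-\max_k \hat{p}_{mk},\qquad
G_m = \sum_{k=1}^{K}\hat{p}_{mk}\,(1-\hat{p}_{mk}),\qquad
D_m = -\sum_{k=1}^{K}\hat{p}_{mk}\log \hat{p}_{mk}.
$$

Classification error responds only to the largest class proportion, which is why the Gini index and cross-entropy are usually preferred for growing the tree, while the book notes that error rate can be preferable when pruning for predictive accuracy.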
4 votes · 2 answers

Decision Trees on training data

Wouldn't any decision tree trained on a training data set have no errors in classification? In other words, wouldn't every data point be classified correctly in the training data set? How would this tie in with the misclassification rate?
cartpool • 191
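Not necessarily: a fully grown tree reaches zero training error only when no two identical feature vectors carry different labels. A tiny counterexample with made-up data:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # Identical inputs with conflicting labels cannot both be classified correctly,
    # so even an unrestricted tree has nonzero training error here.
    X = np.array([[1.0], [1.0], [2.0]])
    y = np.array([0, 1, 0])

    tree = DecisionTreeClassifier().fit(X, y)
    print(tree.score(X, y))   # 2/3, not 1.0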
3 votes · 1 answer

Decision tree for numeric dependent variable?

I have data on commute times over a specified route over different days during different conditions. Some of the conditions are categorical (e.g., weather, traffic), and some of them are numeric (e.g., time departing from origin). I'd like to find…
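This is the regression-tree side of CART (numeric response, mixed predictors); a minimal sketch with scikit-learn and hypothetical column names, with the categorical conditions one-hot encoded:

    import pandas as pd
    from sklearn.tree import DecisionTreeRegressor

    # Hypothetical commute data: categorical and numeric conditions, numeric response.
    df = pd.DataFrame({
        "weather":     ["rain", "clear", "clear", "rain"],
        "traffic":     ["heavy", "light", "heavy", "light"],
        "depart_hour": [8.0, 9.5, 8.25, 10.0],
        "commute_min": [55, 30, 48, 35],
    })

    X = pd.get_dummies(df[["weather", "traffic", "depart_hour"]])
    tree = DecisionTreeRegressor(max_depth=2).fit(X, df["commute_min"])
    print(tree.predict(X[:1]))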