It is known that when constructing a decision tree, we search the candidate splits of each input variable exhaustively and pick the 'best' split using either a statistical-test approach or an impurity-function approach.
My question is: when the input variable is continuous (with only a few duplicated values), the number of possible splits can be very large, so finding the 'best' split is time-consuming. How do data scientists deal with this?
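To make the question concrete, here is a minimal sketch of the exhaustive search I have in mind, using weighted Gini impurity (the function names are my own):

```python
import numpy as np

def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Try every midpoint between consecutive distinct sorted values
    of x and return the threshold with the lowest weighted impurity."""
    order = np.argsort(x)
    x_s, y_s = x[order], y[order]
    n = len(x_s)
    best_t, best_imp = None, np.inf
    for i in range(1, n):
        if x_s[i] == x_s[i - 1]:
            continue  # identical values admit no threshold between them
        t = (x_s[i] + x_s[i - 1]) / 2.0
        imp = (i * gini(y_s[:i]) + (n - i) * gini(y_s[i:])) / n
        if imp < best_imp:
            best_t, best_imp = t, imp
    return best_t, best_imp

x = np.random.rand(10_000)
y = (x > 0.6).astype(int)
print(best_split(x, y))  # threshold near 0.6, impurity near 0
```

I understand that real implementations sort each feature once and update class counts incrementally while sweeping the thresholds, rather than recomputing the impurity from scratch at each candidate as above, but even so the number of candidates still grows with the number of distinct values.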
I have read in some materials that people cluster the levels of the input variable to limit the number of possible splits (example). However, they don't explain how this is done. On what basis do we cluster a univariate variable? Are there any resources with more details, or can anyone explain in detail?
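For what it's worth, here is my own guess at what such a 'clustering' might look like, using simple quantile binning (similar in spirit to the histogram trick in gradient-boosting libraries such as LightGBM; the function name is mine):

```python
import numpy as np

def quantile_candidate_splits(x, n_bins=32):
    """Keep at most n_bins - 1 quantile boundaries as candidate
    thresholds instead of one per distinct value of x."""
    qs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]  # interior quantiles
    return np.unique(np.quantile(x, qs))

x = np.random.randn(100_000)
print(quantile_candidate_splits(x, n_bins=8))  # only 7 thresholds to test
```

Is this the kind of pre-binning those materials refer to, or do they mean something more principled, such as a supervised discretization of the variable?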
Thanks!