The theory is different than the practice. Basically, how the algorithms were described in theory is not always respected in practice. In the case of scikit-learn, the decision trees are implemented considering only numerical features. This is wrong, or at least, not complete, since for nominal variables you have different ways to find splits than for numerical variables.
A short example. Numerical variables are first sorted and walking through the sorted values, new tests are created and evaluated. Tests on numerical inputs have the form: to the left all values greater than equal with a possible split value, to the right the other. For nominal values there are various tests, one of them has the form: on the left goes instances with one category, on the right the rest of the instances.
One can always transform a nominal variable into a numeric one using an encoding of some sort. This is fine as long as you do not do some math with the encoded values. But the problem with the implementation from scikit-learn is that it will use the logic for numerical variables on split, applied on the numerical encoding of a nominal variable. Thus, you really can't reproduce the behavior for nominals that I had given to you as an example.
It received also consideration the one hot encoding alternative. This means you will transform a nominal variable into multiple binary variable (one for each level of the nominal variable). Treating numeric encoding of nominal variables or one hot encoding might produce results (no doubt of that). The point is that they produces different results than the algorithm specified in theory since they do not fully implement it.
There are various ML libraries, most of them having support for treating categorical variables (R names them factors, Weka names them nominals, etc).
[Later edit: address comments from Soren and Candic3]
As far as I know specification from Breiman on random forests considers binary splits on nominal variables from all groups of variables combinations. As Soren said, this are $2^{L-1}$. This is too much. However there are two things which can be said about that.
First one is that you don't have to follow this recipe for nominal. You can search for binary splits on nominal variables of the form one vs all. These are short in number, even for variables with many levels. One can say that you can transform into binary indicators with some encoding. This is true, but not complete, since you have to deal somehow with missing values, which is not an easy job with encoding. Using surrogates or weighted random is impossible. Somehow you must do variable imputation before the encoding takes place, or consider missing as another category.
The second thing is that in case of binary classification for Gini impurity there are proofs that shows that the best split from all $2^{L-1}$ is found much faster if you use the following procedure. Compute purity function for each level subset. Sort variables by the value of impurity function computed on each subset. Then, compute binary splits of subsets where on one side you have first $k$ variables sorted previously, and in the other group the remaining values. This procedure is computed in linear time so, is much efficient and can actually handle variables with more than 32 levels. For details on that see Elements of Statistical Learning, 2ed - page 310, section 9.2.4 Other Issues. You will find there an explanation and following references.
My personal conclusion in topics like that is the following. If possible use categorical treatment. This is sometimes possible even for large number of levels. However for large number of levels you should proceed with caution. Too much levels can be translated in low predictive power. So maybe a grouping of the levels gives more than use directly the variable.
For example if you have zip codes. The number of levels can be huge. A direct treatment is possible, but perhaps some decoding of zip codes into regions has an increased level of generality and thus more predictive power.
My thinking is that too much granularity in nominal levels should be analyzed and solved in a more interpretative way that directly by programming tricks.