When doing logistic regression, it is common practice to use one-hot vectors as the target output, so the number of nodes in the output layer equals the number of classes. We don't use the index of the word in the vocabulary as the target, because that may falsely suggest that two classes are close to each other. But why can't we use binary numbers instead of one-hot vectors?
That is, if there are 4 classes, we can represent them as 00, 01, 10, and 11, so the output layer needs only log2(number of classes) nodes (rounded up).
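
To make the two encodings concrete, here is a minimal sketch of what I mean for the 4-class case. The helper names `one_hot` and `binary_code` are my own, made up for illustration, and not taken from any library.

```python
def one_hot(index, num_classes):
    """Return a list of length num_classes with a single 1 at position `index`."""
    return [1 if i == index else 0 for i in range(num_classes)]


def binary_code(index, num_classes):
    """Return `index` as a fixed-width list of bits, most significant bit first."""
    # Number of bits needed to distinguish num_classes values,
    # i.e. ceil(log2(num_classes)), with a minimum of one bit.
    width = max(1, (num_classes - 1).bit_length())
    return [(index >> shift) & 1 for shift in range(width - 1, -1, -1)]


num_classes = 4
one_hot_targets = [one_hot(i, num_classes) for i in range(num_classes)]
binary_targets = [binary_code(i, num_classes) for i in range(num_classes)]

for i in range(num_classes):
    print(i, one_hot_targets[i], binary_targets[i])

# Size of the output layer under each encoding
print("one-hot output nodes:", len(one_hot_targets[0]))
print("binary output nodes:", len(binary_targets[0]))
```

With one-hot targets the output layer has 4 nodes, while the binary targets need only 2.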
