When we encode a categorical variable with K levels in a (generalized) linear model, the standard approach is dummy coding: we create K-1 indicator columns and let the remaining category serve as the reference level, absorbed into the intercept.
If we include such a variable in a neural network (along with other features), should we instead one-hot encode all K categories, so that every category can enter the nonlinear activation functions and interact with the other features? Or can we get away with encoding only K-1 categories and comparing against the reference category, as we do in a (generalized) linear model?
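To make the two options concrete, here is a minimal sketch of both encodings using pandas on a hypothetical three-level feature (the column name and values are just for illustration):

```python
import pandas as pd

# Toy data: a single categorical feature with three levels.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# GLM-style dummy coding: K-1 columns; "blue" (dropped first,
# alphabetically) becomes the reference category absorbed by the
# intercept.
dummy = pd.get_dummies(df["color"], drop_first=True)

# Full one-hot coding: all K categories get their own column.
one_hot = pd.get_dummies(df["color"], drop_first=False)

print(dummy.columns.tolist())    # ['green', 'red']
print(one_hot.columns.tolist())  # ['blue', 'green', 'red']
```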
I have in mind a feed-forward neural network, perhaps just input features, one hidden layer, and an output, but I would be happy to see a discussion of more sophisticated architectures (convolutional, recurrent, LSTM, etc.).
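For reference, a minimal sketch of the kind of network I mean, in PyTorch (all dimensions here are hypothetical; `n_cat` would be K under full one-hot encoding or K-1 under dummy coding):

```python
import torch
import torch.nn as nn

# Encoded categorical columns plus other features -> one hidden
# layer -> scalar output.
n_cat, n_other, n_hidden = 3, 5, 16

model = nn.Sequential(
    nn.Linear(n_cat + n_other, n_hidden),
    nn.ReLU(),
    nn.Linear(n_hidden, 1),
)

x = torch.randn(8, n_cat + n_other)  # batch of 8 fake rows
y_hat = model(x)
print(y_hat.shape)  # torch.Size([8, 1])
```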