I've read that batch normalization eliminates the need for a bias vector in neural networks, since it introduces a shift parameter that functions similarly to a bias. As far as I'm aware, though, a bias term works on a per-node level, whereas the shift parameter of batch normalization is applied to all the activations at once. For instance, in convolutional neural networks the bias vector can be seen as an extra input to each receptive field whose value is always 1. This effectively shifts each individual activation, as opposed to shifting all activations at once.
My question, then, is: is it true that batch normalization eliminates the need for a bias vector?
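For concreteness, here is a minimal PyTorch sketch of the configuration I'm asking about (the channel counts, kernel size, and input shape are arbitrary placeholders of mine, not from any particular paper):

```python
import torch
import torch.nn as nn

# Convolution with its own bias disabled, followed by batch normalization.
# The sizes (3 -> 16 channels, 3x3 kernel) are made-up placeholders.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, bias=False)
bn = nn.BatchNorm2d(num_features=16)

x = torch.randn(8, 3, 32, 32)  # dummy batch of 8 RGB images, 32x32
y = bn(conv(x))                # is dropping the conv bias here justified?
```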
