In the CS231n course, it is mentioned that:
If the initial weights are too large then most neurons would become saturated and the network will barely learn.
How do the neurons get saturated? Large weights may lead to a z (the activation output) that is very close to 0 or 1, so z*(1-z) becomes small — why is that called "saturation"?

The issue is that the "saturation" behavior (i.e., the activation increasing/decreasing very slowly as its input changes) means that the gradient is small. Note that the gradient is "the rate of change", so saying "the rate of change is slow" is equivalent to saying "the gradient value is small".
– MohamedEzz Feb 11 '20 at 05:13
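This can be checked numerically. The sketch below (not from the course; the pre-activation values are illustrative assumptions) computes the sigmoid output z and its local gradient z*(1-z) for pre-activations typical of small versus large initial weights, showing that large pre-activations drive the gradient toward zero:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative pre-activations (w·x) produced by small vs. large initial weights.
small_w_preact = np.array([-0.5, 0.1, 0.4])
large_w_preact = np.array([-12.0, 9.0, 15.0])

for name, preact in [("small weights", small_w_preact),
                     ("large weights", large_w_preact)]:
    z = sigmoid(preact)      # activation output, in (0, 1)
    grad = z * (1.0 - z)     # local gradient of the sigmoid
    print(f"{name}: max local gradient = {grad.max():.6f}")
```

With small weights the local gradient stays near its maximum of 0.25, while with large weights z sits very close to 0 or 1 and the gradient collapses to roughly 1e-4 or less — the saturated regime where learning barely proceeds.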