
I am trying to understand whether it is always the case that the more parameters you train, the more training time you need for a deep learning model. For example, I have a CNN model for text classification. I'm comparing two embeddings (word2vec and fastText) with different embedding dimensions: 512 for word2vec and 300 for fastText. The model with word2vec has a larger parameter size, and its training time is also longer. Is this always the case, simply because of the number of parameters? The CNN architecture is the same; the only difference is the word embedding used.
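To make the comparison concrete, here is a minimal sketch of how I would count the parameters for the two setups. The vocabulary size, filter sizes, and number of classes below are placeholder values, not my actual configuration; only the embedding dimensions 512 and 300 match my experiment.

```python
# Sketch (PyTorch): same CNN architecture, only the embedding dimension differs.
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_classes=2, num_filters=100):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Same convolutional architecture in both cases; only embed_dim differs.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, kernel_size=k) for k in (3, 4, 5)
        )
        self.fc = nn.Linear(num_filters * 3, num_classes)

    def forward(self, x):                       # x: (batch, seq_len)
        e = self.embedding(x).transpose(1, 2)   # (batch, embed_dim, seq_len)
        pooled = [torch.relu(c(e)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))

for name, dim in [("word2vec", 512), ("fasttext", 300)]:
    model = TextCNN(vocab_size=50_000, embed_dim=dim)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name} (dim={dim}): {n_params:,} parameters")
```

Most of the difference comes from the embedding table (vocab_size × embed_dim) and from the convolution filters, whose input channels equal the embedding dimension.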

  • What do you mean by training time? Training time is determined by some criterion to stop. If your model with the smaller parameter size is unable to reach that criterion, then it might never stop and have infinite training time. Sometimes the criterion is to stop when improvements are small, and then the training can be over very quickly. If I have a very large model, then training might stop because the computer runs out of memory. It might help the question if you specify this more precisely. – Sextus Empiricus Jan 15 '24 at 09:48
  • A similar issue occurs in this question: https://stats.stackexchange.com/questions/629714. It is not about whether training time can get shorter, but similar principles may play a role (and in the comments I noted that training time can decrease for a more complex model). – Sextus Empiricus Jan 15 '24 at 09:50

2 Answers


It also depends on the initialization point, your configuration, etc., but it is usually the case, and that is no surprise. More parameters increase the dimensionality of the parameter space you search in, and it is harder to land in the right place in a larger space: think of a point in 2D vs. 3D space; which one is easier to get near when you are randomly assigned a coordinate? Moreover, more parameters/inputs mean more calculations per forward and backward pass, so it is no wonder that training time increases.
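As a rough illustration of the "more calculations" point (the layer sizes below are arbitrary and not tied to any particular setup), timing a single training step for two networks that differ only in width shows the per-step cost growing with the parameter count:

```python
# Sketch: time one forward+backward+update step for a narrow vs. a wide model.
import time
import torch
import torch.nn as nn

def step_time(model, x, y, n_steps=50):
    loss_fn = nn.CrossEntropyLoss()
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    start = time.perf_counter()
    for _ in range(n_steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return (time.perf_counter() - start) / n_steps

x = torch.randn(256, 1024)
y = torch.randint(0, 10, (256,))
for hidden in (256, 4096):
    model = nn.Sequential(nn.Linear(1024, hidden), nn.ReLU(), nn.Linear(hidden, 10))
    n_params = sum(p.numel() for p in model.parameters())
    print(f"hidden={hidden}: {n_params:,} params, ~{step_time(model, x, y)*1e3:.1f} ms/step")
```

This only measures the cost per step; how many steps you need is a separate question, as the other answer points out.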

gunes

Is it always the case that the more parameters you train, the more training time you need when training a deep learning model?

No

For an example, see Sycorax's answer here to the question "Aren't my iterations needed to train NN for XOR with MSE < 0.001 too high?".

But generally speaking, more parameters require more computational effort per training step. If there is nevertheless an improvement in overall training time, it comes from the larger network working better. (In the linked example, the extra parameters help the network avoid a local optimum that is difficult to escape.)
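Here is a minimal sketch of that trade-off on the XOR toy problem. The hidden sizes, optimizer, and learning rate are arbitrary choices, and which network stops first can vary with the random seed; the point is only that total training time is the per-epoch cost times the number of epochs needed to reach the stopping criterion, and an over-parameterized network may need far fewer epochs even though each of its epochs costs more.

```python
# Sketch: train small vs. over-parameterized XOR networks until MSE < 1e-3
# (or a maximum epoch budget) and compare epochs needed and wall-clock time.
import time
import torch
import torch.nn as nn

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = torch.tensor([[0.], [1.], [1.], [0.]])

def train_until(hidden, tol=1e-3, max_epochs=20_000, seed=0):
    torch.manual_seed(seed)
    model = nn.Sequential(nn.Linear(2, hidden), nn.Tanh(), nn.Linear(hidden, 1))
    opt = torch.optim.Adam(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()
    start = time.perf_counter()
    for epoch in range(1, max_epochs + 1):
        opt.zero_grad()
        loss = loss_fn(model(X), Y)
        loss.backward()
        opt.step()
        if loss.item() < tol:          # stopping criterion reached
            break
    return epoch, time.perf_counter() - start

for hidden in (2, 32):                 # small vs. over-parameterized network
    epochs, seconds = train_until(hidden)
    print(f"hidden={hidden}: stopped after {epochs} epochs, {seconds:.2f} s")
```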