
What are some recommended ways to tune hyperparameters and/or develop domain-specific architectures for a large neural network model? That is, how can one further tune a large neural network that already takes a long time to train?

My question is two-fold (on hyperparameters and architecture):

The standard seems to be Bayesian hyperparameter optimization, but even this can take a long time if one wishes to take advantage of the sequential nature of the method, in which each new evaluation is chosen based on the results of previous ones.

There are also evolutionary methods for developing novel architectures, but how can the average data scientist execute this approach while avoiding expensive computational resources? (I am not too familiar with evolutionary algorithms.)

Are there any clever approaches/techniques to further tune hyperparameters/develop domain-specific architectures for neural networks that already take a long time to train?


1 Answer


The challenge of estimating the performance of a neural network without running the entire training procedure is addressed by the multi-fidelity optimization literature, where "fidelity" refers to the quality of the performance estimate. Some of these techniques are:

  1. Training on a subset of the data (see the sketch after this list)
  2. Training on lower-resolution images, if one is working on image datasets
  3. Extrapolating learning curves (see the sketch after the references)
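
Below is a minimal sketch of technique 1 combined with successive halving: many candidate configurations are evaluated cheaply on small random subsets of the training data, and only the better half survives to the next, larger budget. Everything in it (the synthetic dataset, the `SGDClassifier` surrogate model, the single hyperparameter being tuned, and the budget schedule) is an illustrative assumption, not a reference implementation.

```python
# Sketch of successive halving with data subsets as the cheap fidelity.
# Dataset, model, and budget schedule are all illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=20000, n_features=50, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

def evaluate(config, n_samples):
    """Train on a random subset of n_samples rows; return validation accuracy."""
    idx = rng.choice(len(X_tr), size=n_samples, replace=False)
    model = SGDClassifier(alpha=config["alpha"], random_state=0)
    model.fit(X_tr[idx], y_tr[idx])
    return model.score(X_val, y_val)

# Start with many random configurations and a small data budget; after each
# round, keep the better half and double the budget for the survivors.
configs = [{"alpha": 10 ** rng.uniform(-6, -1)} for _ in range(16)]
budget = len(X_tr) // 16
while len(configs) > 1:
    scores = [evaluate(c, budget) for c in configs]
    order = np.argsort(scores)[::-1]  # indices of configs, best score first
    configs = [configs[i] for i in order[: len(configs) // 2]]
    budget = min(budget * 2, len(X_tr))

print("surviving config:", configs[0])
```

The key design choice is the schedule: each round discards the weaker half of the pool while doubling the data budget for the survivors, so the per-round cost stays roughly constant even as the performance estimates become more faithful.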

Two good references for this literature are below:

https://www.ml4aad.org/wp-content/uploads/2018/07/automl_book_draft_neural_architecture_search.pdf

https://github.com/automl/HpBandSter
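
For technique 3, a common (but by no means unique) approach is to fit a simple parametric curve to the first few epochs of validation loss and extrapolate it to decide whether a run is worth finishing. The sketch below uses a power law, `loss(t) = a * t**(-b) + c`, on a synthetic loss trace; the functional form, the stopping threshold, and the horizon of 100 epochs are all illustrative assumptions.

```python
# Sketch of learning-curve extrapolation: fit a power law to early epochs of
# validation loss, then extrapolate to a target epoch. The observed loss
# trace is synthetic and the stopping threshold is an illustrative assumption.
import numpy as np
from scipy.optimize import curve_fit

def power_law(t, a, b, c):
    # A common parametric form for loss curves: decays toward an asymptote c.
    return a * t ** (-b) + c

# Pretend we observed 10 epochs of validation loss for a candidate run.
epochs = np.arange(1, 11)
observed = 2.0 * epochs ** -0.5 + 0.3 + np.random.default_rng(1).normal(0, 0.01, 10)

params, _ = curve_fit(power_law, epochs, observed, p0=(1.0, 0.5, 0.1), maxfev=10000)
predicted_final = power_law(100, *params)  # extrapolate to epoch 100

# Terminate the run early if its extrapolated loss cannot beat the best
# fully trained model seen so far.
best_loss_so_far = 0.35
print(f"predicted loss at epoch 100: {predicted_final:.3f}")
print("continue training" if predicted_final < best_loss_so_far else "stop early")
```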
