
I don't really understand why we minimise a cost function with gradient descent. Why don't we instead do something like a gradient 'climb' (i.e. gradient ascent), where we maximise some function?

Is it just convention, or are there properties that make minimising a function better suited to optimisation than maximising one?

A similar question was asked here, but I don't feel the answers address my question directly, or in a way that I understand.
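
To make the comparison concrete, here is a minimal Python sketch of what I mean (the quadratic cost, the learning rate, and the variable names are just hypothetical examples I made up): gradient descent on a cost f next to a gradient 'climb' on -f.

```python
# A minimal sketch: gradient descent on a cost f versus a gradient
# "climb" (ascent) on -f. The cost and all names here are hypothetical.

def cost(theta):
    # hypothetical cost with its minimum at theta = 3
    return (theta - 3.0) ** 2

def grad_cost(theta):
    # derivative of the cost above
    return 2.0 * (theta - 3.0)

lr = 0.1
theta_descent = 0.0  # updated by gradient descent on f
theta_ascent = 0.0   # updated by gradient ascent on -f

for _ in range(50):
    theta_descent -= lr * grad_cost(theta_descent)    # step downhill on f
    theta_ascent += lr * (-grad_cost(theta_ascent))   # step uphill on -f

print(theta_descent, theta_ascent)  # both end up near 3.0, step for step
```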
