1

Say we have a relationship $ z = Wx$ for a multi layer perceptron where $z$ and $x$ are $n$ dimensional vectors. When we find $\frac{dz}{dx}$ , I would assume this would just be $W$, not $W^T$.

I was wondering where this result comes from - I can't figure out why this makes sense.

bebop
  • 11
  • 2
    It's a convention. You can use either convention (numerator convention or denominator convention) as long as you're consistent. – Arya McCarthy Apr 10 '22 at 23:01

0 Answers0