
This question came up when I was reading equation (C.20) in Appendix C of "Pattern Recognition and Machine Learning" by Christopher M. Bishop. I copy the equation below for reference:

$$\frac{\partial}{\partial \mathbf{x}}\left(\mathbf{A}\mathbf{B}\right) = \frac{\partial \mathbf{A}}{\partial \mathbf{x}}\,\mathbf{B} + \mathbf{A}\,\frac{\partial \mathbf{B}}{\partial \mathbf{x}} \tag{C.20}$$

Capital bold $\mathbf A$ and $\mathbf B$ are matrices, and so is their product. Lower case bold $\mathbf x$ is a column vector, according to the convention in this book. But how is the derivative of a matrix with respect to a vector, such as $\frac{\partial\mathbf A}{\partial\mathbf x}$, defined? It seems that $\frac{\partial\mathbf A}{\partial\mathbf x}$ is still a matrix of the same size as $\mathbf A$; otherwise the subsequent matrix multiplication with $\mathbf B$ could not be carried out. But what are the elements of $\frac{\partial\mathbf A}{\partial\mathbf x}$? I have googled and read Wikipedia pages like this one, but that page does not cover this type of derivative. If you have read this book, could you please tell me how the derivative of a matrix with respect to a vector is defined in this equation? Thanks a lot.
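For reference, if the differentiation variable were a scalar $x$, I would read the derivative entry-wise, so that (C.20) is just the ordinary product rule applied to each entry of $\mathbf{AB}$ (this is my own guess at the convention, not something stated explicitly in the appendix):

$$\left(\frac{\partial \mathbf A}{\partial x}\right)_{ij} = \frac{\partial A_{ij}}{\partial x}, \qquad \frac{\partial (\mathbf{A}\mathbf{B})_{ik}}{\partial x} = \sum_{j}\left(\frac{\partial A_{ij}}{\partial x}\,B_{jk} + A_{ij}\,\frac{\partial B_{jk}}{\partial x}\right) = \left(\frac{\partial \mathbf A}{\partial x}\,\mathbf B + \mathbf A\,\frac{\partial \mathbf B}{\partial x}\right)_{ik}.$$

What I cannot see is what the corresponding definition is when the differentiation variable is the vector $\mathbf x$.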

zzzhhh
  • Check out this thread: https://math.stackexchange.com/questions/822068/derivative-of-a-matrix-with-respect-to-a-vector – DevD Jun 29 '22 at 08:26
  • Related math.SE question on the product of a tensor and a matrix: https://math.stackexchange.com/questions/1953185/tensors-and-matrices-multiplication – B.Liu Jun 29 '22 at 08:59
  • So, for $\frac{\partial\mathbf A}{\partial\mathbf x}$, the author here actually means a tensor, and the subsequent multiplication with B is an n-mode product? – zzzhhh Jun 29 '22 at 09:18
  • See https://stats.stackexchange.com/questions/257579 for a general answer. The notation indicates $A$ and $B$ are matrix-valued functions. The derivative is defined by considering an $m\times n$ matrix as an element of $\mathbb{R}^{mn}$ and $x$ as an element of $\mathbb{R}^k.$ That makes "$A$" a function from $\mathbb{R}^k$ to $\mathbb{R}^{mn}$ and $B:\mathbb{R}^k\to\mathbb{R}^{np},$ say; and therefore $AB:\mathbb{R}^k\to\mathbb{R}^{mp}.$ From this point on, you may consult any textbook on multivariable Calculus for definitions, examples, theorems, proofs, and so on. – whuber Jun 29 '22 at 15:01
  • @whuber ♦ So, $\frac{\partial\mathbf A}{\partial\mathbf x}$ should be a matrix, right? For the matrix multiplications $\mathbf{AB}$ and $\frac{\partial\mathbf A}{\partial\mathbf x}\mathbf B$ to be possible, the width of $\frac{\partial\mathbf A}{\partial\mathbf x}$ must equal the width of $\mathbf A$ and the height of $\mathbf B$ (both $n$). But your definition of differentiability requires the width of $\frac{\partial\mathbf A}{\partial\mathbf x}$ to be the dimension of $\mathbf x$ ($k$), which in general does not equal the width of $\mathbf A$ ($n$). How do I reconcile this contradiction? – zzzhhh Jul 05 '22 at 09:11
  • There's no contradiction. Everything is in order and conforms with standard definitions and concepts. At any point $x\in\mathbb{R}^k,$ $DA(x)$ will be a linear transformation from a $k$-dimensional vector space to the vector space of $m\times n$ matrices. When you choose coordinates, that means $DA(x)$ will have $k\times mn=kmn$ components (all potentially depending on $x$). – whuber Jul 05 '22 at 13:24
  • @whuber ♦ What's the size of matrix $\frac{\partial\mathbf A}{\partial\mathbf x}$ then? What are the elements in this matrix? How is the matrix multiplication $\frac{\partial\mathbf A}{\partial\mathbf x}\bf B$ conducted? – zzzhhh Jul 06 '22 at 09:24
  • The answer to which I referred you characterizes $DA$ as a linear form defined in terms of the usual matrix multiplication. The meaning of $(\partial A/\partial x)B$ is, however, obscure and must be figured out by working through the details. (Start by taking $x$ to have just one component.) Avoiding this potential for confusion is one of the (many) reasons why I presented a more rigorous theory and notation in that answer. – whuber Jul 06 '22 at 13:16
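To follow whuber's last suggestion (start by taking $x$ to have just one component), here is a minimal numerical sketch of the entry-wise reading described in the question; the example matrix-valued functions and the NumPy finite-difference check are my own illustration, not anything taken from the book or the comments above.

```python
# Minimal sketch, assuming x is a single scalar and the derivative of a matrix
# with respect to a scalar is taken entry-wise. We check (C.20),
#   d(AB)/dx = (dA/dx) B + A (dB/dx),
# against a central finite difference.
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 2, 3, 4

# Arbitrary smooth matrix-valued functions of a scalar x (my own choices).
C1, C2 = rng.standard_normal((m, n)), rng.standard_normal((m, n))
D1, D2 = rng.standard_normal((n, p)), rng.standard_normal((n, p))
A  = lambda x: C1 + np.sin(x) * C2      # A(x): m x n
B  = lambda x: D1 + x**2 * D2           # B(x): n x p
dA = lambda x: np.cos(x) * C2           # entry-wise dA/dx
dB = lambda x: 2.0 * x * D2             # entry-wise dB/dx

x0, h = 0.7, 1e-6
lhs = (A(x0 + h) @ B(x0 + h) - A(x0 - h) @ B(x0 - h)) / (2 * h)  # d(AB)/dx
rhs = dA(x0) @ B(x0) + A(x0) @ dB(x0)                            # right side of (C.20)
print(np.allclose(lhs, rhs, atol=1e-5))                          # True
```

On this reading, when $\mathbf x$ has $k$ components the object $\frac{\partial\mathbf A}{\partial\mathbf x}$ would be a stack of $k$ such $m\times n$ matrices (equivalently, the Jacobian of $\operatorname{vec}\mathbf A$, with the $k\,m\,n$ components whuber mentions), and (C.20) would hold for each component $x_j$ separately; whether that is what the book intends is exactly my question.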

0 Answers