I can't copy and paste this code into my Overleaf

$
\def\o{{\tt1}}
\def\p{\partial}
\def\LR#1{\left(#1\right)}
\def\op#1{\operatorname{#1}}
\def\trace#1{\op{Tr}\LR{#1}}
\def\qiq{\quad\implies\quad}
\def\grad#1#2{\frac{\p #1}{\p #2}}
\def\c#1{\color{red}{#1}}
\def\gradLR#1#2{\LR{\grad{#1}{#2}}}
$As you have discovered, the chain rule can be difficult to apply in Matrix Calculus because it involves higher-order tensors (i.e. matrix-by-vector, vector-by-matrix, and matrix-by-matrix gradients) which are difficult to calculate, awkward to manipulate, and don't fit into standard matrix notation.

Instead I would recommend a differential approach, because the differential of a matrix behaves like a matrix. In particular, it can be written using standard matrix notation and it obeys all of the rules of matrix algebra. 

Also, the $\c{\rm Frobenius}$ product is extraordinarily useful in Matrix Calculus
$$\eqalign{
A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\
A:A &= \|A\|^2_F \qquad \big\{{\rm \c{Frobenius}\;norm}\big\}\\
}$$
The properties of the underlying trace function allow the terms in such a
product to be rearranged in many equivalent ways, e.g. 
$$\eqalign{
A:B &= B:A \\
A:B &= A^T:B^T \\
C:\LR{AB} &= \LR{CB^T}:A \;=\; \LR{A^TC}:B \\\\
}$$


---


Using the above notation, the manipulation of your particular function becomes almost mechanical
$$\eqalign{
df &= \gradLR fZ:dZ \\
   &= \gradLR fZ:\LR{dX^TY} \\
   &= \gradLR fZ Y^T:dX^T \\
   &= Y\gradLR fZ^T:dX \\
\grad fX &= Y\gradLR fZ^T \\
}$$
Note that there was no need for a tensor-valued gradient in any step, just plain old matrices.

Also note that your initial dimension-matching approach was correct! That's not as crazy an idea as it seems. When dealing with *rectangular* matrices there is often only one way to fit all the pieces together and it's a useful shortcut. But it won't help when dealing with *square* matrices.

The above is taken from https://math.stackexchange.com/questions/4617988/matrix-derivative-of-fxt-ywrtx/4618182#4618182

It fails in my Overleaf because of \eqalign.

Answer 1

You need to use LaTeX markup (for example the amsmath align environment in place of \eqalign), so something like:


\documentclass{article}

\usepackage{amsmath,xcolor}
\DeclareMathOperator\trace{trace}
\begin{document}


As you have discovered, the chain rule can be difficult to apply in
Matrix Calculus because it involves higher-order tensors
(i.e. matrix-by-vector, vector-by-matrix, and matrix-by-matrix
gradients) which are difficult to calculate, awkward to manipulate,
and don't fit into standard matrix notation.

Instead I would recommend a differential approach, because the
differential of a matrix behaves like a matrix. In particular, it can
be written using standard matrix notation and it obeys all of the
rules of matrix algebra.

Also, the \textcolor{red}{Frobenius} product is extraordinarily useful in Matrix Calculus
\begin{align}
A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\
A:A &= \|A\|^2_F \qquad \bigl\{\text{\textcolor{red}{Frobenius} norm}\bigr\}
\end{align}
The properties of the underlying trace function allow the terms in such a
product to be rearranged in many equivalent ways, e.g. 
\begin{align}
A:B &= B:A \\
A:B &= A^T:B^T \\
C:(AB) &= (CB^T):A \;=\; (A^TC):B 
\end{align}

\end{document}
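
The example above stops before the derivation in the second half of the quoted text. A minimal sketch of how that part might be converted in the same way is given below. It assumes the helper names \LR, \grad and \gradLR from the question are kept and redefined with \newcommand, and it uses the unnumbered align* environment; both choices are assumptions for illustration, not part of the answer above.

```latex
\documentclass{article}
\usepackage{amsmath}

% Helper macros mirroring the \def lines in the question,
% rewritten with \newcommand (the names are kept from the question).
\newcommand{\LR}[1]{\left(#1\right)}
\newcommand{\grad}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\gradLR}[2]{\LR{\grad{#1}{#2}}}

\begin{document}

Using the above notation, the manipulation of your particular function
becomes almost mechanical
% align* replaces $$\eqalign{...}$$; the last row has no trailing \\
\begin{align*}
df &= \gradLR{f}{Z}:dZ \\
   &= \gradLR{f}{Z}:\LR{dX^TY} \\
   &= \gradLR{f}{Z} Y^T:dX^T \\
   &= Y\gradLR{f}{Z}^T:dX \\
\grad{f}{X} &= Y\gradLR{f}{Z}^T
\end{align*}
Note that there was no need for a tensor-valued gradient in any step,
just plain old matrices.

\end{document}
```

Writing the arguments with explicit braces, as in \gradLR{f}{Z}, is not strictly required for single-token arguments, but it makes the source easier to read than the \gradLR fZ form in the quoted text.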
