Answer 1
An ugly but working attempt, using amsmath, mathtools, and tikz:
\documentclass{article}
\usepackage{amsmath}
\usepackage{mathtools}
\usepackage{tikz}
\usetikzlibrary{tikzmark,calc,positioning}
\tikzset{every picture/.style={remember picture}}
\begin{document}
$\underbrace{\text{New }Q(s, a)}_{\text{New }Q\text{-value}} = \underbrace{Q(s, a)}_{\mathclap{\substack{\text{Current } \\ Q\text{-value}}}} +\tikz[baseline,inner sep=0pt,anchor = base]{\node (alpha) {$\alpha$}} [\underbrace{R(s, a)}_{\text{Reward}} + \tikz[baseline,inner sep=0pt,anchor = base]{\node (gamma) {$\gamma$}} \underbrace{\max Q^{\prime}\left(s^{\prime}, a^{\prime}\right)}_{\mathrlap{\substack{\text{Maximum predicted} \\ \text{reward given new state} \\ \text{and all possible actions} }}} -Q(s, a) ]$
\begin{tikzpicture}[overlay]
\node [below=1.5cm of alpha] (A) {\scriptsize Learning rate};
\node [below=of gamma] (C) {\scriptsize Discount rate};
\draw (alpha) -- (A);
\draw (gamma) -- (C);
\end{tikzpicture}
\end{document}
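For reference, here is the mechanism of the answer above in isolation: a sketch (with a hypothetical formula, not the Q-learning one) showing how a node planted inside a formula can be connected to a label drawn later in an overlay picture. Because `remember picture` stores absolute page coordinates in the `.aux` file, the document must be compiled twice before the labels land in the right place.

```latex
\documentclass{article}
\usepackage{tikz}
\usetikzlibrary{positioning}
% Every picture remembers its absolute position on the page.
\tikzset{every picture/.style={remember picture}}
\begin{document}
% Plant a named node (m) inside the formula.
$y = \tikz[baseline,inner sep=0pt]{\node (m) {$m$};}\, x + b$
% Draw on top of the page, referring back to node (m) by name.
\begin{tikzpicture}[overlay]
  \node [below=1cm of m] (lbl) {\scriptsize slope};
  \draw (m) -- (lbl);
\end{tikzpicture}
\end{document}
```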
Answer 2
Ugly, but it works.
\documentclass{article}
\usepackage{amsmath}
\newcommand{\tubrace}[2]{{\underbrace{#1}_{#2}}}% the additional braces are necessary
\newcommand{\tudesc}[3]{% #1=symbol, #2=height of line, #3=text
\underset{\underset{\scriptstyle\text{\makebox[0pt]{#3}}}{\rule{0.4pt}{#2}}}{#1}%
}
\begin{document}
\[
\newcommand{\foo}[1]{\text{\makebox[3em][l]{#1\vphantom{ly}}}}% temporary macro
\tubrace{\mathrm{New}\,Q(s,a)}{\text{New $Q$-value}}=
\tubrace{Q(s,a)}{\substack{\text{Current\vphantom{y}} \\ \text{$Q$-value}}}
+
\tudesc{\alpha}{1.5cm}{Learning rate}
[\,
\tubrace{R(s,a)}{\text{Reward}}+
\tudesc{\gamma}{1cm}{Discount rate}
\,
\tubrace{{\max}Q'(s',a')}{%
\substack{\foo{Maximum predicted reward given}\\\foo{new state and all possible actions}}%
}
-Q(s,a)
]
\]
\end{document}
Answer 3
Yet another variant, using an explanation macro that takes the length of the vertical rule as an optional argument, to be adjusted when necessary so the explanations do not overlap:
\documentclass{article}
\usepackage{mathtools}
\newcommand{\explanation}[3][3ex]{\underset{\begin{tabular}{@{}c@{}}\rule{0.6pt}{#1}\\[-1.2ex] \clap{\sffamily\scriptsize#3}\end{tabular}}{#2}}
\begin{document}
\[ \underbrace{\mathrm{New\,}Q(s, a)}_{\textsf{New $Q$-value}} = \underbrace{Q(s, a)}_{\substack{\textsf{Current}\\ Q\textsf{-value}}} + \explanation[5ex]{\alpha}{learning rate}[\,\underbrace{R(s, a)}_{\textsf{Reward}} + \explanation{\gamma}{discount rate} \underbrace{\max Q'(s', a')}_{\hskip-2em\mathrlap{\substack{\textsf{Maximum predicted reward, given}\\ \textsf{new state and all possible actions}}}} - Q(s, a)\,] \]%
\end{document}