I want to use equations in an algorithm. For that, I use algorithm2e together with the align* environment. The result looks fine, but unfortunately align* takes up so much vertical space that the whole algorithm becomes larger than the page itself. MWE:
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[boxruled,vlined,linesnumbered,algo2e, longend]{algorithm2e}
\usepackage{amsmath}
\title{stackquestion}
\date{June 2021}
\begin{document}
\maketitle
\section{Introduction}
\DontPrintSemicolon
\begin{algorithm2e}[htb]
\SetAlgoLined
\KwIn{Number of episodes $N$, discount factor $\gamma$}
\KwOut{Deterministic policy $\pi(\cdot |\theta^\pi)$, action-value function approximation $Q(\cdot, \cdot|\theta^{Q_1}),Q(\cdot, \cdot|\theta^{Q_2}) \approx q_*$}
Initialize replay memory $\mathcal{M}$.
\\ Initialize randomly neural network parameters $\theta^{Q_1}, \theta^{Q_2}$ for the critics and $\theta^\pi$ for the actor.
\\ Initialize target networks (for actor and critics): $\hat{\theta}^{Q_1} \leftarrow \theta^{Q_1},\hat{\theta}^{Q_2} \leftarrow \theta^{Q_2},\hat{\theta}^\pi\leftarrow \theta^\pi$
randomly.
\For{ episode$=1, \hdots, N$}{
Initialize starting state of episode $S_0$\;
\For{step $t$ in episode}{
Choose action $A_t = \pi(S_t|\theta^\pi)+\epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma)$\;
Take action $A_t$, observe $R_{t+1}, S_{t+1}$\;
Store transition $(S_t, A_t, R_{t+1}, S_{t+1})$ in $\mathcal{M}$ \;
% $S \leftarrow S'$\;
Sample random minibatch of $m$ transitions ($S_j, A_j, R_j, S_{j+1})$ from $\mathcal{M}$\;
Set\; \begin{align*}
\tilde{A}_{j+1} &\leftarrow \pi(S_{j+1}|\hat{\theta}^\pi)+\epsilon, \quad \epsilon \sim \text{clip}(\mathcal{N}(0,\tilde{\sigma}), -c, c)
\\ y_j &\leftarrow R_{j+1}+\gamma \min_{i=1,2}Q(S_{j+1}, \tilde{A}_{j+1}|\hat{\theta}^{Q_i})
\end{align*}\;
Update critics \;
\begin{align*}
\theta^{Q_i} &\leftarrow \arg \min_{\theta^{Q_i}} \frac{1}{m}\sum_{j=1}^m(y_j-Q(S_j, A_j|\theta^{Q_i}))^2
\end{align*}\;
\If{$t$ mod $d$}{
Update $\theta^\pi$ by the deterministic policy gradient:\;
\begin{align*}
\nabla_{\theta^\pi}J(\theta^\pi) &= \frac{1}{m}\sum_{j=1}^m \nabla_a Q(S_j,a|\theta^{Q_1})|_{a=\pi(S_j|\theta^\pi)}\nabla_{\theta^\pi}\pi(S_j|\theta^\pi)
\end{align*}\;
Update target networks: \;
\begin{align*}
\hat{\theta}^{Q_i} & \leftarrow \tau \theta^{Q_i}+(1-\tau)\hat{\theta}^{Q_i} \quad i=1,2
\\ \hat{\theta}^\pi &\leftarrow \tau \theta^\pi + (1-\tau)\hat{\theta}^{\pi}
\end{align*}\;
}
Until episode ended or termination was enforced
}
}
\Return{$\pi \leftarrow \pi(\cdot |\theta^\pi)$}
\caption{Twin Delayed Deep Deterministic Policy Gradient (T3DPG)}
\label{alg:T3DPG}
\end{algorithm2e}
\end{document}
The result is shown in the figure below. I would like to remove the unnecessary vertical spacing highlighted in red. If possible, I would still like to use the align* environment, mainly because it makes it easy to align several equations.
Answer 1
I replaced the align* environments with $\begin{aligned} … \end{aligned}$. Note that some of the align* environments are unnecessary, so a plain $ … $ would be enough for them (a short example of this follows the complete code below).
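Roughly speaking, this helps because align* is a display environment: inside an algorithm2e line it opens display math and inserts the usual \abovedisplayskip/\belowdisplayskip glue above and below every block, while aligned is just a box inside inline math and adds no such glue. A minimal sketch of the replacement pattern, with placeholder equations rather than the ones from the algorithm:

% before: display environment, surrounded by display-skip glue
Set\;
\begin{align*}
  a &= b + c \\
  d &= e + f
\end{align*}\;

% after: aligned sits inside inline math, so no display-skip glue is added
Set\;
$\begin{aligned}
  a &= b + c \\
  d &= e + f
\end{aligned}$\;

By default aligned is centred vertically on the current line; if you would rather have its first row sit on the baseline, use the optional position argument, $\begin{aligned}[t] … \end{aligned}$.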
Unrelated: loading inputenc with the utf8 option is no longer useful, since that is what LaTeX expects by default.
\documentclass{article}
\usepackage[boxruled,vlined,linesnumbered,algo2e, longend]{algorithm2e}
\usepackage{amsmath}
\title{stackquestion}
\date{June 2021}
\begin{document}
\maketitle
\section{Introduction}
\DontPrintSemicolon
\begin{algorithm2e}[htb]
\SetAlgoLined
\KwIn{Number of episodes $N$, discount factor $\gamma$}
\KwOut{Deterministic policy $\pi(\cdot |\theta^\pi)$, action-value function approximation $Q(\cdot, \cdot|\theta^{Q_1}),Q(\cdot, \cdot|\theta^{Q_2}) \approx q_*$}
Initialize replay memory $\mathcal{M}$.
\\ Initialize randomly neural network parameters $\theta^{Q_1}, \theta^{Q_2}$ for the critics and $\theta^\pi$ for the actor.
\\ Initialize target networks (for actor and critics): $\hat{\theta}^{Q_1} \leftarrow \theta^{Q_1},\hat{\theta}^{Q_2} \leftarrow \theta^{Q_2},\hat{\theta}^\pi\leftarrow \theta^\pi$
randomly.
\For{ episode$=1, \hdots, N$}{
Initialize starting state of episode $S_0$\;
\For{step $t$ in episode}{
Choose action $A_t = \pi(S_t|\theta^\pi)+\epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma)$\;
Take action $A_t$, observe $R_{t+1}, S_{t+1}$\;
Store transition $(S_t, A_t, R_{t+1}, S_{t+1})$ in $\mathcal{M}$ \;
% $S \leftarrow S'$\;
Sample random minibatch of $m$ transitions ($S_j, A_j, R_j, S_{j+1})$ from $\mathcal{M}$\;
Set\;$ \begin{aligned}
\tilde{A}_{j+1} &\leftarrow \pi(S_{j+1}|\hat{\theta}^\pi)+\epsilon, \quad \epsilon \sim \text{clip}(\mathcal{N}(0,\tilde{\sigma}), -c, c)
\\ y_j &\leftarrow R_{j+1}+\gamma \min_{i=1,2}Q(S_{j+1}, \tilde{A}_{j+1}|\hat{\theta}^{Q_i})
\end{aligned}$\;
Update critics \;
$ \begin{aligned}
\theta^{Q_i} &\leftarrow \arg \min_{\theta^{Q_i}} \frac{1}{m}\sum_{j=1}^m(y_j-Q(S_j, A_j|\theta^{Q_i}))^2
\end{aligned} $\;
\If{$t$ mod $d$}{
Update $\theta^\pi$ by the deterministic policy gradient:\;
$ \begin{aligned}
\nabla_{\theta^\pi}J(\theta^\pi) &= \frac{1}{m}\sum_{j=1}^m \nabla_a Q(S_j,a|\theta^{Q_1})|_{a=\pi(S_j|\theta^\pi)}\nabla_{\theta^\pi}\pi(S_j|\theta^\pi)
\end{aligned} $\;
Update target networks: \;
$ \begin{aligned}
\hat{\theta}^{Q_i} & \leftarrow \tau \theta^{Q_i}+(1-\tau)\hat{\theta}^{Q_i} \quad i=1,2
\\ \hat{\theta}^\pi &\leftarrow \tau \theta^\pi + (1-\tau)\hat{\theta}^{\pi}
\end{aligned} $\;
}
Until episode ended or termination was enforced
}
}
\Return{$\pi \leftarrow \pi(\cdot |\theta^\pi)$}
\caption{Twin Delayed Deep Deterministic Policy Gradient (T3DPG)}
\label{alg:T3DPG}
\end{algorithm2e}
\end{document}
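As noted above, the blocks that contain only a single equation do not need aligned at all; a plain inline formula already avoids the display-mode spacing. For instance, the critic-update line from the code above could just as well be written as follows (an equivalent variant, not a further change to the algorithm):

Update critics \;
$\theta^{Q_i} \leftarrow \arg \min_{\theta^{Q_i}} \frac{1}{m}\sum_{j=1}^m(y_j-Q(S_j, A_j|\theta^{Q_i}))^2$\;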