I want to use equations in an algorithm. For that, I use algorithm2e together with the align* environment. The result looks fine, but unfortunately align* takes up so much vertical space that the whole algorithm becomes larger than the page itself. MWE:
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[boxruled,vlined,linesnumbered,algo2e, longend]{algorithm2e}
\usepackage{amsmath}
\title{stackquestion}
\date{June 2021}
\begin{document}
\maketitle
\section{Introduction}
\DontPrintSemicolon
\begin{algorithm2e}[htb]
\SetAlgoLined
\KwIn{Number of episodes $N$, discount factor $\gamma$}
\KwOut{Deterministic policy $\pi(\cdot |\theta^\pi)$, action-value function approximation $Q(\cdot, \cdot|\theta^{Q_1}),Q(\cdot, \cdot|\theta^{Q_2}) \approx q_*$}
Initialize replay memory $\mathcal{M}$.
\\ Initialize randomly neural network parameters $\theta^{Q_1}, \theta^{Q_2}$ for the critics and $\theta^\pi$ for the actor.
\\ Initialize target networks (for actor and critics): $\hat{\theta}^{Q_1} \leftarrow \theta^{Q_1},\hat{\theta}^{Q_2} \leftarrow \theta^{Q_2},\hat{\theta}^\pi\leftarrow \theta^\pi$
randomly.
\For{ episode$=1, \hdots, N$}{
Initialize starting state of episode $S_0$\;
\For{step $t$ in episode}{
Choose action $A_t = \pi(S_t|\theta^\pi)+\epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma)$\;
Take action $A_t$, observe $R_{t+1}, S_{t+1}$\;
Store transition $(S_t, A_t, R_{t+1}, S_{t+1})$ in $\mathcal{M}$ \;
% $S \leftarrow S'$\;
Sample random minibatch of $m$ transitions ($S_j, A_j, R_j, S_{j+1})$ from $\mathcal{M}$\;
Set\; \begin{align*}
\tilde{A}_{j+1} &\leftarrow \pi(S_{j+1}|\hat{\theta}^\pi)+\epsilon, \quad \epsilon \sim \text{clip}(\mathcal{N}(0,\tilde{\sigma}), -c, c)
\\ y_j &\leftarrow R_{j+1}+\gamma \min_{i=1,2}Q(S_{j+1}, \tilde{A}_{j+1}|\hat{\theta}^{Q_i})
\end{align*}\;
Update critics \;
\begin{align*}
\theta^{Q_i} &\leftarrow \arg \min_{\theta^{Q_i}} \frac{1}{m}\sum_{j=1}^m(y_j-Q(S_j, A_j|\theta^{Q_i}))^2
\end{align*}\;
\If{$t$ mod $d$}{
Update $\theta^\pi$ by the deterministic policy gradient:\;
\begin{align*}
\nabla_{\theta^\pi}J(\theta^\pi) &= \frac{1}{m}\sum_{j=1}^m \nabla_a Q(S_j,a|\theta^{Q_1})|_{a=\pi(S_j|\theta^\pi)}\nabla_{\theta^\pi}\pi(S_j|\theta^\pi)
\end{align*}\;
Update target networks: \;
\begin{align*}
\hat{\theta}^{Q_i} & \leftarrow \tau \theta^{Q_i}+(1-\tau)\hat{\theta}^{Q_i} \quad i=1,2
\\ \hat{\theta}^\pi &\leftarrow \tau \theta^\pi + (1-\tau)\hat{\theta}^{\pi}
\end{align*}\;
}
Until episode ended or termination was enforced
}
}
\Return{$\pi \leftarrow \pi(\cdot |\theta^\pi)$}
\caption{Twin Delayed Deep Deterministic Policy Gradient (T3DPG)}
\label{alg:T3DPG}
\end{algorithm2e}
\end{document}
The result is shown in the figure below. I would like to remove the unnecessary vertical spacing highlighted in red. If possible, I would still like to use the align* environment, mainly because it makes it easy to align several equations.
Answer 1
I replaced the align* environments with $\begin{aligned} … \end{aligned}$. Note that some of the align* environments are unnecessary, so a plain $ … $ would be enough for them (a short example of this follows the complete code below).
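Roughly speaking, this helps because align* is a display environment: inside an algorithm2e line it opens display math and inserts the usual \abovedisplayskip/\belowdisplayskip glue above and below every block, while aligned is just a box inside inline math and adds no such glue. A minimal sketch of the replacement pattern, with placeholder equations rather than the ones from the algorithm:

% before: display environment, surrounded by display-skip glue
Set\;
\begin{align*}
  a &= b + c \\
  d &= e + f
\end{align*}\;

% after: aligned sits inside inline math, so no display-skip glue is added
Set\;
$\begin{aligned}
  a &= b + c \\
  d &= e + f
\end{aligned}$\;

By default aligned is centred vertically on the current line; if you would rather have its first row sit on the baseline, use the optional position argument, $\begin{aligned}[t] … \end{aligned}$.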
Unrelated: loading inputenc with the utf8 option is no longer useful, since that is what LaTeX expects by default.
\documentclass{article}
\usepackage[boxruled,vlined,linesnumbered,algo2e, longend]{algorithm2e}
\usepackage{amsmath}
\title{stackquestion}
\date{June 2021}
\begin{document}
\maketitle
\section{Introduction}
\DontPrintSemicolon
\begin{algorithm2e}[htb]
\SetAlgoLined
\KwIn{Number of episodes $N$, discount factor $\gamma$}
\KwOut{Deterministic policy $\pi(\cdot |\theta^\pi)$, action-value function approximation $Q(\cdot, \cdot|\theta^{Q_1}),Q(\cdot, \cdot|\theta^{Q_2}) \approx q_*$}
Initialize replay memory $\mathcal{M}$.
\\ Initialize randomly neural network parameters $\theta^{Q_1}, \theta^{Q_2}$ for the critics and $\theta^\pi$ for the actor.
\\ Initialize target networks (for actor and critics): $\hat{\theta}^{Q_1} \leftarrow \theta^{Q_1},\hat{\theta}^{Q_2} \leftarrow \theta^{Q_2},\hat{\theta}^\pi\leftarrow \theta^\pi$
randomly.
\For{ episode$=1, \hdots, N$}{
Initialize starting state of episode $S_0$\;
\For{step $t$ in episode}{
Choose action $A_t = \pi(S_t|\theta^\pi)+\epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma)$\;
Take action $A_t$, observe $R_{t+1}, S_{t+1}$\;
Store transition $(S_t, A_t, R_{t+1}, S_{t+1})$ in $\mathcal{M}$ \;
% $S \leftarrow S'$\;
Sample random minibatch of $m$ transitions ($S_j, A_j, R_j, S_{j+1})$ from $\mathcal{M}$\;
Set\;$ \begin{aligned}
\tilde{A}_{j+1} &\leftarrow \pi(S_{j+1}|\hat{\theta}^\pi)+\epsilon, \quad \epsilon \sim \text{clip}(\mathcal{N}(0,\tilde{\sigma}), -c, c)
\\ y_j &\leftarrow R_{j+1}+\gamma \min_{i=1,2}Q(S_{j+1}, \tilde{A}_{j+1}|\hat{\theta}^{Q_i})
\end{aligned}$\;
Update critics \;
$ \begin{aligned}
\theta^{Q_i} &\leftarrow \arg \min_{\theta^{Q_i}} \frac{1}{m}\sum_{j=1}^m(y_j-Q(S_j, A_j|\theta^{Q_i}))^2
\end{aligned} $\;
\If{$t$ mod $d$}{
Update $\theta^\pi$ by the deterministic policy gradient:\;
$ \begin{aligned}
\nabla_{\theta^\pi}J(\theta^\pi) &= \frac{1}{m}\sum_{j=1}^m \nabla_a Q(S_j,a|\theta^{Q_1})|_{a=\pi(S_j|\theta^\pi)}\nabla_{\theta^\pi}\pi(S_j|\theta^\pi)
\end{aligned} $\;
Update target networks: \;
$ \begin{aligned}
\hat{\theta}^{Q_i} & \leftarrow \tau \theta^{Q_i}+(1-\tau)\hat{\theta}^{Q_i} \quad i=1,2
\\ \hat{\theta}^\pi &\leftarrow \tau \theta^\pi + (1-\tau)\hat{\theta}^{\pi}
\end{aligned} $\;
}
Until episode ended or termination was enforced
}
}
\Return{$\pi \leftarrow \pi(\cdot |\theta^\pi)$}
\caption{Twin Delayed Deep Deterministic Policy Gradient (T3DPG)}
\label{alg:T3DPG}
\end{algorithm2e}
\end{document}
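As noted above, the blocks that contain only a single equation do not need aligned at all; a plain inline formula already avoids the display-mode spacing. For instance, the critic-update line from the code above could just as well be written as follows (an equivalent variant, not a further change to the algorithm):

Update critics \;
$\theta^{Q_i} \leftarrow \arg \min_{\theta^{Q_i}} \frac{1}{m}\sum_{j=1}^m(y_j-Q(S_j, A_j|\theta^{Q_i}))^2$\;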