I'm relatively new to LaTeX and have a hard time combining different packages. I am trying to wrap a figure in my text, and I would also like the caption to sit beside the figure. My current approach is shown below; the red arrows mark what I would like to improve:

To wrap the image in the text, I use the wrapfig package as follows:
\begin{wrapfigure}{R}{0.5\textwidth}
\vspace{-25pt}
\centering
\includegraphics[scale=0.41]{figures/transformer3.png}
\caption[Illustration of multiheaded attention]{Illustration of multiheaded attention. The two highlighted attention heads have learned to associate \textit{"it"} with different parts of the sentence.}
\label{fig:transformer3}
\end{wrapfigure}
\noindent The projections are parameter matrices $\bm{W}_{i}^{Q} \in \mathbb{R}^{d \times d_q}$, $\bm{W}_{i}^{K} \in \mathbb{R}^{d \times d_k}$, $\bm{W}_{i}^{V} \in \mathbb{R}^{d \times d_v}$ and $\bm{W}^{O} \in \mathbb{R}^{hd_v \times d}$. By applying multiple attention heads, the model is allowed to jointly attend to information at different positions within the input sequence. In figure \ref{fig:transformer3} for example, the orange attention head associates \textit{“it”} with \textit{“The animal”}, while the green attention head has learned an association to “tired”.
\subsubsection*{Outlook on the Empirical Studies}
While the U-Net and the stacked hourglass are already well-established architectures in the CV domain, Transformers have mainly been applied to NLP problems so far. However, there is a strong belief within the deep learning community that Transformers may represent a suitable architecture for CV tasks as well. For this reason, the empirical study will investigate recent approaches that apply self-attention based networks to images. The concepts will then be implemented in a neural network that will be trained on a CV task. Finally, the performance will be evaluated against models that instead rely on the U-Net and the stacked hourglass.
For side captions, I read that the floatrow package should be useful. However, when I try to combine the two, I get compilation errors. I also found an introductory usage example here. Again, I can reproduce that one on its own, but in this case I have a hard time aligning it properly with my text. Can anyone help me? Thanks a lot!
Answer 1
Combining both in the same setup is fragile. I would rather stick with two separate mechanisms: sidecap for floats that interrupt the text, and wrapfig for floats inside the text:
\documentclass{article}
\usepackage{amssymb, bm}
\usepackage[export]{adjustbox}
\usepackage{wrapfig}
\usepackage[outercaption]{sidecap}
\makeatletter
\def\SC@figure@vpos{m}% vertically center the side caption (internal sidecap hook)
\makeatother
\usepackage{tabularx}
\usepackage[font={small, sf},labelfont=bf]{caption}
\begin{document}
\noindent The projections are parameter matrices $\bm{W}_{i}^{Q} \in \mathbb{R}^{d \times d_q}$, $\bm{W}_{i}^{K} \in \mathbb{R}^{d \times d_k}$, $\bm{W}_{i}^{V} \in \mathbb{R}^{d \times d_v}$ and $\bm{W}^{O} \in \mathbb{R}^{hd_v \times d}$. By applying multiple attention heads, the model is allowed to jointly attend to information at different positions within the input sequence.
\begin{SCfigure}[50][ht]
\centering
\includegraphics[scale=0.41]{example-image-duck}%{figures/transformer3.png}
\caption[Illustration of multiheaded attention]
{Illustration of multiheaded attention. The two highlighted attention heads have learned to associate \textit{"it"} with different parts of the sentence.}
\label{fig:transformer3}
\end{SCfigure}
In figure \ref{fig:transformer3} for example, the orange attention head associates \textit{“it”} with \textit{“The animal”}, while the green attention head has learned an association to “tired”.
\subsubsection*{Outlook on the Empirical Studies}
\begin{wrapfigure}[5]{R}{0.65\textwidth}
\vspace{-1.75\baselineskip}
\begin{tabularx}{\linewidth}{@{} cX @{}}
\includegraphics[scale=0.41,valign=T]{example-image-duck}%{figures/transformer3.png}
&
\caption[Illustration of multiheaded attention]
{Illustration of multiheaded attention. The two highlighted attention heads have learned to associate \textit{"it"} with different parts of the sentence.}
\label{fig:transformer3-wrap}% distinct label: fig:transformer3 is already used above
\end{tabularx}
\end{wrapfigure}
While the U-Net and the stacked hourglass are already well-established architectures in the CV domain, Transformers have mainly been applied to NLP problems so far. However, there is a strong belief within the deep learning community that Transformers may represent a suitable architecture for CV tasks as well. For this reason, the empirical study will investigate recent approaches that apply self-attention based networks to images. The concepts will then be implemented in a neural network that will be trained on a CV task. Finally, the performance will be evaluated against models that instead rely on the U-Net and the stacked hourglass.
\end{document}
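A side note on the second figure in the example above: `\caption` works inside the tabularx cell only because wrapfigure sets the float type for it. If you ever need the same side-by-side layout outside any float, the caption package (already loaded in the preamble) provides `\captionof` for exactly that. A minimal sketch of that variant, assuming the same preamble; the label name is made up for illustration:

```latex
% Side caption without a float: \captionof{figure} supplies the
% "Figure N:" label and counter that \caption would normally need
% a float environment for.
\noindent\begin{tabularx}{0.65\textwidth}{@{} cX @{}}
  \includegraphics[scale=0.41,valign=T]{example-image-duck}
  &
  \captionof{figure}[Illustration of multiheaded attention]{Illustration
    of multiheaded attention, set as a side caption without a float.}%
  \label{fig:transformer3-nofloat}
\end{tabularx}
```

Since this is not a float, it stays exactly where you put it in the text, which can be simpler to align than wrapfigure, at the cost of losing automatic text wrapping.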
Answer 2
This shows how paracol can be used. The only catch is that you have to split the paragraph manually with \splitpar and \continuepar. On the other hand, paracol is much more powerful than wrapfig.
\documentclass{article}
\usepackage{amssymb, bm}
\usepackage[export]{adjustbox}
\usepackage{paracol}
\usepackage[font={small, sf},labelfont=bf]{caption}
\newsavebox{\textbox}
\newcommand{\splitpar}[2][\textwidth]{% #1 = width of column (optional), #2 = rest of paragraph after split
\unskip\strut{\parfillskip=0pt\parskip=0pt\par}%
\global\setbox\textbox=\vbox{\hsize=#1\relax\noindent\strut #2\strut}}
\newcommand{\continuepar}{\unvbox\textbox}
\begin{document}
\setcolumnwidth{\dimexpr 0.5\textwidth-\columnsep}% second column uses remainder
\begin{paracol}{2}
\sloppy% SOP for narrow columns
\noindent The projections are parameter matrices $\bm{W}_{i}^{Q} \in \mathbb{R}^{d \times d_q}$, $\bm{W}_{i}^{K} \in \mathbb{R}^{d \times d_k}$, $\bm{W}_{i}^{V} \in \mathbb{R}^{d \times d_v}$ and $\bm{W}^{O} \in \mathbb{R}^{hd_v \times d}$. By applying multiple attention heads, the model is allowed to jointly attend to information at different positions within the input sequence.
In figure \ref{fig:transformer3} for example, the orange attention head associates \textit{“it”} with \textit{“The animal”}, while the green attention head has learned an association to “tired”.
\switchcolumn
\begin{figure}[h!]
\includegraphics[width=\linewidth, height=4in]{example-image}
\end{figure}
\switchcolumn
\begin{figure}[h]
\caption[Illustration of multiheaded attention]
{Illustration of multiheaded attention. The two highlighted attention heads have learned to associate \textit{"it"} with different parts of the sentence.}
\label{fig:transformer3}
\end{figure}
\subsubsection*{Outlook on the Empirical Studies}
While the U-Net and the stacked hourglass are already well established architectures in the CV domain,
Transformers have mainly been appl-\splitpar{ied to NLP problems so far. However, there is a strong belief within the deep learning community that Transformers may represent a suitable architecture for CV tasks as well. For this reason, the empirical study will investigate recent approaches that apply self-attention based networks to images. The concepts will then be implemented in a neural network that will be trained on a CV task. Finally, the performance will be evaluated against models that instead rely on the U-Net and the stacked hourglass.}
\end{paracol}
\continuepar
\end{document}