Is there a good package for drawing word alignments?


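I want to visualize word alignments between pairs of sentences, where the positions of aligned words may differ between the sentences, and where a word in one sentence may be aligned with several words in the other sentence, or with no word at all.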

A picture is worth a thousand words:

[Figure: example of a word alignment between two sentences]

Is there a package suited to this job? If there is no dedicated package, what would be a good alternative?

Answer 1

\documentclass{article}
\usepackage{pst-node}
\def\W#1#2{\rnode{#1}{#2}\hfill}
\begin{document}

\W{a}{The} \W{b}{proposal} \W{c}{will} \W{d}{not} \W{e}{now} \W{f}{be} \W{g}{implemented}

\vspace{4cm}
\W{A}{Les} \W{B}{propositions} \W{C}{ne} \W{D}{seront} \W{E}{pas} \W{F}{mises} \W{G}{en}
 \W{H}{application} \W{I}{maintenant}
\psset{nodesep=5pt}
\ncline{a}{A}\ncline{b}{B}
\ncline{c}{D}
\ncdiag[border=4pt,angleA=-90,angleB=90,arm=5mm]{d}{C}
\ncput[npos=1.5]{\rnode{dC}{}}
\ncdiag[nodesepA=0pt,armB=5mm,angleA=-55,angleB=90]{dC}{E}
\ncdiag[angleA=-90,angleB=90,arm=5mm]{e}{I}
\ncdiag[border=4pt,angleA=-90,angleB=90,arm=5mm]{g}{F}
\ncput[npos=1.8]{\rnode{gF}{}}
\ncdiag[nodesepA=0pt,armB=5mm,angleA=-75,angleB=90]{gF}{G}
\ncdiag[nodesepA=0pt,armB=5mm,angleA=-20,angleB=90]{gF}{H}
\ncdiagg[angleA=-90,angleB=90,nodesepB=7mm]{f}{f}

\end{document}


Answer 2

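OK, here is a quick and dirty TikZ solution. I haven't connected be in the diagram, although I think it should actually go with seront. If you want to connect them, you could create a node p3 in the matrix below now, connect will and be to that node, and then connect that node to seront.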

Update: I adjusted some parameters to improve the spacing between the words.

\documentclass{article}
\usepackage[margin=1in]{geometry}
\usepackage{tikz}
\usetikzlibrary{intersections,positioning}
\newcommand*{\hnode}[1]{\node[outer sep=0pt,anchor=base] (#1) {#1};} % create a labelled node
\begin{document}

\begin{tikzpicture}
% First make a matrix containing as many columns as the longest sentence.
% Each cell contains a node whose label is identical to the word itself
\matrix[column sep=0em,row sep=.4in] {
% First sentence
\hnode{The} & \hnode{proposal} & \hnode{will} & \hnode{not} & \hnode{now} & \hnode{be} &  \hnode{implemented}  &\\
% Now create some dummy nodes to make intermediate nodes
& & & \node[inner sep={0pt},minimum width=0pt] (p1) {}; & & & \node[inner sep={0pt},minimum width=0pt] (p2) {};\\
% Second sentence
\hnode{Les} & \hnode{propositions} & \hnode{ne}  & \hnode{seront} & \hnode{pas} & \hnode{mises} & \hnode{en} & \hnode{application} & \hnode{maintenant}\\
};
% Now connect the nodes.  For paths that we want to break, name the path
\draw (The) -- (Les);
\draw (proposal) -- (propositions);
\draw[name path=willP] (will) -- (seront.north);
\draw (not) -- (p1);
\path[name path=neP] (p1) -- (ne);
\draw (p1) -- (pas);
\draw[name path=nowP] (now.south)  -- (maintenant.north);
\path[name path=impP] (implemented) -- (p2);
\draw (p2) -- (mises.north);
\draw (p2) -- (en);
\draw (p2) -- (application.north);
% Now break the paths at the intersection by drawing a white circle over it
\fill[white, name intersections={of=willP and neP}] (intersection-1) circle (4pt);
\fill[white, name intersections={of=nowP and impP}] (intersection-1) circle (4pt);
% Finally redraw the path you don't want broken
% Is there a more elegant way to do this?
\draw (p1) -- (ne);
\draw (implemented) -- (p2);
\end{tikzpicture}

\end{document} 
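
The intermediate-node idea mentioned at the start of this answer (joining will and be before connecting them to seront) could be sketched as follows. This is only a minimal illustration, not part of the original answer: the node name p3 comes from the answer's text, while the reduced three-word matrix and the column sep value are assumptions chosen to keep the example short.

```latex
\documentclass[tikz,border=2pt]{standalone}
\begin{document}
\begin{tikzpicture}
\matrix[column sep=1em,row sep=.4in] {
  % Part of the first sentence
  \node (will) {will}; & \node (now) {now}; & \node (be) {be}; \\
  % Invisible intermediate node placed below "now"
  & \node[inner sep=0pt,minimum width=0pt] (p3) {}; & \\
  % The word that both "will" and "be" align with
  & \node (seront) {seront}; & \\
};
% Join both English words at p3, then continue to the French word
\draw (will) -- (p3);
\draw (be) -- (p3);
\draw (p3) -- (seront);
\end{tikzpicture}
\end{document}
```

In the full diagram the same three \draw commands would be added after the existing ones, with p3 placed in the second matrix row.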


Answer 3

This builds on Alan's answer and combines it with my answer to "Is there a TikZ equivalent to the PSTricks \ncdiag command?"

Here is the code:

\documentclass{article}
%\url{https://tex.stackexchange.com/q/25474/86}
\usepackage{tikz}
\usetikzlibrary{calc,matrix}
\newcommand{\hnode}[1]{|(#1)| #1}

\tikzset{
  arm angleA/.initial={0},
  arm angleB/.initial={0},
  arm lengthA/.initial={0mm},
  arm lengthB/.initial={0mm},
  arm length/.style={%
    arm lengthA=#1,
    arm lengthB=#1,
  },
  arm/.style={
    to path={%
      (\tikztostart) -- ++(\pgfkeysvalueof{/tikz/arm angleA}:\pgfkeysvalueof{/tikz/arm lengthA}) -- ($(\tikztotarget)+(\pgfkeysvalueof{/tikz/arm angleB}:\pgfkeysvalueof{/tikz/arm lengthB})$) -- (\tikztotarget)
    }
  },
}

\begin{document}

\begin{tikzpicture}
\matrix[column sep=0em,row sep=.4in,matrix of nodes,row 2/.style={coordinate}] (m) {
% First sentence
\hnode{The} & \hnode{proposal} & \hnode{will} & \hnode{not} & \hnode{now} & \hnode{be} &  \hnode{implemented}  &\\
% Now create some dummy nodes to make intermediate nodes
& & &|(p1)| {} & & &|(p2)| {}\\
% Second sentence
\hnode{Les} & \hnode{propositions} & \hnode{ne}  & \hnode{seront} & \hnode{pas} & \hnode{mises} & \hnode{en} & \hnode{application} & \hnode{maintenant}\\
};
% Now connect the nodes.
\begin{scope}[every path/.style={line width=4pt,white,double=black},every to/.style={arm}, arm angleA=-90, arm angleB=90, arm length=5mm]
\draw (The) to (Les);
\draw (proposal) to (propositions);
\draw (will) to (seront);
\draw (not) to[arm lengthB=0pt] (p1)
 (p1) to[arm lengthA=0pt] (ne)
 (p1) to[arm lengthA=0pt] (pas);
\draw (now) to (maintenant);
\draw (be) -- ++(0,-.2in);
\draw (implemented) to[arm lengthB=0pt] (p2)
 (p2) to[arm lengthA=0pt] (mises)
 (p2) to[arm lengthA=0pt] (en)
 (p2) to[arm lengthA=0pt] (application);
\end{scope}
\end{tikzpicture}

\end{document} 

Here is the result:

[Figure: word links]

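In addition to the arm style, we handle the crossings with the double option, which draws two lines of different thickness. By making the outer line white and the inner line black, we "cut out" the lines underneath.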
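
As a minimal sketch of the double trick on its own (the coordinates are arbitrary and chosen only for illustration), the second path below is drawn as a wide white line with a thin black core, so it appears to pass over the first one:

```latex
\documentclass[tikz,border=2pt]{standalone}
\begin{document}
\begin{tikzpicture}
  % Drawn first, so it ends up underneath
  \draw (0,0) -- (2,2);
  % Drawn second: the path colour (white) forms the outer line,
  % and double=black sets the colour of the thin inner core
  \draw[line width=4pt, white, double=black] (0,2) -- (2,0);
\end{tikzpicture}
\end{document}
```

Because the white outer line only hides what was drawn earlier, the order of the \draw commands decides which line looks interrupted.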

(The image got cropped a bit too much; "maintenant" is there in the real picture.)

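Update: I wasn't too happy with the fact that the lines have different lengths, so I hacked the matrix node style (a lot) so that a row expands to a given length. We measure the length of the French sentence and then tell the English sentence to expand to the same length. I think this looks a bit better, although the price may be rather high!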
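
The measuring step alone can also be sketched with the let operation of the calc library. This is a simplified illustration rather than the method used in the code of this answer: it only reuses the measured width as the minimum width of a second node and does not spread the individual words. The node names French and English are chosen to mirror the answer.

```latex
\documentclass[tikz,border=2pt]{standalone}
\usetikzlibrary{calc}
\begin{document}
\begin{tikzpicture}
  % Lower sentence; its width is what gets measured
  \node[inner sep=0pt] (French)
    {Les propositions ne seront pas mises en application maintenant};
  % \p1 holds the vector from the west anchor to the east anchor,
  % so \x1 is the width of the French node
  \path let \p1 = ($(French.east)-(French.west)$) in
    (French.north) ++(0,1)
    node[draw, inner sep=0pt, minimum width=\x1, anchor=south] (English)
      {The proposal will not now be implemented};
\end{tikzpicture}
\end{document}
```

The drawn frame around the English node is there only to make the matched width visible.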

This time, I'll start with the picture:

[Figure: expanded arms]

And now the code:

\documentclass{standalone}
%\url{https://tex.stackexchange.com/q/25474/86}
\usepackage[scale=.95]{geometry}
\usepackage{tikz}
\usetikzlibrary{calc,matrix}
\newcommand{\hnode}[1]{|(#1)| #1}

\makeatletter
\tikzset{
  arm angleA/.initial={0},
  arm angleB/.initial={0},
  arm lengthA/.initial={0mm},
  arm lengthB/.initial={0mm},
  arm length/.style={%
    arm lengthA=#1,
    arm lengthB=#1,
  },
  arm/.style={
    to path={%
      (\tikztostart) -- ++(\pgfkeysvalueof{/tikz/arm angleA}:\pgfkeysvalueof{/tikz/arm lengthA}) -- ($(\tikztotarget)+(\pgfkeysvalueof{/tikz/arm angleB}:\pgfkeysvalueof{/tikz/arm lengthB})$) -- (\tikztotarget)
    }
  },
  expand/.code={%
    \let\pgf@matrix@compute@origin=\pgf@matrix@compute@origin@expand
    \let\pgf@matrix@cont=\pgf@matrix@cont@expand%
    \let\pgf@matrix@cell@cont=\pgf@matrix@cell@cont@expand
  },
  expand width/.initial={100pt}, 
}


\def\ex@minwidth{100pt}%
\let\pgf@matrix@compute@origin@orig=\pgf@matrix@compute@origin
\def\pgf@matrix@compute@origin@expand{%
  \pgf@matrix@compute@origin@orig
  \pgfmathsetmacro{\ex@width}{%
    \csname pgf@matrix@minx\the\pgf@matrix@numberofcolumns\endcsname -
    \csname pgf@matrix@minx1\endcsname +
    \csname pgf@matrix@maxx\the\pgf@matrix@numberofcolumns\endcsname +
    \csname pgf@matrix@maxx1\endcsname +
2*\pgfkeysvalueof{/pgf/inner xsep}
  }
  \pgfmathsetmacro{\ex@extra}{max(0,(\pgfkeysvalueof{/tikz/expand width} - \ex@width)/(\pgf@matrix@numberofcolumns - 1))}%
  {%
    \c@pgf@counta=1\relax%
    \advance\pgf@matrix@numberofcolumns by 1\relax
    \loop%
    \ifnum\c@pgf@counta<\pgf@matrix@numberofcolumns\relax%
    \pgfmathparse{\csname pgf@matrix@minx\the\c@pgf@counta\endcsname + (\c@pgf@counta - 1) * \ex@extra}%
      \expandafter\xdef\csname pgf@matrix@minx\the\c@pgf@counta\endcsname{\pgfmathresult pt}%
      \advance\c@pgf@counta by1\relax%
    \repeat%
  }%
}
\def\pgf@matrix@cont@expand{%  
    \setbox\pgf@matrix@box=\hbox\bgroup\vbox\bgroup%
  \pgfmathparse{\pgfkeysvalueof{/tikz/expand width} - 2*\pgfkeysvalueof{/pgf/inner xsep}}%
    \halign to \pgfmathresult pt\bgroup%
    \pgf@matrix@init@row%
    \pgf@matrix@step@column%
    {%
      \pgf@matrix@startcell%
      ##%
      \pgf@matrix@endcell%
    }%
    \tabskip=0pt\relax
    &%
    ##\pgf@matrix@padding&&%
    ##%
    \tabskip=0pt plus 1fil\relax
    &%
    \pgf@matrix@step@column%
    {%
      \pgf@matrix@startcell%
      ##%
      \pgf@matrix@endcell%
    }%
    \tabskip=0pt\relax
    &%
    ##\pgf@matrix@padding%
    \cr%
}

\def\pgf@matrix@cell@cont@expand[#1]{%
  \ifnum\pgfmatrixcurrentcolumn<\pgf@matrix@numberofcolumns%
  \else%
  {%
    \global\pgf@matrix@fixedfalse%
    \pgf@x=0pt%
    \pgf@matrix@addtolength{\pgf@x}{\pgfmatrixcolumnsep}%
    \pgf@matrix@addtolength{\pgf@x}{#1}%
    \ifpgf@matrix@fixed%
      \expandafter\pgfutil@g@addto@macro\csname pgf@matrix@column@finish@\the\pgfmatrixcurrentcolumn\endcsname%
        {\global\pgf@picmaxx=0pt}%
    \fi%
    \advance\pgfmatrixcurrentcolumn by1\relax % only temporary for the following:
    \expandafter\xdef\csname pgf@matrix@column@sep@\the\pgfmatrixcurrentcolumn\endcsname{\the\pgf@x}%
    \ifpgf@matrix@fixed%
      \expandafter\gdef\csname pgf@matrix@column@finish@\the\pgfmatrixcurrentcolumn\endcsname{\global\pgf@picminx=0pt}%
    \else%
      \expandafter\global\expandafter\let\csname pgf@matrix@column@finish@\the\pgfmatrixcurrentcolumn\endcsname=\pgfutil@empty%
    \fi%
  }%
  \fi%
  &\pgf@matrix@correct@calltrue&\pgf@matrix@correct@calltrue&%
}%

\makeatother

\begin{document}

\begin{tikzpicture}
% Second sentence
\matrix[column sep=0em,matrix of nodes] (French) {
\hnode{Les} & \hnode{propositions} & \hnode{ne}  & \hnode{seront} & \hnode{pas} & \hnode{mises} & \hnode{en} & \hnode{application} & \hnode{maintenant}\\
};
\path (French.east);
\pgfgetlastxy{\Frrx}{\Frry}%
\path (French.west);
\pgfgetlastxy{\Frlx}{\Frly}%
\pgfmathsetmacro{\Frwidth}{\Frrx - \Frlx}%
\path (French) ++(0,.8in) node[matrix,column sep=0em,matrix of nodes,expand,expand width={\Frwidth pt}] (English) {
% First sentence
\hnode{The} & \hnode{proposal} & \hnode{will} & \hnode{not} & \hnode{now} & \hnode{be} &  \hnode{implemented}\\
};
% Now connect the nodes.
\begin{scope}[every path/.style={line width=4pt,white,double=black},every to/.style={arm}, arm angleA=-90, arm angleB=90, arm length=5mm]
\draw (The) to (Les);
\draw (proposal) to (propositions);
\draw (will) to (seront);
\draw (not) -- ++(0,-.4in) coordinate (p1) {}
 (p1) to[arm lengthA=0pt] (ne)
 (p1) to[arm lengthA=0pt] (pas);
\draw (now) to (maintenant);
\draw (be) -- ++(0,-.4in);
\draw (implemented) -- ++(0,-.4in) coordinate (p2)
 (p2) to[arm lengthA=0pt] (mises)
 (p2) to[arm lengthA=0pt] (en)
 (p2) to[arm lengthA=0pt] (application);
\end{scope}
\end{tikzpicture}
\end{document} 

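This really is a horrible hack, three times over. Not only do we have to fiddle with the node positions in the matrix commands, we also have to modify one of the main routines: the \halign that actually puts the nodes in the right places. This is because we want to add some stretchable glue between the columns using \tabskip, only we don't want it at the edges. I'm no \halign expert, but the only way I could get it to work was to introduce another column between each pair of actual columns in the matrix (in addition to the extra columns that are already there!) to take care of the alignment. Otherwise, trying to set \tabskip in the \halign preamble meant that the glue was either also at the end of the row (not wanted) or not between the columns (wanted).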
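
For readers unfamiliar with \tabskip, here is a minimal sketch of the mechanism outside of PGF. It is not the code used above: the three-column layout, the sample words, and the choice of \textwidth as the target width are assumptions for illustration only. The value of \tabskip when \halign starts applies before the first column, the value in force at each & applies after that column, and the value in force at the \cr of the preamble applies after the last column.

```latex
\documentclass{article}
\begin{document}
\begingroup
\tabskip=0pt % glue before the first column: none
\halign to \textwidth{%
  #\hfil\tabskip=0pt plus 1fil&% in force after columns 1 and 2: stretchable
  \hfil#\hfil&%
  \hfil#\tabskip=0pt\cr% in force after the last column: none
  The&proposal&implemented\cr
  Les&propositions&maintenant\cr}
\endgroup
\end{document}
```

In a plain alignment like this the edges can be kept free of glue directly; the difficulty described above arises because the PGF matrix preamble uses a repeating template with extra columns.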

So this is definitely not recommended, but it was bugging me!

Answer 4

Here is a solution using PSTricks:

\documentclass{article}
\usepackage[english]{babel}
\usepackage{pstricks}% http://www.tug.org/PSTricks/main.cgi/
\usepackage{pst-node}
\usepackage{calc}% For width calculations
\begin{document}

\newcommand*{\Tword}[1]{\rnode{#1}{\raisebox{1ex}{\smash{#1}}}}% \Tword{<top>}
\newcommand*{\Bword}[1]{\rnode{#1}{\raisebox{-2ex}{\smash{#1}}}}% \Bword{<bottom>}

\newcommand{\TtoB}[3][]{% \TtoB[<options>]{<top>}{<bottom>}
  \ncdiag[arm=1em,angleA=-90,angleB=90,linestyle=solid,linecolor=black,linewidth=0.5pt,#1]{#2}{#3}}%

\begin{pspicture}(10,10)
  \rput[l](0,1){% English phrase
    \makebox[\widthof{Les propositions ne seront pas mises en application maintenant}][l]{\Tword{The} \Tword{proposal} \Tword{will} \Tword{not} \Tword{now} \Tword{be} \Tword{implemented}}%
  }%

    \rput[l](0,-1){% French phrase
    \Bword{Les} \Bword{propositions} \Bword{ne} \Bword{seront} \Bword{pas} \Bword{mises} \Bword{en} \Bword{application} \Bword{maintenant}%
  }%

  % Node connections between TOP (English) and BOTTOM (French) phrases
  \TtoB{The}{Les}%
  \TtoB{proposal}{propositions}%
  \TtoB{will}{seront}%
  \TtoB[linewidth=5pt,linecolor=white]{not}{ne} \TtoB{not}{ne} \TtoB{not}{pas}%
  \TtoB{now}{maintenant}%
  \TtoB[linewidth=5pt,linecolor=white]{implemented}{mises}%
  \TtoB[linewidth=5pt,linecolor=white]{implemented}{en}%
  \TtoB[linewidth=5pt,linecolor=white]{implemented}{application}%
  \TtoB{implemented}{mises} \TtoB{implemented}{en} \TtoB{implemented}{application}% Redraw
\end{pspicture}

\end{document}

Overlaps between the word/node connections are handled by drawing each connection once in white (5pt) and then once in black (0.5pt).

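[Figure: linguistics example, left-aligned]

Changing the <position> specifier of the \makebox command from l (left) to c (center) gives:

[Figure: linguistics example, centered]

...and changing it to s (spread) gives:

[Figure: linguistics example, spread]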

Although the result is typeset correctly and spread out correctly, LaTeX chokes a little on this last definition, \makebox[..][s]{...}, and complains about Underfull \hbox (badness 10000).
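
The three position specifiers can be compared in a small sketch without any PSTricks nodes. The macro \French is a name introduced here purely for illustration. In the s case, the inter-word spaces are replaced by \hfill; ordinary inter-word glue has only finite stretch, so this substitution is one way that should avoid the Underfull warning, though it is an assumption about the cause and not something the original answer does.

```latex
\documentclass{article}
\usepackage{calc}% provides \widthof
\begin{document}
% \French is a hypothetical helper macro holding the longer sentence
\newcommand*{\French}{Les propositions ne seront pas mises en application maintenant}

% l: flush left inside a box as wide as the French sentence
\noindent\makebox[\widthof{\French}][l]{The proposal will not now be implemented}\par
% c: centred in the same box
\noindent\makebox[\widthof{\French}][c]{The proposal will not now be implemented}\par
% s: spread; \hfill supplies infinitely stretchable glue between the words
\noindent\makebox[\widthof{\French}][s]{The\hfill proposal\hfill will\hfill not\hfill now\hfill be\hfill implemented}\par
\noindent\French
\end{document}
```

In the PSTricks example the same idea would mean writing \hfill between the \Tword{...} entries instead of spaces.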

A connection from be to "nowhere" can be added with

\pcline([nodesep=0pt,angle=-90]be)([nodesep=1em,angle=-90]be)% No translation

so that it matches the arm of the \ncdiag (in \TtoB): length 1em, width 0.5pt, pointing downward (angle=-90).
