Question
I am preparing NLP course material for next semester. I want to include text in the following format:
Spanish  Farm  Minister  Loyola  de  Palacio  had  earlier  accused
7        0     0         1       2   2        0    0        0

Fischler  at  an  EU  farm  ministers  '  meeting  of  causing  unjustified
1         0   0   3   0     0          0  0        0   0        0

alarm  through  "  dangerous  generalisation  .  "
0      0        0  0          0               0  0
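where the alignment between tokens and numbers is created by a Python script.
However, copying and pasting this text directly breaks the alignment. I tried putting the tokens and numbers in a table with the rules made invisible, but the result looks ugly.
I think tikz would be a good fit for this application. Can anyone help me?
Edit
I am looking for a LaTeX solution that can replicate the Python output.
The Python script used to create the alignment: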
from collections import defaultdict


def print_tuple(tuple_list, max_char_length=50):
    # tuple_list: [(a1, b1, c1, d1,...), (a2, b2, c2, d2,...), ...]
    # if any tuple has more than n_token items, the extra items are ignored
    n_token = min(map(len, tuple_list))
    token_list_dict = defaultdict(list)
    length = 0
    full_string = ""
    string_format = ""
    for tup in tuple_list:
        # column width: longest item in this tuple
        max_len = max(map(len, tup))
        length += max_len
        # left-aligned field, padded with two extra spaces
        string_format += "{:<%d" % (max_len + 2) + "}"
        for i in range(n_token):
            token_list_dict[i].append(tup[i])
        if length >= max_char_length:
            # flush the current block, one output line per tuple position
            for token_list in token_list_dict.values():
                full_string += "%s\n" % string_format.format(*token_list)
            full_string += "\n"
            # reset for the next block
            length = 0
            string_format = ""
            token_list_dict = defaultdict(list)
    # flush whatever remains when it is shorter than max_char_length
    for token_list in token_list_dict.values():
        full_string += "%s\n" % string_format.format(*token_list)
    print(full_string)


sample = [('Spanish', '7'), ('Farm', '0'), ('Minister', '0'), ('Loyola', '1'),
          ('de', '2'), ('Palacio', '2'), ('had', '0'), ('earlier', '0'),
          ('accused', '0'), ('Fischler', '1'), ('at', '0'), ('an', '0'),
          ('EU', '3'), ('farm', '0'), ('ministers', '0'), ("'", '0'),
          ('meeting', '0'), ('of', '0'), ('causing', '0'), ('unjustified', '0'),
          ('alarm', '0'), ('through', '0'), ('"', '0'), ('dangerous', '0'),
          ('generalisation', '0'), ('.', '0'), ('"', '0')]
print_tuple(sample)
Answer 1
Here are two different approaches: using three tabular environments, or using the listings package.
\documentclass{article}
\usepackage{geometry}
\usepackage{listings}
\begin{document}
\begin{tabular}{@{}*{9}{l}}
Spanish &Farm &Minister &Loyola &de &Palacio &had &earlier &accused \\
7 &0 &0 &1 &2 &2 &0 &0 &0
\end{tabular}\smallskip
\begin{tabular}{@{}*{11}{l}}
Fischler &at &an &EU &farm &ministers &' &meeting &of &causing &unjustified \\
1 &0 &0 &3 &0 &0 &0 &0 &0 &0 &0
\end{tabular}\smallskip
\begin{tabular}{@{}*{7}{l}}
alarm &through &" &dangerous &generalisation &. &" \\
0 &0 &0 &0 &0 &0 &0
\end{tabular}
\begin{lstlisting}
Spanish  Farm  Minister  Loyola  de  Palacio  had  earlier  accused
7        0     0         1       2   2        0    0        0

Fischler  at  an  EU  farm  ministers  '  meeting  of  causing  unjustified
1         0   0   3   0     0          0  0        0   0        0

alarm  through  "  dangerous  generalisation  .  "
0      0        0  0          0               0  0
\end{lstlisting}
\end{document}
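If you would rather not type the tabular blocks by hand, they could be generated from the same list of tuples. The following is only a rough sketch, assuming each tuple is a (token, label) pair; the names escape_latex, chunk_pairs and tuples_to_tabulars are made up for this illustration and are not part of the question's script. The rows are split with the same rule as print_tuple (a row is closed once the summed item lengths reach max_char_length).

```python
# Hypothetical helpers, not part of the original script or answer.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape characters that are special in LaTeX text mode."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def chunk_pairs(pairs, max_char_length=50):
    """Split pairs into rows with the same rule as print_tuple."""
    rows, current, length = [], [], 0
    for pair in pairs:
        current.append(pair)
        length += max(map(len, pair))
        if length >= max_char_length:
            rows.append(current)
            current, length = [], 0
    if current:
        rows.append(current)
    return rows


def tuples_to_tabulars(pairs, max_char_length=50):
    """Return one two-line tabular environment per row of pairs."""
    blocks = []
    for row in chunk_pairs(pairs, max_char_length):
        tokens = " & ".join(escape_latex(tok) for tok, _ in row)
        labels = " & ".join(escape_latex(lab) for _, lab in row)
        blocks.append(
            "\\begin{tabular}{@{}*{%d}{l}}\n%s \\\\\n%s\n\\end{tabular}"
            % (len(row), tokens, labels)
        )
    # Same separator between blocks as in the hand-written version.
    return "\\smallskip\n".join(blocks)


print(tuples_to_tabulars([("EU", "3"), ("farm", "0")]))
```

Calling tuples_to_tabulars(sample) with the list from the question should give three blocks of 9, 11 and 7 columns, which can be pasted into the document or written to a file and pulled in with \input.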
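Answer 2
A solution using TikZ. Each word/number pair is drawn by a separate tikzpicture environment, so the pairs flow and wrap like ordinary words in a paragraph.
Code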
\documentclass[11pt, a4paper]{article}
\usepackage{tikz}
\begin{document}
\noindent
\foreach \stg/\i in {Spanish/7, Farm/0, Minister/0, Loyola/1, de/2,
  Palacio/2, had/0, earlier/0, accused/0, Fischler/1, at/0, an/0,
  EU/3, farm/0, ministers/0, '/0, meeting/0, of/0, causing/0,
  unjustified/0, alarm/0, through/0, "/0, dangerous/0,
  generalisation/0, ./0, "/0}{%
  \begin{tikzpicture}[baseline=-6ex,
    every node/.style={text depth=0, anchor=west}]
    \path (0, 0) node {\stg};
    \path (0, -3ex) node {\i};
  \end{tikzpicture}
}
\end{document}
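Typing the word/number list by hand is error-prone, so it may be easier to produce it from the Python data. A minimal sketch, assuming (token, label) pairs; pairs_to_foreach_list is a made-up name. Tokens containing a comma or a slash are wrapped in braces, on the assumption that this is enough to protect them inside the \foreach list. LaTeX special characters such as & or % are not escaped here and would need separate handling.

```python
def pairs_to_foreach_list(pairs, per_line=4):
    """Format (token, label) pairs as 'token/label' items for \\foreach."""
    items = []
    for token, label in pairs:
        # Assumption: braces hide ',' and '/' from the list parser.
        if "," in token or "/" in token:
            token = "{" + token + "}"
        items.append("%s/%s" % (token, label))
    # Break the list into short lines so it stays readable in the .tex file.
    lines = [", ".join(items[i:i + per_line])
             for i in range(0, len(items), per_line)]
    return ",\n".join(lines)


print(pairs_to_foreach_list(
    [("Spanish", "7"), ("Farm", "0"), ("Minister", "0")], per_line=2))
```

The returned string goes between the braces of the \foreach list in the code above.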