How to find the position of words in a jumbled sentence?

Question 1

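In my opinion, the most interesting thing about TeX is its typesetting and the worst thing is its programming facilities, so it is best to do this kind of programming outside TeX (as far away as possible!) and to use TeX only for typesetting. Everything is possible with TeX, but it is not necessarily the easiest or most maintainable solution.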

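That said, if you do use TeX, this kind of programming is easier with LuaTeX (at least for me, and I imagine for most people). Compile the following file with lualatex (I have made your "tags" optional: you can tag every word, as in the(1) quick(2) ..., or tag only the repeated words):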

\documentclass[12pt]{memoir}
\usepackage{amsmath} % For \text

\newcommand{\printword}[2]{$\text{#1} ^ {#2}$\quad} % Or whatever formatting you like.
\newcommand{\linesep}{\newline}

\directlua{dofile('jumble.lua')}
\newcommand{\printjumble}[2]{
  \directlua{get_sentence1_lines()}{#1}
  \directlua{get_sentence2_words()}{#2}
  %
  \noindent
  Actual sentence:
  \newline
  \directlua{print_sentence1_lines()}

  \noindent
  Jumbled sentence:
  \textbf{\directlua{print_sentence2()}}
}

\begin{document}
\printjumble{
  the(1) quick brown fox
  +
  jumps over the(7) lazy dog
}{
  the(7) lazy dog jumps over the(1) quick brown fox
}
\end{document}

where jumble.lua (which could be inlined into the same .tex file, but I prefer to keep it separate) is as follows:

-- Expected from TeX: before calling print_sentence1_lines(),
--     call get_sentence1_lines() and get_sentence2_words()
--     define \printword and \linesep.
-- Globals: sentence2_words, position_for_word, sentence1_lines

function get_sentence1_lines()
   sentence1_lines = token.scan_string()
end

function get_sentence2_words()
   local sentence2 = token.scan_string()
   sentence2_words = {}
   position_for_word = {}
   local i = 0
   for word in string.gmatch(sentence2, "%S+") do
      i = i + 1
      assert(position_for_word[word] == nil, string.format('Duplicate word: %s', word))
      sentence2_words[i] = without_tags(word)
      position_for_word[word] = i
   end
end

function print_sentence2()
   for i, word in ipairs(sentence2_words) do
      tex.print(word)
   end
end

function print_sentence1_lines()
   for line in string.gmatch(sentence1_lines, "[^+]+") do
      for word in string.gmatch(line, "%S+") do
         local position = position_for_word[word]
         assert(position ~= nil, string.format('New word: %s', word))
         tex.print(string.format([[\printword{%s}{%s}]], without_tags(word), position))
      end
      tex.print([[\linesep]])
   end
end

function without_tags(word)
   local new_word = string.gsub(word, "%(.*%)", "")
   return new_word
end

This produces the output asked for in the question.

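Note that you can make this a bit shorter by moving things around (see, for example, the first revision of this answer), but I find it cleanest to keep (as far as possible) the typesetting instructions in the .tex file and the programming in the .lua file.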
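
One benefit of keeping the programming in its own file is that the lookup logic can be tried out in a plain Lua interpreter, without TeX. The following is only a rough sketch of that idea, not part of jumble.lua: the names strip_tag, index_words and annotate are hypothetical, and it prints word^position pairs instead of emitting \printword calls.

```lua
-- Plain-Lua sketch of the lookup logic; no TeX needed.
-- All function names below are hypothetical, not part of jumble.lua.

-- Strip an optional tag such as "(7)" from a word: "the(7)" -> "the".
local function strip_tag(word)
   return (string.gsub(word, "%(.*%)", ""))
end

-- Map each (possibly tagged) word of the jumbled sentence
-- to its 1-based position in that sentence.
local function index_words(sentence)
   local positions = {}
   local i = 0
   for word in string.gmatch(sentence, "%S+") do
      i = i + 1
      assert(positions[word] == nil, "Duplicate word: " .. word)
      positions[word] = i
   end
   return positions
end

-- For each "+"-separated line of the actual sentence,
-- build a string of "word^position" pairs.
local function annotate(actual, jumbled)
   local positions = index_words(jumbled)
   local lines = {}
   for line in string.gmatch(actual, "[^+]+") do
      local parts = {}
      for word in string.gmatch(line, "%S+") do
         local position = assert(positions[word], "New word: " .. word)
         parts[#parts + 1] = strip_tag(word) .. "^" .. position
      end
      lines[#lines + 1] = table.concat(parts, " ")
   end
   return lines
end

local result = annotate(
   "the(1) quick brown fox + jumps over the(7) lazy dog",
   "the(7) lazy dog jumps over the(1) quick brown fox")
for _, line in ipairs(result) do
   print(line)
end
-- prints:
-- the^6 quick^7 brown^8 fox^9
-- jumps^4 over^5 the^1 lazy^2 dog^3
```

The tags matter only as lookup keys: the(1) and the(7) are distinct entries in the position table, and the tag is stripped just before the word is printed.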
