Highlight every occurrence of a list of words?

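To help revise a draft and identify related sections, I would like to mark similar words by theme (via text color, highlighting, underlining, or some other means).

For example, I would like every use of the terms "foo" or "bar" highlighted in red, and every use of the terms "biz" and "baz" highlighted in green.

There may be four or five groups of words or word stems that I want to specify. This is only for review, so the result can be rough.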

For example, replace this:

[image]

with this:

[image]

(In the example, the green text is hard to read; perhaps bold plus color, or underlining, would be more useful.)

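Update: A related question has an answer that uses XeLaTeX. My document does not compile with XeLaTeX, and I would prefer a pdflatex-compatible solution if one is available (since pdflatex is what I use), although my document also compiles with LuaTeX.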

Answer 1

A solution using LuaTeX callbacks. It also uses the luacolor.lua library from the luacolor package.

First, the package luahighlight.sty:

\ProvidesPackage{luahighlight}
%\RequirePackage{luacolor}
\@ifpackageloaded{xcolor}{}{\RequirePackage{xcolor}}
\RequirePackage{luatexbase}
\RequirePackage{luacode}
\newluatexattribute\luahighlight
\begin{luacode*}
highlight = require "highlight"
luatexbase.add_to_callback("pre_linebreak_filter", highlight.callback, "highlight")
\end{luacode*}

\newcommand\highlight[2][red]{%
  \bgroup
  \color{#1}%
  \luaexec{highlight.add_word("\luatexluaescapestring{\current@color}","\luatexluaescapestring{#2}")}%
  \egroup
}

% save default document color
\luaexec{highlight.default_color("\luatexluaescapestring{\current@color}")}

% Use new attribute register in \set@color
\protected\def\set@color{%
  \setattribute\luahighlight{%
    \directlua{%
      oberdiek.luacolor.get("\luaescapestring{\current@color}")%
    }%
  }%
 \aftergroup\reset@color
}

% stolen from luacolor.sty
\def\reset@color{}
\def\luacolorProcessBox#1{%
  \directlua{%
    oberdiek.luacolor.process(\number#1)%
  }%
}
\directlua{%
  if luatexbase.callbacktypes.pre_shipout_filter then
    token.get_next()
  end
}\@secondoftwo\@gobble{
  \RequirePackage{atbegshi}[2011/01/30]
  \AtBeginShipout{%
    \luacolorProcessBox\AtBeginShipoutBox
  }
}
\endinput

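The package provides the \highlight command, which takes one mandatory and one optional argument. The mandatory argument is the word to highlight; the optional argument is the color. In the pre_linebreak_filter callback, words are collected, and color information is inserted whenever one matches.

The Lua module, highlight.lua: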

local M = {}

require "luacolor"

local words = {}
local chars = {}

-- get attribute allocation number and register it in luacolor
local attribute = luatexbase.attributes.luahighlight
-- local attribute = oberdiek.luacolor.getattribute
oberdiek.luacolor.setattribute(attribute)


-- make local version of luacolor.get

local get_color = oberdiek.luacolor.getvalue

-- we must save default color
local default_color 

function M.default_color(color)
  default_color = get_color(color)
end

local utflower = unicode.utf8.lower
function M.add_word(color,w)
  local w = utflower(w)
  words[w] = color
end

local utfchar = unicode.utf8.char

-- we don't want to include punctuation
local stop = {}
for _, x in ipairs {".",",","!","“","”","?"} do stop[x] = true end

local glyph_id = node.id("glyph")
local glue_id  = node.id("glue")

function M.callback(head)
  local curr_text = {}
  local curr_nodes = {}
  for n in node.traverse(head) do
    if n.id == glyph_id then
      local char = utfchar(n.char)
      -- exclude punctuation
      if not stop[char] then 
        local lchar = chars[char] or utflower(char)
        chars[char] = lchar
        curr_text[#curr_text+1] = lchar 
        curr_nodes[#curr_nodes+1] = n
      end
      -- set default color
      local current_color = node.has_attribute(n,attribute) or default_color
      node.set_attribute(n, attribute,current_color)
    elseif n.id == glue_id  then
      local word = table.concat(curr_text)
      curr_text = {}
      local color = words[word]
      if color then
        print(word)
        local colornumber = get_color(color)
        for _, x in ipairs(curr_nodes) do
          node.set_attribute(x,attribute,colornumber)
        end
      end
      curr_nodes = {}
    end
  end
  return head
end


return M

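We use the pre_linebreak_filter callback to traverse the node list, collecting glyph nodes (id 37) into a table. When we find a glue node (id 10, mostly spaces), we construct a word from the collected glyphs. Some characters, such as punctuation, are excluded and stripped out. All characters are lowercased, so words at the start of a sentence and the like are detected as well. Note that the code looks the ids up with node.id rather than hard-coding the numbers.

When a word matches, we set the attribute field of the word's glyphs to the value of the corresponding color stored in the luacolor library. Attributes are a new concept in LuaTeX: they allow information to be stored in nodes and processed later. That is what happens here: at shipout time, the luacolor library processes every page and colors the nodes according to their luahighlight attribute.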

\documentclass{article}

\usepackage[]{xcolor}
\usepackage{luahighlight}
\usepackage{lipsum}

\highlight[red]{Lorem}
\highlight[green]{dolor}
\highlight[orange]{world}
\highlight[blue]{Curabitur}
\highlight[brown]{elit}
\begin{document}

\def\world{earth}
\section{Hello world}

Hello world, world? world! \textcolor{purple}{but normal colors works} too\footnote{And also footnotes, for instance. World WORLD wOrld}. Hello \world.

\lipsum[1-12]
\end{document}

Answer 2

Here is another one, using l3regex:

\documentclass{scrartcl}
\usepackage{xcolor,xparse,l3regex}
\ExplSyntaxOn
\NewDocumentCommand \texthighlight { +m } { \david_texthighlight:n { #1 } }
\cs_new_protected:Npn \david_texthighlight:n #1
 {
  \group_begin:
  \tl_set:Nn \l_tmpa_tl { #1 }
  \seq_map_inline:Nn \g_david_highlight_colors_seq
   {
    \clist_map_inline:cn { g_david_highlight_##1_clist }
     {
      \regex_replace_all:nnN { (\W)####1(\W) }
       { \1\c{textcolor}\cB\{##1\cE\}\cB\{####1\cE\}\2 } \l_tmpa_tl
     }
   }
  \tl_use:N \l_tmpa_tl
  \group_end:
 }
\seq_new:N \g_david_highlight_colors_seq
\NewDocumentCommand \addhighlighting { O{red} m }
 {
  \seq_if_in:NnF \g_david_highlight_colors_seq { #1 }
   { \seq_gput_right:Nn \g_david_highlight_colors_seq { #1 } }
  \clist_if_exist:cF { g_david_highlight_#1_clist }
   { \clist_new:c { g_david_highlight_#1_clist } }
  \clist_gput_right:cn { g_david_highlight_#1_clist } { #2 }
 }
\ExplSyntaxOff

\addhighlighting{amet,Mauris,ut,et,leo}
\addhighlighting[blue]{Phasellus,vestibulum}

\begin{document}
\texthighlight{Lorem ipsum dolor foo sit amet, bar consectetuer adipiscing
elit. Ut purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis.
Curabitur dictum gravida mauris. Nam arcu libero, nonummy eget,
consectetuer id, vulputate a, magna. Donec vehicula augue eu
neque. Pellentesque habitant morbi tristique senectus et netus et
malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra metus
rhoncus sem. Nulla et lectus foo vestibulum urna fringilla ultrices.
Phasellus eu tellus sit amet tortor gravida placerat. Integer sapien
est, iaculis in, pretium quis, viverra ac, nunc. Praesent eget sem
vel leo ultrices bibendum. Aenean faucibus. Morbi dolor nulla,
malesuada eu, pulvinar at, mollis ac, nulla. Curabitur auctor semper
nulla. Donec varius orci eget risus. Duis nibh mi, congue eu,
accumsan eleifend, bar sagittis quis, diam. Duis eget orci sit amet orci
dignissim rutrum.

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut
purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis. Curabitur
dictum gravida mauris. Nam arcu libero, nonummy eget,
consectetuer id, foo vulputate a, magna. Donec vehicula augue eu
neque. Pellentesque habitant morbi tristique senectus et netus et
malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra metus
rhoncus sem. Nulla et lectus vestibulum urna fringilla ultrices.
Phasellus eu tellus sit amet tortor gravida placerat. Integer sapien
est, iaculis in, pretium quis, viverra ac, bar nunc. Praesent eget sem
vel leo ultrices bibendum. Aenean faucibus. Morbi dolor nulla,
malesuada eu, pulvinar at, mollis ac, nulla. Curabitur auctor semper
nulla. Donec varius orci eget risus. Duis nibh mi, congue eu,
accumsan eleifend, sagittis quis, diam. Duis eget orci sit amet orci
dignissim rutrum.}
\end{document}

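Answer 3

Strongly based on my answer to "How to insert a symbol at the beginning of a line in which a word occurs?". However, I had to extend the logic to handle multiple color assignments. The syntax is to call \WordsToNote{space separated list}{color} multiple times, followed by \NoteWords{multiple paragraph input}.

Macros in the input are limited to style (e.g. \textit) and size (e.g. \small) changes. Otherwise, only plain text is accepted.

As detailed in the referenced answer, I adapt my titlecaps package, which normally capitalizes the first letter of each word in its argument, apart from a user-specified list of exceptions. Here, I do not capitalize the words but leave them as they are. However, I trap the user-specified exception words and use them to set a different color.

In this extension of the method, I had to revise two titlecaps macros: \titlecap and \seek@lcwords.

The method cannot handle word subsets (partial-word matches), but it can ignore punctuation.

EDITED to fix a bug that occurred when a flagged word was followed by punctuation, as well as a problem with the first word of a paragraph.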

\documentclass{article}
\usepackage{titlecaps}
\makeatletter
\renewcommand\titlecap[2][P]{%
  \digest@sizes%
  \if T\converttilde\def~{ }\fi%
  \redefine@tertius%
  \get@argsC{#2}%
  \seek@lcwords{#1}%
  \if P#1%
    \redefine@primus%
    \get@argsC{#2}%
    \protected@edef\primus@argi{\argi}%
  \else%
  \fi%
  \setcounter{word@count}{0}%
  \redefine@secundus%
  \def\@thestring{}%
  \get@argsC{#2}%
  \if P#1\protected@edef\argi{\primus@argi}\fi%
  \whiledo{\value{word@count} < \narg}{%
    \addtocounter{word@count}{1}%
    \if F\csname found@word\roman{word@count}\endcsname%
      \notitle@word{\csname arg\roman{word@count}\endcsname}%
      \expandafter\protected@edef\csname%
           arg\roman{word@count}\endcsname{\@thestring}%
    \else
      \notitle@word{\csname arg\roman{word@count}\endcsname}%
      \expandafter\protected@edef\csname%
         arg\roman{word@count}\endcsname{\color{%
           \csname color\romannumeral\value{word@count}\endcsname}%
      \@thestring\color{black}{}}%
    \fi%
  }%
  \def\@thestring{}%
  \setcounter{word@count}{0}%
  \whiledo{\value{word@count} < \narg}{%
    \addtocounter{word@count}{1}%
    \ifthenelse{\value{word@count} = 1}%
   {}{\add@space}%
    \protected@edef\@thestring{\@thestring%
      \csname arg\roman{word@count}\endcsname}%
  }%
  \let~\SaveHardspace%
  \@thestring%
  \restore@sizes%
\un@define}

% SEARCH TERTIUS CONVERTED ARGUMENT FOR LOWERCASE WORDS, SET FLAG
% FOR EACH WORD (T = FOUND IN LIST, F= NOT FOUND IN LIST)
\renewcommand\seek@lcwords[1]{%
\kill@punct%
  \setcounter{word@count}{0}%
  \whiledo{\value{word@count} < \narg}{%
    \addtocounter{word@count}{1}%
    \protected@edef\current@word{%
      \csname arg\romannumeral\value{word@count}\endcsname}%
    \def\found@word{F}%
    \setcounter{lcword@index}{0}%
    \expandafter\def\csname%
            found@word\romannumeral\value{word@count}\endcsname{F}%
    \whiledo{\value{lcword@index} < \value{lc@words}}{%
      \addtocounter{lcword@index}{1}%
      \protected@edef\current@lcword{%
        \csname lcword\romannumeral\value{lcword@index}\endcsname}%
%% THE FOLLOWING THREE LINES ARE FROM DAVID CARLISLE
  \protected@edef\tmp{\noexpand\scantokens{\def\noexpand\tmp%
   {\noexpand\ifthenelse{\noexpand\equal{\current@word}{\current@lcword}}}}}%
  \tmp\ifhmode\unskip\fi\tmp
%%
      {\expandafter\def\csname%
            found@word\romannumeral\value{word@count}\endcsname{T}%
      \expandafter\protected@edef\csname color\romannumeral\value{word@count}\endcsname{%
       \csname CoLoR\csname lcword\romannumeral\value{lcword@index}\endcsname\endcsname}%
      \setcounter{lcword@index}{\value{lc@words}}%
      }%
      {}%
    }%
  }%
\if P#1\def\found@wordi{F}\fi%
\restore@punct%
}
\makeatother
\usepackage{xcolor}
\newcommand\WordsToNote[2]{\Addlcwords{#1}\edef\assignedcolor{#2}%
  \assigncolor#1 \relax\relax}
\def\assigncolor#1 #2\relax{%
  \expandafter\edef\csname CoLoR#1\endcsname{\assignedcolor}%
  \ifx\relax#2\else\assigncolor#2\relax\fi%
}
\newcommand\NoteWords[1]{\NoteWordsHelp#1\par\relax}
\long\def\NoteWordsHelp#1\par#2\relax{%
  \titlecap[p]{#1}%
  \ifx\relax#2\else\par\NoteWordsHelp#2\relax\fi%
}
\begin{document}
\WordsToNote{foo bar at}{red}
\WordsToNote{Nulla dolor nulla}{cyan}
\WordsToNote{amet est et}{orange}
\WordsToNote{Lorem Ut ut felis}{green}
\NoteWords{
\textbf{Lorem ipsum dolor foo sit amet, bar consectetuer adipiscing elit}. Ut
purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis. Curabitur
dictum gravida mauris. Nam arcu libero, nonummy eget,
consectetuer id, vulputate a, magna. Donec vehicula augue eu
neque. Pellentesque habitant morbi tristique senectus et netus et
malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra metus
rhoncus sem. \textit{Nulla et lectus foo} vestibulum urna fringilla ultrices.
Phasellus eu tellus sit amet tortor gravida placerat. Integer sapien
est, iaculis in, pretium quis, viverra ac, nunc. Praesent eget sem
vel leo ultrices bibendum. \scshape Aenean faucibus. Morbi dolor nulla,
malesuada eu, pulvinar at, mollis ac, nulla. Curabitur auctor semper
nulla. Donec varius orci eget risus. \upshape Duis nibh mi, congue eu,
accumsan eleifend, bar sagittis quis, diam. Duis eget orci sit amet orci
dignissim rutrum.

\textsf{Lorem ipsum dolor sit amet}, consectetuer adipiscing elit. Ut
purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis. Curabitur
dictum gravida mauris. Nam arcu libero, nonummy eget,
consectetuer id, foo vulputate a, magna. Donec vehicula augue eu
neque. Pellentesque habitant morbi tristique senectus et netus et
malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra metus
rhoncus sem. Nulla et lectus vestibulum urna fringilla ultrices.
Phasellus eu tellus sit amet tortor gravida placerat. Integer sapien
est, iaculis in, pretium quis, viverra ac, bar nunc. Praesent eget sem
vel leo ultrices bibendum. Aenean faucibus. Morbi dolor nulla,
malesuada eu, pulvinar at, mollis ac, nulla. Curabitur auctor semper
nulla. Donec varius orci eget risus. Duis nibh mi, congue eu,
accumsan eleifend, sagittis quis, diam. \Large Duis eget orci sit amet orci
dignissim rutrum.\normalsize
}
\end{document}

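Answer 4

Here is a simple script that marks up the words you specify. You list the words by editing the script, which is the simplest way to handle many words and many different colors. It requires perl, which is standard on Unix (Linux/OS X) and a one-time download on Windows. I assume you have a lot of keywords to mark, so I used perl, which makes the lists easy to manage. Save it as the file highlight.pl, enter your keywords, and run it like this (on the command line):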

perl highlight.pl document.tex > edited-document.tex

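The script builds the whitespace-separated word lists using qw(...). If you need to highlight multi-word spans, ask me to add an example of the appropriate syntax. You can set it up for any number of colors. Also note that the words are combined into a regular expression, so you can use wildcards if needed.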

#!/usr/bin/perl
use strict;
use warnings;

# Enter all the keys to highlight here, separated by whitespace. The lists
# can extend over any number of lines.
my $keywords = join("|", qw(foo bar));
my $trouble = join("|", qw(
biz
baz
));

while (<>) {
    # Only substitute between \begin{document} and \end{document}.
    if (m/\\begin\{document\}/ .. m/\\end\{document\}/) {
        s/\b($keywords)\b/\\keyword{$1}/g;
        s/\b($trouble)\b/\\needswork{$1}/g;
    }
    print;
}

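The script skips the preamble and only makes replacements in the document body. I use two kinds of highlighting, \keyword{..} and \needswork{...}, for demonstration. What they do is up to you: use any macro names you like, and define them in the document's preamble.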
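As a minimal sketch of what those preamble definitions might look like: the colors and styles below are only illustrative assumptions (red bold for \keyword, dark green underline for \needswork, picked because the question mentions that plain green is hard to read). Only xcolor is needed, so this should work with pdflatex.

```latex
% Preamble definitions for the macros that highlight.pl inserts.
% The colors and styles are illustrative choices, not part of the answer.
\usepackage{xcolor}
\newcommand{\keyword}[1]{\textcolor{red}{\textbf{#1}}}
\newcommand{\needswork}[1]{\textcolor{green!50!black}{\underline{#1}}}
```

Since the highlighting lives entirely in these two macros, changing the look later only means editing the preamble, not re-running the script.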
