sed: insert something before a subsequent line that starts the same but is not identical

I have a LaTeX file with one glossary entry per line:

...
\newglossaryentry{ajahn}{name=Ajahn,description={\textit{(Thai)} From the Pali \textit{achariya}, a Buddhist monk's preceptor: `teacher'; often used as a title of the senior monk or monks at monastery. In the West, the forest tradition uses it for all monks and nuns of more than ten years' seniority}}
\newglossaryentry{ajivaka}{name={\=Aj\={\i}vaka},description={Sect of contemplatives contemporary with the Buddha who held the view that beings have no volitional control over their actions and that the universe runs according to fate and destiny}}
...

Here we only care about the \newglossaryentry{label} part of each line.

The lines of the file have been sorted with sort, so duplicate labels end up next to each other, like this:

\newglossaryentry{anapanasati}{name=\=an\=ap\=anasati,description={`Awareness of inhalation and exhalation'; using the breath, as a mediation object},sort=anapanasati}
\newglossaryentry{anapanasati}{name={\=an\=ap\=anasati},description={Mindfulness of breathing. A meditation practice in which one maintains one's attention and mindfulness on the sensations of breathing. \textbf{[MORE]}}}

How can I use sed to insert a line before the duplicate labels in this file?

#!/bin/sh

cat glossary.tex | sed '
/\\newglossaryentry[{][^}]*[}]/{
    N;
    s/^\(\\newglossaryentry[{][^}]*[}]\)\(.*\)\n\1/% duplicate\n\1\2\n\1/;
}' > glossary.sed.tex

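I got as far as the command above, but it has a flaw: it reads lines into the pattern space in pairs, so it only works when the duplicates happen to be the two lines of one pair.

These, for example, will not be matched: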

\newglossaryentry{abhinna}{name={abhi\~n\~n\=a},description={Intuitive powers that come from the practice of concentration: the ability to display psychic powers, clairvoyance, clairaudience, the ability to know the thoughts of others, recollection of past lifetimes, and the knowledge that does away with mental effluents (see \textit{asava}).}}
\newglossaryentry{acariya}{name={\=acariya},description={Teacher; mentor. See \textit{kalyanamitta.}}}
\newglossaryentry{acariya}{name=\=acariya,description={Teacher},see=Ajahn}
\newglossaryentry{adhitthana}{name={adhi\d{t}\d{t}h\=ana},description={Determination; resolution. One of the ten perfections \textit{(paramis).}}}

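That is because it first reads in the lines for abhinna and acariya, and then the lines for acariya and adhitthana, so the two acariya lines are never in the pattern space together.

I think this needs some extra sed magic with the hold space and conditional printing of lines, but I couldn't get my head around it.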

Answer 1

This is fairly complicated for sed; it is more of a job for awk or perl. Here is a script that finds consecutive duplicates (but allows non-matching lines in between):

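perl -l -pe '
    # $1 is the label of the entry on the current line; braces are escaped for newer Perls
    if (/^ *\\newglossaryentry[* ]*\{([^{}]*)\}/) {
        print "% duplicate" if defined $prev && $1 eq $prev;
        $prev = $1;
    }' glossary.tex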
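
Since awk is mentioned as an alternative, here is a rough awk sketch of the same idea, not part of the original answer. It assumes that the label never contains braces, so splitting each line on `{` and `}` leaves the label in the second field. The function name `mark_dups_awk` and the variables `prev` and `have_prev` are made up for illustration.

```shell
#!/bin/sh
# Print "% duplicate" before an entry whose label equals the label of the
# previous entry line; lines that are not entries are passed through untouched.
mark_dups_awk() {
    awk -F'[{}]' '
        # Entry lines: the text before the first brace is \newglossaryentry
        $1 ~ /^ *\\newglossaryentry[* ]*$/ {
            if (have_prev && $2 == prev) print "% duplicate"
            prev = $2
            have_prev = 1
        }
        { print }
    ' "$@"
}
```

It can be called as `mark_dups_awk glossary.tex`, or used at the end of a pipeline, since it reads standard input when no file is given.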

Detecting duplicates is just as easy when the input is not sorted:

perl -l -pe '
    if (/^ *\\newglossaryentry[* ]*\{([^{}]*)\}/) {
        # %seen counts how often each label has occurred so far
        print "% duplicate" if $seen{$1};
        ++$seen{$1};
    }' glossary.tex

You can also easily restrict the check to strictly consecutive lines:

perl -l -pe '
    if (/^ *\\newglossaryentry[* ]*\{([^{}]*)\}/) {
        print "% duplicate" if defined $prev && $1 eq $prev;
        $prev = $1;
    # any line that is not an entry resets the comparison
    } else {undef $prev}' glossary.tex
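
For readers who still want a sed-only version, the pairing problem described in the question can be avoided by keeping the previous line in the hold space instead of reading lines two at a time. The following is a minimal sketch under that approach, not the answerer's method. It compares only strictly adjacent lines, which fits the sorted file from the question. The function name `mark_dups_sed` is hypothetical.

```shell
#!/bin/sh
# Print "% duplicate" before a line whose \newglossaryentry{label} prefix is
# identical to that of the line directly above it.
mark_dups_sed() {
    sed '
# Append the previous line (kept in the hold space) after the current one.
G
# If both lines start with the same \newglossaryentry{label}, emit a marker.
/^\(\\newglossaryentry[{][^}]*[}]\).*\n\1/i\
% duplicate
# Drop the appended previous line again, leaving only the current line.
s/\n.*//
# Remember the current line for the next cycle; sed then prints it.
h
' "$@"
}
```

Because every line is compared with its immediate predecessor, the abhinna/acariya/acariya/adhitthana case from the question no longer depends on how the lines happen to pair up. Usage would be `mark_dups_sed glossary.tex > glossary.sed.tex`.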
