AWK: Fast way to insert a target word after a source term

I'm new to awk. To insert a single target term after a source term in 198058 random lines, I have this code:

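awk -i inplace '
    NR == FNR { a[$1]; next }                               # 1st input: the random line numbers
    FNR in a  { gsub(/\<Source Term\>/, "& Target Term") }  # edit only those lines of file
    1                                                       # print every line of file once
' <(shuf -n 198058 -i 1-"$(wc -l < file)") file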

file contains sentence lines like these:

David has to eat his vegetables .
This weather is very cold .
Can you please stop this music ? This is terrible music .
The teddy bear is very plushy .
I must be going !

For example, if I want to insert the word "Wetter" after "weather", the affected line would look like this:

This weather Wetter is very cold .

How can I rewrite the code so that I only need to supply two separate files containing the lists of source terms and target terms?

Let's say the source-term file is called sourceterms and the target-term file is called targetterms.

If sourceterms contains this list of terms:

vegetables
weather
terrible
plushy
going

and targetterms contains these terms:

Gemüse
Wetter
schreckliche
flauschig
gehen

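I want my code to check each line of file for the source terms and insert the matching target term after each one, so that file ends up looking like this: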

David has to eat his vegetables Gemüse .
This weather Wetter is very cold .
Can you please stop this music ? This is terrible schreckliche music .
The teddy bear is very plushy flauschig .
I must be going gehen !

Can the code above be rewritten to do this?

Answer 1

Using GNU awk (which the OP is using) for ARGIND and word boundaries:

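$ cat tst.awk
ARGIND == 1 { olds[FNR] = "\\<" $1 "\\>"; next }  # sourceterms: line N -> word-bounded regexp
ARGIND == 2 { map[olds[FNR]] = "& " $1; next }    # targetterms: same regexp -> "& target"
{
    for ( old in map ) {    # try every source term on the current line of file
        new = map[old]
        gsub(old,new)       # "&" in new stands for the matched source word
    }
    print
}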

$ awk -f tst.awk sourceterms targetterms file
David has to eat his vegetables Gemüse .
This weather Wetter is very cold .
Can you please stop this music ? This is terrible schreckliche music .
The teddy bear is very plushy flauschig .
I must be going gehen !

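The above assumes your source terms don't contain any regexp metacharacters and your replacement text doesn't contain the & backreference metacharacter. It also assumes that if the same word appears as both a source and a target term, you don't care about the order in which the replacements happen.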
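Since the sample text already has its punctuation separated by blanks, a different technique from the answer's sidesteps all three assumptions: look each whitespace-separated word up in an array and append the target as literal text, so no regexp or & is involved and an inserted target is never rescanned. This is a sketch, not the answer's method; it assumes single-word terms, one per line, with sourceterms and targetterms in the same order and neither file empty, and it runs in any POSIX awk.

```shell
#!/usr/bin/env bash
# Sketch of a regex-free alternative: hash lookup per word instead of one
# gsub per term. Assumes single-word terms and blank-separated punctuation.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

printf '%s\n' vegetables weather terrible plushy going > sourceterms
printf '%s\n' Gemüse Wetter schreckliche flauschig gehen > targetterms
cat > file <<'EOF'
David has to eat his vegetables .
This weather is very cold .
Can you please stop this music ? This is terrible music .
The teddy bear is very plushy .
I must be going !
EOF

out=$(awk '
    FNR == 1 { f++ }                        # f = 1, 2, 3 for each input file
    f == 1   { src[FNR] = $1; next }        # line N of sourceterms
    f == 2   { map[src[FNR]] = $1; next }   # paired with line N of targetterms
    {
        for (i = 1; i <= NF; i++)
            if ($i in map)
                $i = $i " " map[$i]         # plain concatenation: no regexp, no "&"
        print
    }
' sourceterms targetterms file)
printf '%s\n' "$out"
```

Assigning to a field rebuilds the line with single spaces, so runs of whitespace collapse on edited lines. The work per line grows with the number of words rather than the number of terms, which should help with long term lists, but words glued to punctuation (e.g. "plushy.") and multi-word terms will not match.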