How to append an incremental count to every predefined word of a text file?

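Just like this question: How to append an incremental count to every line of a text file?

I want to add an incremental count to a text file. However, instead of adding the count to each line, I want to add it to a predefined word.

For example, if I want to count the word "cinema" in the text, I would like every occurrence of "cinema" to be changed to "cinemaN", where N is the incremental number and the maximum value of N depends on how many times the word "cinema" occurs in the text.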

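So that an input text file containing this text:

He drove his car to the cinema. He then went inside the cinema to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema.

generates an output file with this content:

He drove his car to the cinema1. He then went inside the cinema2 to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema3.

Preferably, I would also like to be able to number the chosen word in backward order.

I.e., this would generate a second output file with this content:

He drove his car to the cinema3. He then went inside the cinema2 to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema1.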

Answer 1

I prefer perl for this:

$ cat ip.txt 
He drove his car to the cinema. He then went inside the cinema to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema.

$ # forward counting is easy
$ perl -pe 's/\bcinema\b/$&.++$i/ge' ip.txt 
He drove his car to the cinema1. He then went inside the cinema2 to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema3.
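  • \bcinema\b is the word to search for, wrapped in word boundaries so that it does not match as part of another word. For example, \bpar\b will not match apart, park, or spar
  • the g flag makes the replacement global; the e flag allows Perl code in the replacement section
  • $&.++$i is the concatenation of the matched word and the pre-incremented value of $i, which counts as 0 by default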


For reverse order, we need to get the count first...

$ c=$(grep -ow 'cinema' ip.txt | wc -l) perl -pe 's/\bcinema\b/$&.$ENV{c}--/ge' ip.txt 
He drove his car to the cinema3. He then went inside the cinema2 to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema1.
  • c becomes an environment variable, accessible via the %ENV hash

Or, with perl alone, slurping the entire file:

perl -0777 -pe '$c=()=/\bcinema\b/g; s//$&.$c--/ge' ip.txt 

Answer 2

Using GNU awk for multi-char RS, case-insensitive matching, and word boundaries:

$ awk -v RS='^$' -v ORS= -v word='cinema' '
    BEGIN { IGNORECASE=1 }
    { cnt=gsub("\\<"word"\\>","&"); while (sub("\\<"word"\\>","&"cnt--)); print }
' file
He drove his car to the cinema3. He then went inside the cinema2 to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema1.

Answer 3

Taking punctuation after the word into account.
Forward numbering:

word="cinema"
awk -v word="$word" '
    { 
      for (i = 1; i <= NF; i++) 
        if ($i ~ word "([,.;:)]|$)") { 
          gsub(word, word "" ++count,$i) 
        }
      print 
    }' input-file

Backward numbering:

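word="cinema"
# Count fields with the same test the numbering pass uses, so both passes agree
count="$(awk -v word="$word" '
    { for (i = 1; i <= NF; i++) if ($i ~ word "([,.;:)]|$)") count++ }
    END { print count + 0 }' input-file)"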
awk -v word="$word" -v count="$count" '
    { 
      for (i = 1; i <= NF; i++) 
        if ($i ~ word "([,.;:)]|$)") { 
          gsub(word, word "" count--, $i) 
        }
      print 
    }' input-file

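Answer 4

To label the words in descending order, we reverse the regex and reverse the data, then reverse the data once more at the end to complete the transformation: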

perl -l -0777pe '$_ = reverse reverse =~ s/(?=\bamenic\b)/++$a/gre' input.data

Result

He drove his car to the cinema3. He then went inside the cinema2 to purchase tickets, and
afterwards discovered that it was more then two years since he last visited the cinema1.

To label the words in ascending order, we do a lookbehind for the word (using \K):

perl -lpe 's/\bcinema\b\K/++$a/eg' input.data

Result

He drove his car to the cinema1. He then went inside the cinema2 to purchase tickets, and
afterwards discovered that it was more then two years since he last visited the cinema3.
