How to remove duplicate words across multiple files and keep each unique word only in the file where it was first found

Question

I did the same sort of thing with files (deleting duplicates that had the same size + crc32), but I used a fancy script to do the filtering.

You could start by using something like word | sort | uniq to produce a sorted word list for each file.
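If you would rather not rely on the shell for that first step, here is a minimal Python sketch of the same idea. It assumes words are separated by whitespace; the function names sorted_unique_words and write_word_list are made up for illustration and the file paths are whatever you pass in.

```python
def sorted_unique_words(text: str) -> list[str]:
    """Return the distinct whitespace-separated words of text, sorted."""
    # set() drops repeats within the file; sorted() mimics sort | uniq.
    return sorted(set(text.split()))


def write_word_list(src_path: str, dst_path: str) -> None:
    """Write the distinct words of src_path to dst_path, one per line."""
    with open(src_path, encoding="utf-8") as src:
        words = sorted_unique_words(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(word + "\n" for word in words)
```

Note that this only removes repeats within a single file. Removing words already seen in earlier files is a separate step.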

Then I would use an associative array (as in REXX), along these lines:

 /*  REXX  */
 used. = 0                          /* every word starts out as not yet seen */
 do n = 1 to 10; call dofile; end   /* process 1.txt through 10.txt in order */
 exit

 dofile:
 infile = n'.txt'; outfile = n'.out'
 call stream infile, 'c', 'open read'
 call stream outfile, 'c', 'open write replace'
 do while lines(infile) > 0         /* > 0 also works where lines() returns a count */
   word = linein(infile)
   /* remove the comment markers to make it case insensitive */
   /* word = translate(word) */
   if used.word = 0                 /* first time this word has been seen */
      then do; call lineout outfile, word; used.word = 1; end
 end
 call stream outfile, 'c', 'close'
 call stream infile, 'c', 'close'
 return

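This particular script maintains a list of all the words used across all the files. It reads in a file and checks whether each word is already known or still needs to be learned. If it needs to be learned, the word is remembered and a copy is written to the .out file of the lesson in which it was learned. So in your example, "xyz" is learned in lesson 3 and ends up in 3.out, while ABC is learned in lesson 1 and so ends up in 1.out.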

A bit like learning a language.
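For anyone who does not have a REXX interpreter handy, the same "learn each word once" logic can be sketched in Python. This is only an illustration under a few assumptions: each lesson is already a list of words (one per input line), the name first_seen_words is made up, and the sample lessons in the usage note are invented rather than taken from your files. A set plays the role of the used. stem.

```python
def first_seen_words(lessons, case_insensitive=False):
    """For each lesson, keep only the words not seen in any earlier lesson.

    lessons is a list of word lists, in lesson order. A word repeated
    inside one lesson is also kept only once, as in the REXX version.
    """
    seen = set()          # plays the role of the used. stem
    result = []
    for words in lessons:
        fresh = []
        for word in words:
            # upper() mirrors the optional translate(word) in the script
            key = word.upper() if case_insensitive else word
            if key not in seen:
                seen.add(key)
                fresh.append(word)
        result.append(fresh)
    return result
```

With made-up lessons [["ABC", "def"], ["def", "ghi"], ["ABC", "xyz"]], the call returns [["ABC", "def"], ["ghi"], ["xyz"]]: ABC stays with lesson 1 and xyz with lesson 3. Writing each inner list to the matching .out file is left out to keep the sketch short.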

Answer 1