How do I remove words from a txt file that exist in another txt file?

Answer 1

There is a command that does this: comm. As described in man comm, it is simple:

   comm -3 file1 file2
          Print lines in file1 not in file2, and vice versa.

Note that comm requires the contents of both files to be sorted, so you must sort them before calling comm, like this:

sort unsorted-file.txt > sorted-file.txt

To sum up:

sort a.txt > as.txt

sort b.txt > bs.txt

comm -3 as.txt bs.txt > result.txt

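After running the commands above, result.txt will contain the expected lines. Because -3 only suppresses the lines common to both files, result.txt holds the lines unique to either file, and those unique to bs.txt are indented by a tab.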
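To make the shape of that output concrete, here is a minimal Python sketch of what the sort + comm -3 pipeline produces. The function name comm_3 and the sample lines are hypothetical. The sketch assumes plain code-point ordering (roughly what sort gives under LC_ALL=C), so the order may differ from a locale-aware sort.

```python
from collections import Counter


def comm_3(lines1, lines2):
    """Approximate `comm -3` on two lists of lines (sorting is done here)."""
    count1, count2 = Counter(lines1), Counter(lines2)
    only1 = count1 - count2  # occurrences left over in the first file
    only2 = count2 - count1  # occurrences left over in the second file
    out = []
    for line in sorted(set(only1) | set(only2)):
        # Column 1: lines unique to the first file, printed as-is.
        out.extend([line] * only1[line])
        # Column 2: lines unique to the second file, prefixed with a tab.
        out.extend(["\t" + line] * only2[line])
    return out


# Hypothetical sample data, not taken from the original question.
result = comm_3(["cherry", "apple", "banana"], ["date", "banana", "cherry"])
print("\n".join(result))
```

If only the lines unique to one file are wanted, the other column can be suppressed as well (comm -23 or comm -13), which removes the tab-indented mix.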

Answer 2

Here is a short python3 script, based on Germar's answer, which should do this while preserving the unsorted order of b.txt.

#!/usr/bin/python3

with open('a.txt', 'r') as afile:
    a = set(line.rstrip('\n') for line in afile)

with open('b.txt', 'r') as bfile:
    for line in bfile:
        line = line.rstrip('\n')
        if line not in a:
            print(line)
            # Uncomment the following if you also want to remove duplicates:
            # a.add(line)
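The same idea can be wrapped in a function so the file names and the duplicate handling become parameters. This is only a sketch: the name lines_not_in and the dedupe flag are hypothetical, and it returns the lines rather than printing them.

```python
def lines_not_in(a_path, b_path, dedupe=False):
    """Return the lines of b_path that do not occur in a_path, in b's order."""
    with open(a_path, "r") as afile:
        seen = {line.rstrip("\n") for line in afile}

    kept = []
    with open(b_path, "r") as bfile:
        for line in bfile:
            line = line.rstrip("\n")
            if line not in seen:
                kept.append(line)
                if dedupe:
                    # Remember the line so later repeats in b are dropped too.
                    seen.add(line)
    return kept
```

Calling lines_not_in('a.txt', 'b.txt') should give the same lines the script prints, and passing dedupe=True corresponds to uncommenting the a.add(line) line.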

Answer 3

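#!/usr/bin/env python3

# Load the lines of a.txt into a set for fast membership tests.
with open('a.txt', 'r') as f:
    a = set(f.read().split('\n'))

# Print every line of b.txt that does not appear in a.txt.
with open('b.txt', 'r') as f:
    for line in f:
        b = line.strip('\n ')
        # Skip blank lines instead of stopping at the first one.
        if b and b not in a:
            print(b)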

Answer 4

Take a look at the coreutils comm command (man comm):

NAME
       comm - compare two sorted files line by line

SYNOPSIS
       comm [OPTION]... FILE1 FILE2

DESCRIPTION
       Compare sorted files FILE1 and FILE2 line by line.

       With  no  options,  produce  three-column  output.  Column one contains
       lines unique to FILE1, column two contains lines unique to  FILE2,  and
       column three contains lines common to both files.

       -1     suppress column 1 (lines unique to FILE1)

       -2     suppress column 2 (lines unique to FILE2)

       -3     suppress column 3 (lines that appear in both files)

So, for example, you can do:

$ comm -13 <(sort a.txt) <(sort b.txt)
diary.txt
NOVEMBER.txt

(the lines unique to b.txt)
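The three columns described in the man page excerpt can be illustrated with a line-by-line merge of two sorted lists. The Python below is a rough sketch of that idea, not comm's actual implementation. The function name comm_columns and the sample lines are hypothetical, and comparison is by code point, so ordering may differ from a locale-aware sort.

```python
def comm_columns(sorted1, sorted2):
    """Split two sorted line lists into (only_in_1, only_in_2, in_both)."""
    only1, only2, both = [], [], []
    i = j = 0
    while i < len(sorted1) and j < len(sorted2):
        if sorted1[i] == sorted2[j]:
            # Same line in both inputs: column 3.
            both.append(sorted1[i])
            i += 1
            j += 1
        elif sorted1[i] < sorted2[j]:
            # Line only in the first input: column 1.
            only1.append(sorted1[i])
            i += 1
        else:
            # Line only in the second input: column 2.
            only2.append(sorted2[j])
            j += 1
    # Whatever remains in either input is unique to that input.
    only1.extend(sorted1[i:])
    only2.extend(sorted2[j:])
    return only1, only2, both


# Hypothetical sample data; -13 corresponds to keeping column 2 only.
a_lines = sorted(["gamma", "alpha", "beta"])
b_lines = sorted(["delta", "gamma", "beta"])
_, unique_to_b, _ = comm_columns(a_lines, b_lines)
print("\n".join(unique_to_b))
```

Suppressing columns 1 and 3, as -13 does, leaves only the second list. This also shows why the inputs must be sorted first: the merge only ever compares the current line of each input.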
