Compare an old file and a new file, but ignore lines that exist only in the new file?

Answer 1

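Use join to combine the matching lines from the two files. Assuming the file name comes after the checksum (as in md5sum output) and contains no whitespace, this prints every file name that appears in both lists, followed by the old checksum and the new checksum: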

join -1 2 -2 2 <(sort -k 2 oldlist) <(sort -k 2 newlist)

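To also see the new files, pass the -a option to join. A bit of output post-processing then removes the file names whose checksum has not changed.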

join -a 2 -1 2 -2 2 <(sort -k 2 oldlist) <(sort -k 2 newlist) |
awk '$2 != $3'
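
To list only the files whose checksum changed, while leaving out files that exist in just one list, the same pipeline can be run without -a. The following is a minimal sketch under stated assumptions: the sample data, the temporary directory, and the variable name changed are hypothetical, and temporary sorted files replace process substitution so that it also runs in a plain POSIX shell.

```shell
# Hypothetical sample data in md5sum format: "checksum filename".
tmp=$(mktemp -d)
printf '%s\n' 'aaa a.txt' 'bbb b.txt' 'ccc c.txt' > "$tmp/oldlist"
printf '%s\n' 'aaa a.txt' 'xxx b.txt' 'ddd d.txt' > "$tmp/newlist"

# join needs both inputs sorted on the join field (the file name, field 2).
sort -k 2 "$tmp/oldlist" > "$tmp/old.sorted"
sort -k 2 "$tmp/newlist" > "$tmp/new.sorted"

# Without -a, join prints only the names present in both lists as
# "name old_checksum new_checksum"; awk keeps the rows where they differ.
changed=$(join -1 2 -2 2 "$tmp/old.sorted" "$tmp/new.sorted" | awk '$2 != $3')
printf '%s\n' "$changed"

rm -rf "$tmp"
```

With this sample data only b.txt is reported: c.txt (old list only) and d.txt (new list only) have no partner line, so join drops them.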

Answer 2

You can do this with awk alone:

$ awk 'FNR==NR   { o[$2]=$1; next }       !o[$2] { print $0, "NEW"; next } 
       $1!=o[$2] { print $0, "CHANGED" }' newlist oldlist

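(Note that the assumed file format is the md5sum output format: "md5 filename". With the argument order shown, newlist is read first, so the lines that get tagged come from oldlist.)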

Update: a step-by-step explanation of how the awk one-liner works.

awk 'FNR==NR { # FNR (record number within the current file) equals NR (overall record number): still reading the first file
  o[$2]=$1     # store the record in array o: the key is the file name, the value is the md5
  next         # go to the next record (skip the rest of the code)
}
# reaching this point means we are processing the second input file
!o[$2] {       # if array o has no entry for the file name of the current record
  print $0, "NEW" # print the current record and mark it as new
  next         # go to the next record (skip the rest of the code)
}
# reaching this point means array o has an entry for the current file name
$1!=o[$2] {    # if the current md5 differs from the md5 stored for this file name
  print $0, "CHANGED" # print the current record and mark it as changed
}' newlist oldlist
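
As a variant of the same two-file idiom (not the command above), the old list can be read first and names that are missing from it skipped, so that only changed files are reported. This is a rough sketch under stated assumptions: the sample data, the array name old, and the variable name result are hypothetical.

```shell
# Hypothetical sample data in md5sum format: "checksum filename".
tmp=$(mktemp -d)
printf '%s\n' 'aaa a.txt' 'bbb b.txt' 'ccc c.txt' > "$tmp/oldlist"
printf '%s\n' 'aaa a.txt' 'xxx b.txt' 'ddd d.txt' > "$tmp/newlist"

result=$(awk '
  FNR == NR     { old[$2] = $1; next }     # first file: remember the old checksum per name
  !($2 in old)  { next }                   # name not in the old list: ignore it
  $1 != old[$2] { print $2, "CHANGED" }    # same name, different checksum
' "$tmp/oldlist" "$tmp/newlist")
printf '%s\n' "$result"

rm -rf "$tmp"
```

The test ($2 in old) checks for a key without creating an array entry, which is why it is used here instead of !old[$2].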

Answer 3

If I understand your question correctly, comm can indeed do what you want. I suggest looking at comm --help

Specifically:

  -1              suppress column 1 (lines unique to FILE1)
  -2              suppress column 2 (lines unique to FILE2)
  -3              suppress column 3 (lines that appear in both files)

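So comm -1 -3 newFile oldFile will do what you want: it prints only the lines that are unique to oldFile. Note that comm expects both files to be sorted.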
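
A small sketch of this on sample data may help. The file contents, the temporary directory, and the variable name only_old are hypothetical. Because comm compares whole lines, an old entry whose checksum changed counts as unique to the old file, and so does an entry for a file that no longer appears in the new list.

```shell
# Hypothetical sample data in md5sum format: "checksum filename".
tmp=$(mktemp -d)
printf '%s\n' 'aaa a.txt' 'bbb b.txt' 'ccc c.txt' > "$tmp/oldFile"
printf '%s\n' 'aaa a.txt' 'xxx b.txt' 'ddd d.txt' > "$tmp/newFile"

# comm requires sorted input, so sort both lists first.
sort "$tmp/oldFile" > "$tmp/old.sorted"
sort "$tmp/newFile" > "$tmp/new.sorted"

# -1 drops lines unique to the first file (new), -3 drops common lines,
# leaving only the lines unique to the second file (old).
only_old=$(comm -1 -3 "$tmp/new.sorted" "$tmp/old.sorted")
printf '%s\n' "$only_old"

rm -rf "$tmp"
```

Here the output contains the old entries for b.txt (changed) and c.txt (gone from the new list), while d.txt, which exists only in the new list, is not shown.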

Answer 4

Just as an alternative, I always use "sdiff -s" to compare file lists or md5sums.

Assume the files are normal md5sum output, "md5hash filename". Then:

sdiff -s oldfile newfile | grep -v ">"
# sorting on the md5hash should help align and pick up renamed files.
sdiff -s <(sort oldfile) <(sort newfile)

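Breaking this down:
sdiff -s: suppresses common lines, so exact matches are ignored. Differences are marked with |, < and >.
<(sort oldfile): sorts the file before it is passed to sdiff.
grep -v ">": ignores entries that exist only in the new file. This works only if no file name contains >, which is unlikely anyway.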

The output width of sdiff can be changed to show longer lines, for example with -w 100.
