Remove duplicate files while ignoring certain lines (e.g. embedded timestamps)

2024-5-19

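I have a bunch of gzip-compressed text files. I am trying to remove the duplicates (using fdupes), but each file contains one line with a timestamp, and the files are otherwise identical.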

I would like to find the duplicates while ignoring this line, but without removing the line from the files I keep.

In my case, the line has the following form:

-- Dump completed on 2014-07-12 10:00:01

Right now I am using the following script, which works but strips the timestamp line from the files that are kept:

#!/bin/sh

# Remove timestamp line from all gzipped text files by temporarily unzipping
# them, removing the line then rezipping. Preserve file system timestamp.
for a in *.sql.gz ; do
    gunzip -c "$a" | sed -e 's/^-- Dump completed.*//g' | gzip -c -9 > temp.gz
    touch -r "$a" temp.gz
    mv temp.gz "$a"
done

# Duplicates can now be removed.
fdupes . -dN

I would like a solution that does not modify the original files and keeps the timestamp line.

Is there any way I can do this?
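
One possible direction, shown only as a rough sketch: instead of rewriting the files, hash each file's decompressed content with the timestamp line filtered out, then group files by that hash. The function names `content_hash` and `list_duplicates` are made up for illustration, and the sketch assumes `sha256sum` (GNU coreutils) is available and that file names contain no newlines. It only prints duplicate candidates and does not replace fdupes or delete anything.

```shell
#!/bin/sh
# Sketch: report duplicate *.sql.gz files while ignoring the
# "-- Dump completed on ..." line. No file is modified or deleted.

# Print a hash of the decompressed content of "$1", timestamp line excluded.
# Assumption: sha256sum is installed; any stable checksum tool would do.
content_hash() {
    gunzip -c "$1" | grep -v '^-- Dump completed on ' | sha256sum | cut -d ' ' -f 1
}

# Print every file whose filtered content matches an earlier file.
# Within a group of identical files, the first name in sort order is kept
# (not printed); the remaining names are printed as duplicates.
list_duplicates() {
    for f in "$@"; do
        [ -f "$f" ] || continue          # skip an unmatched glob pattern
        printf '%s %s\n' "$(content_hash "$f")" "$f"
    done | sort | awk '
        { h = $1; sub(/^[^ ]+ /, "") }   # remember the hash, leave the file name in $0
        h == prev { print; next }        # same hash as the previous line: duplicate
        { prev = h }
    '
}

# The ./ prefix keeps file names from being mistaken for options.
list_duplicates ./*.sql.gz
```

The printed names are the files that could be removed; every kept file still contains its original timestamp line and its original file system timestamp. It is probably worth reviewing the list by hand before passing it to `rm`.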
