How do I find the duplicate lines in a file, print them together with the file name, and do this for every file in a directory?

Answer 1

This should do it:

for i in *.*; do sort "$i" | uniq -d | sed -e "s/^/$i:/"; done

Expanded for readability:

for i in *.*; do
  sort "$i" | uniq -d | sed -e "s/^/$i:/"
done
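The loop above only covers names matching *.* in the current directory. To cover a whole directory tree, a sketch along these lines should work, assuming bash and a find that supports -print0 (GNU and BSD find both do). `dup_lines_in_dir` is a hypothetical helper name, not an existing command:

```shell
#!/usr/bin/env bash
# Sketch: print "file:line" for every line that occurs more than once within
# the same file, for all regular files under a directory (recursively).
# dup_lines_in_dir is a hypothetical helper name.
dup_lines_in_dir() {
  local dir=${1:-.}
  # -print0 with read -d '' keeps file names containing spaces intact
  find "$dir" -type f -print0 |
    while IFS= read -r -d '' f; do
      # sort groups identical lines; uniq -d prints one copy of each repeated line
      sort -- "$f" | uniq -d |
        while IFS= read -r line; do
          # printf avoids the sed quoting problems that odd file names can cause
          printf '%s:%s\n' "$f" "$line"
        done
    done
}
```

Usage: dup_lines_in_dir /path/to/dir. The order in which find visits files is unspecified, so pipe the result through sort if you need stable output.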

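An xlsx file is really just a zip archive of other files: you can unzip the .xlsx and look at the contents of xl/worksheets/sheet1.xml inside it. Those files are XML, though, so you need to parse them before trying to process them with line-based tools like the ones above.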
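A minimal sketch of that idea, assuming `unzip` is installed: `unzip -p` streams xl/worksheets/sheet1.xml to stdout without extracting it to disk, and a naive `grep -o` pulls out the raw `<v>` cell values, which then go through the same sort | uniq -d pipeline. This is plain text matching, not real XML parsing. Text cells normally hold an index into xl/sharedStrings.xml rather than the text itself, so a "duplicate" here is a repeated raw value (a number or a shared-string index). `xlsx_dup_values` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch, assuming unzip is installed: report repeated raw cell values in the
# first worksheet of each .xlsx passed as an argument.
# xlsx_dup_values is a hypothetical helper name. The grep/sed step is a naive
# match on <v>...</v> and is NOT an XML parser; only sheet1.xml is inspected.
xlsx_dup_values() {
  local f value
  for f in "$@"; do
    # unzip -p writes the archive member to stdout
    unzip -p "$f" xl/worksheets/sheet1.xml |
      grep -o '<v>[^<]*</v>' |
      sed -e 's/<v>//' -e 's/<\/v>//' |
      sort | uniq -d |
      while IFS= read -r value; do
        printf '%s:%s\n' "$f" "$value"
      done
  done
}
```

Usage: xlsx_dup_values *.xlsx. For anything beyond a quick check, a real XML parser or a spreadsheet library is the safer choice.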

Answer 2

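A way to find the files that contain duplicate lines.
Note that empty lines also match the pattern, so two empty lines in a file count as a duplicate. The array D is cleared at the start of each file (FNR == 1); without that, a line that appears once in one file and once in a later file would wrongly flag the later file.

awk 'FNR == 1 {delete D} D[$0]++ {print FILENAME; nextfile}' *.*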

To exclude empty lines, add another filter, for example:

awk 'FNR == 1 {delete D} /./ && D[$0]++ {print FILENAME; nextfile}' *.*

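To also exclude lines that contain only whitespace, use gawk's \S operator, which matches any non-whitespace character (it is a gawk extension):

gawk 'FNR == 1 {delete D} /\S/ && D[$0]++ {print FILENAME; nextfile}' *.*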
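Awks other than gawk may not understand \S. A sketch of a portable variant, assuming only an awk that supports nextfile (as the commands above already require), uses the POSIX bracket expression [^[:space:]] instead and empties the array with split("", D). `files_with_dup_lines` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: portable replacement for the gawk '/\S/' filter.
# [^[:space:]] matches any non-whitespace character, so whitespace-only lines
# are skipped. files_with_dup_lines is a hypothetical helper name.
# Assumes an awk that supports nextfile.
files_with_dup_lines() {
  awk '
    FNR == 1 { split("", D) }                              # empty D at the start of each file
    /[^[:space:]]/ && D[$0]++ { print FILENAME; nextfile }  # second copy of a non-blank line
  ' "$@"
}
```

Usage: files_with_dup_lines *.* prints the name of each file that repeats at least one non-blank line.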

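Here is the answer to your task (ignoring empty and blank-only lines). For each file that has duplicates, it prints "name: " and the file name once, then each duplicated line once: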

awk 'FNR == 1 {delete D;j=0} /[^[:blank:]]/ && (D[$0]++ == 1) {if(! j++ ) print "name: " FILENAME; print}' *.*
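The *.* glob only covers the current directory. A sketch of the same awk program applied to a whole directory tree, assuming a find that supports -exec ... {} + (POSIX find does), could look like this. `report_dups` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: run the per-file duplicate report on every regular file under a
# directory, using find instead of the *.* glob.
# report_dups is a hypothetical helper name.
report_dups() {
  find "${1:-.}" -type f -exec awk '
    FNR == 1 { delete D; j = 0 }              # fresh state for each file
    /[^[:blank:]]/ && (D[$0]++ == 1) {        # second occurrence of a non-blank line
      if (!j++) print "name: " FILENAME       # header before the first duplicate
      print
    }
  ' {} +
}
```

With {} +, find hands many files to a single awk process, and FNR restarts at 1 for each file, so the per-file reset still applies even if find splits the list across several awk runs.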
