Output only when the first column is the same

Question 1

Use this "magic" :)

find . -type f | xargs -I "{}" md5sum "{}" | awk '{count[$1]=count[$1]" "$2}END{for(j in count) if( (split(count[j], A)) > 1) print count[j]}' | sed -e 's/\.\///g'
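If the one-liner is hard to follow, the awk stage can be tried on its own. This is only a minimal sketch, not the pipeline above: `group_dupes` is a hypothetical helper name, the input is made-up two-column text rather than md5sum output, and it counts entries in a second array instead of calling split.

```shell
# group_dupes: hypothetical helper, reads "key value" lines on stdin and
# prints only the keys that occur more than once, with their values.
group_dupes() {
  awk '{ vals[$1] = vals[$1] " " $2; n[$1]++ }
       END { for (k in n) if (n[k] > 1) print k ":" vals[k] }'
}

# Made-up sample data; the trailing sort only makes the output order stable,
# because awk does not define the order of "for (k in n)".
printf '%s\n' 'a one' 'b two' 'a three' 'c four' 'b five' | group_dupes | sort
```

This should print `a: one three` and `b: two five`; the key `c` occurs once and is dropped. Like the one-liner, it splits on whitespace, so values containing spaces are cut short.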


Question 2

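The first script is an extension of the script in Sergey Lomakov's answer, but it was a bit too long for a comment. It caters for spaces in file names and quotes each "name". This method does not need a sort step.

The second script is an alternative that uses sort + awk, without the array handling of the first method. It does lose the input order, if that matters to you (for this question that is fine, since the method relies on a sort step anyway).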

Both methods also use sed to introduce \x00 as the field separator, which is what makes the whitespace handling possible.
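To see why the separator matters, here is a small sketch on a single made-up md5sum-style line. It assumes GNU sed (for `\x00` in the replacement); `split_md5_line` is a hypothetical helper name, and `tr` is used only to make the NUL byte visible.

```shell
# split_md5_line: hypothetical helper; replaces the first " " + (" " or "*")
# pair with a NUL byte. Assumes GNU sed, which understands \x00.
split_md5_line() {
  sed 's/ [ *]/\x00/'
}

# Made-up md5sum-style line with a space in the file name.
line='d41d8cd98f00b204e9800998ecf8427e  ./my file.txt'

# Default whitespace splitting cuts the name short.
printf '%s\n' "$line" | awk '{ print $2 }'

# After the sed step the name is one field; tr shows the NUL as "|".
printf '%s\n' "$line" | split_md5_line | tr '\0' '|'
```

The first command should print `./my`, the second `d41d8cd98f00b204e9800998ecf8427e|./my file.txt`. Only the first separator pair is replaced, so later spaces in the name are left alone.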

Method 1, arrays in awk.

find . -type f | 
  xargs -I {} md5sum {} |
    sed 's/ [ *]/\x00/' | # "  "==text, " *"==binary
      awk -F"\x00" '{
             if( md5s[$1] == "" ) {sep=""} else {sep=FS} 
             md5s[$1]=md5s[$1] sep $2 }
        END{ for(md5 in md5s ) {
               if( (split(md5s[md5], names, FS)) > 1 ) {
                 sep="\""  
                 for( ix in names ) {
                   printf "%s%s", sep, names[ix]
                   sep="\" \"" }
                 print "\"" } } }'

Method 2, sort + awk.

find . -type f | 
  xargs -I {} md5sum {} |
    sort |sed 's/ [ *]/\x00/' | # "  "==text, " *"==binary
      awk -F"\x00" '{
             if (pkey!=$1) { ct=-1; pkey=$1; pnam=$2 }
             else{if (++ct) { printf(" \"%s\"",$2) }
                  else { printf("%s\"%s\" \"%s\"",nl,pnam,$2)
                         nl="\n" } } }
        END{ print "" }'
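The idea behind method 2 is that, after sorting, lines with the same key sit next to each other, so each line only needs to be compared with the previous one. A minimal sketch of that idea on toy data follows. It is not the script above: `adjacent_dupes` is a hypothetical name, the input is made-up whitespace-separated text, and each group is buffered in a variable rather than printed as it streams.

```shell
# adjacent_dupes: hypothetical helper; sorts "key value" lines, then prints
# the values of every key that appears on two or more adjacent lines.
adjacent_dupes() {
  sort | awk '
    $1 != prev { if (n > 1) print line    # key changed: flush previous group
                 prev = $1; line = $2; n = 1; next }
               { line = line " " $2; n++ } # same key: extend the group
    END        { if (n > 1) print line }'  # flush the last group
}

# Made-up sample data, deliberately unsorted.
printf '%s\n' 'b two' 'a one' 'a three' 'c four' 'b five' | adjacent_dupes
```

This should print `one three` followed by `five two`. No associative array is needed, at the cost of the input order being replaced by sort order.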

Output

"./tt.txt" "./tx.txt"
"./ize2.txt" "./ize3.txt" "./ize.txt"

Answer