uniq command not working properly?

Answer 1

You need to `sort` before using `uniq`:

find . -type f -exec md5sum {} ';' | sort | uniq -w 33

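`uniq` only removes repeated lines, i.e. identical lines that sit next to each other. It does not reorder the lines to look for repeats; `sort` does that part.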

This is documented in `man uniq`:

Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'.
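To see the adjacency rule from the man page in practice, here is a minimal sketch with a made-up three-line input in which the two `b` lines are separated by another line:

```shell
#!/bin/sh
# Made-up input: the two "b" lines are NOT adjacent.
input='b
a
b'

# uniq only collapses adjacent repeats, so both "b" lines survive.
printf '%s\n' "$input" | uniq          # prints b, a, b

# Sorting first makes the repeats adjacent, so uniq can collapse them.
printf '%s\n' "$input" | sort | uniq   # prints a, b

# sort -u does the same job in one step.
printf '%s\n' "$input" | sort -u       # prints a, b
```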

Answer 2

The input to `uniq` needs to be sorted. So, for the example case,

find . -type f -exec md5sum '{}' ';' | sort | uniq -w 33

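would work. `-w` (`--check-chars=N`) compares only the first N characters of each line. Because an MD5 hash is always 32 hex characters followed by a separator, `-w 33` makes lines unique by the first column alone, which suits this case. But `uniq`'s ways of selecting the relevant part of a line are limited: it can skip leading fields (`-f`) or characters (`-s`) and cap the compared width (`-w`), nothing more. For example, there is no option to compare, say, columns 3 and 5 while ignoring column 4.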
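A self-contained sketch of the `md5sum | sort | uniq -w 33` pipeline, assuming GNU coreutils (`md5sum`, and `uniq -w`, which is a GNU extension). The directory and file names are throwaway examples created only for this demo:

```shell
#!/bin/sh
# Throwaway directory: a.txt and b.txt are identical, c.txt differs.
dir=$(mktemp -d)
printf 'hello\n' > "$dir/a.txt"
printf 'hello\n' > "$dir/b.txt"
printf 'world\n' > "$dir/c.txt"

# md5sum prints "<32-char hash>  <name>", so the first 33 characters are the
# hash plus one space. uniq -w 33 compares only that prefix and keeps the
# first line of each run, i.e. one file per distinct content.
result=$(cd "$dir" && find . -type f -exec md5sum {} ';' | sort | uniq -w 33)
printf '%s\n' "$result"   # two lines: one for ./a.txt, one for ./c.txt

rm -r "$dir"
```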

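The `sort` command itself has an option (`-u`) to output only unique lines, and uniqueness is judged by the keys used for sorting. This means we can use `sort`'s powerful key syntax to define which parts of the lines should be unique.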

For example,

find . -type f -exec md5sum '{}' ';' | sort -k 1,1 -u

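gives the same kind of result (one line per hash, although which file name is kept for a duplicated hash may differ), but the `sort` part is more flexible for other uses.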
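For the case `uniq` cannot handle (unique by columns 3 and 5, ignoring column 4), `sort` accepts one `-k` per column. A minimal sketch with made-up, single-space-separated data; which of two lines with equal keys gets kept should not be relied on:

```shell
#!/bin/sh
# Made-up data: user, service, method, status, path.
data='alice web GET 200 /index
bob web GET 404 /index
carol api POST 200 /login
dave api GET 500 /login'

# Keep one line per (column 3, column 5) pair; column 4 is ignored.
# -b ignores leading blanks in each key; -u keeps one line per run of
# lines whose keys compare equal.
printf '%s\n' "$data" | sort -b -u -k 3,3 -k 5,5
```

Here alice and bob share the pair (GET, /index), so only one of them appears alongside carol and dave.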

Answer 3

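The `uniq` command won't work unless the input is sorted first (e.g. with `sort`) or the duplicate lines are already adjacent. Here are several alternatives that don't need sorted input.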

AWK

With `awk`:

command | awk '!a[$0]++{print}'

For reuse, you can define the following shell alias:

alias unicat='awk "!a[\$0]++{print}"'

Then run: `command | unicat`.

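Parallel version with GNU `parallel` (for faster processing of large inputs). Each job deduplicates only its own 100M block, so duplicates that land in different blocks are kept, and without `-k` the blocks' output may come back out of order: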

command | time parallel --block 100M --pipe awk '!a[\$0]++{print}'

Test command (filters 200 lines down to 100; `pee` comes from moreutils):

echo | pee "seq 1 100" "seq 1 100" | awk '!a[$0]++{print}' | wc -l
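How the AWK one-liner works: `a[$0]++` looks up the whole line in an associative array and post-increments it, so the expression yields 0 the first time a line is seen and a positive count afterwards. `!` turns that into "true only on the first occurrence", and a true pattern with `{print}` (or no action at all) prints the line. Input order is preserved, and memory grows with the number of distinct lines. The same idea can key on selected fields, such as the columns 3 and 5 that `uniq` cannot target. A small sketch with made-up input (`seen` is just an array name chosen here):

```shell
#!/bin/sh
# Whole-line dedup: seen[$0]++ yields 0 on first sight, so !seen[$0]++ is true once.
printf 'b\na\nb\nc\n' | awk '!seen[$0]++'
# prints b, a, c (first occurrences, original order)

# Field-based dedup on columns 3 and 5 only (column 4 ignored).
# seen[$3, $5] joins the two fields with SUBSEP to form a single array key.
printf '%s\n' 'alice web GET 200 /index' \
              'bob web GET 404 /index' \
              'carol api POST 200 /login' |
    awk '!seen[$3, $5]++'
# prints the alice and carol lines; bob repeats alice's (GET, /index) pair
```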

Perl

Try the following Perl script:

command | perl -e 'while(<>){if(!$s{$_}){print $_;$|=1;$s{$_}=1;}}'

A shell alias for reuse:

alias unicat='perl -e '\''while(<>){if(!$s{$_}){print $_;$|=1;$s{$_}=1;}}'\'''

Then run: `command | unicat`.

`unique`

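As an alternative to `uniq`, you can also use karrick's `unique`, written in Go (it uses an approach similar to the above, but is faster and uses less memory than the Perl/AWK solutions):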

https://github.com/karrick/unique

Usage:

command | unique

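Drawback: on large amounts of data it uses a lot of memory, because it hashes every item and keeps it in a table (the AWK and Perl versions above have the same limitation, since they remember every distinct line).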

As a workaround, run short-lived instances on fixed-size chunks of the input (if a few leftover duplicates are acceptable), e.g.

command | parallel --block 100M -j 1 --pipe unique

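or with `split`, in chunks of up to 10M each (`-C` keeps lines whole, whereas `-b` could cut a line in half and produce spurious partial lines), e.g.

command | split -C 10M --filter="unique"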
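A caveat with the chunked workarounds: each chunk is deduplicated independently, so a line that reappears in a later chunk is printed again. The sketch below shows this with `awk` standing in for `unique` (assumed not installed) and a deliberately tiny 100-byte chunk size chosen only for illustration. It uses GNU `split -C`, which never cuts a line in half:

```shell
#!/bin/sh
# 100 lines: the numbers 1..50, then 1..50 again.
make_input() { seq 1 50; seq 1 50; }

# Global dedup: every repeat is caught.
global=$(make_input | awk '!seen[$0]++' | wc -l)

# Chunked dedup: split hands pieces of at most 100 bytes (whole lines only)
# to separate awk processes, each with its own "seen" array; their output
# goes to split's stdout.
chunked=$(make_input | split -C 100 --filter='awk "!seen[\$0]++"' | wc -l)

echo "global: $global, chunked: $chunked"
```

The global count is 50; the chunked count is higher, because repeats that fall into different chunks are never compared.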

`quniq`

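Use the `quniq` utility to remove duplicates. It works better than `uniq` here because it buffers input lines, so duplicates don't need to be adjacent.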

Project page: https://github.com/syucream/quniq
Install: go install github.com/syucream/quniq@latest

Drawback: it seems to hang at the end of a pipeline.

`huniq`

If you are looking for something faster than `sort` followed by `uniq`, use `huniq`.

Project page: https://github.com/koraa/huniq
Install: cargo install huniq

More similar projects:

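https://github.com/mitsutoshi/uni (written in Rust)
https://github.com/whitfin/runiq (written in Rust)
https://github.com/miolini/uniqbloom (written in Go; uses a Bloom filter, which needs little memory but can occasionally mistake a new line for a duplicate and drop it)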

Answer 4

Or you could install killdupes, my program that destroys anything duplicated!

https://github.com/batchmcnulty/killdupes

:-)
