Getting the unique values of each line with Unix commands

Question 1

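Since sorting lines is easier than sorting columns within a line, one approach is to transpose each line (so that each field becomes a line), apply sort and uniq, and then transpose the result back.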

Here is a simple implementation, assuming GNU tools:

$ while read -r line; do echo "$line" | grep -o '[^ ]*' | sort -h | uniq | paste -s; done <file

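It loops through file and, for each line:

grep with the -o option (print only the matching part of each line) splits its input into n lines, one per matched substring. Here we match everything except spaces.
The split lines are sorted with the -h option, which compares human-readable numbers (remove -h if you want the fields sorted as alphanumeric strings).
The uniq command removes the duplicates.
paste -s prints each line of its standard input as a field of a single, tab-separated line. You can append | tr '\t' ' ' to the pipeline to change the tabs to spaces.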
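To see the individual steps at work, the pipeline can be run on a single line. The following is a minimal sketch, assuming GNU grep and sort as in the answer; the function name unique_sorted_fields and the sample input are hypothetical and only serve as illustration. It passes - to paste to name standard input explicitly, which some non-GNU versions of paste require.

```shell
# Hypothetical helper: the per-line pipeline from the answer, wrapped in a function.
# Assumes GNU grep (-o) and GNU sort (-h).
unique_sorted_fields() {
    printf '%s\n' "$1" |
        grep -o '[^ ]*' |   # one field per line
        sort -h |           # sort so that duplicates become adjacent
        uniq |              # drop adjacent duplicates
        paste -s - |        # join the lines back into one tab-separated line
        tr '\t' ' '         # optional: turn the tabs into spaces
}

# Hypothetical sample line, not taken from the original question.
unique_sorted_fields '5 2 1 2 5'
```

With this input, the fields come out sorted and de-duplicated as 1 2 5. Note that uniq only removes adjacent duplicates, which is why the sort step has to come first.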

Note, however, that using a shell loop to process text is generally considered bad practice.

Question 2

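The following does not sort the data across columns; it only extracts the unique values. It is not clear from the question whether sorting is needed.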

Using awk:

$ awk '{ n=split($0,a,FS); $0=""; j=1; delete u; for (i=1; i<=n; i++) if (!u[a[i]]++) $(j++) = a[i]; print }' <file
1 2 5
1 5 3
1 5
5 2
2 4 3

The same program, laid out more readably and with comments (the array u is called seen here):

{
    # split the current record into fields in the array a
    n = split($0, a, FS)

    # empty the current record
    $0=""

    # j is the next field number that we are to set
    # in the record that we are building
    j=1

    # seen is an associative array that we use to
    # keep track of whether we've seen a bit of
    # data before from this record
    delete seen

    # loop over the entries in a (the original
    # fields of the input data)
    for (i=1; i<=n; i++)
        # if we haven't seen this data before,
        # mark it as seen and...
        if (!seen[a[i]]++)
            # add it to the j:th field in the new record
            $(j++) = a[i]

    print
}

My idea here is to build one output record for each line of input, containing only the unique fields from the original data.
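As a small illustration of that idea, the program can be fed a few lines on standard input. This is only a sketch: the wrapper function unique_fields and the sample input are hypothetical, and the whole-array delete seen is assumed to be supported by the awk in use (it is in gawk, mawk and the BSD awk).

```shell
# Hypothetical wrapper around the awk program from the answer.
unique_fields() {
    awk '{
        n = split($0, a, FS)       # original fields of this record
        $0 = ""                    # start with an empty output record
        j = 1                      # next output field to fill
        delete seen                # forget values seen on earlier records
        for (i = 1; i <= n; i++)
            if (!seen[a[i]]++)     # first time we meet this value
                $(j++) = a[i]
        print
    }'
}

# Hypothetical sample input, not the data from the original question.
printf '%s\n' '1 2 2 5' '3 3 3' 'b a b' | unique_fields
```

Each output line keeps the first occurrence of every value in its original order, so the third line comes out as b a rather than a b.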

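By default, "record" is synonymous with "line" and "field" is synonymous with "column" (they are simply more general terms, whose meaning depends on the current values of RS and FS).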
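To show what "more general" means in practice, the same program can be pointed at comma-separated data by changing FS. The sketch below is an assumption-laden variant, not part of the original answer: the function name unique_fields_sep is hypothetical, and it also sets OFS so that the rebuilt record is joined with the same character instead of the default space.

```shell
# Hypothetical variant: $1 is a single-character field separator,
# used both for splitting the input (FS) and for joining the output (OFS).
unique_fields_sep() {
    awk -F "$1" -v OFS="$1" '{
        n = split($0, a, FS)
        $0 = ""
        j = 1
        delete seen
        for (i = 1; i <= n; i++)
            if (!seen[a[i]]++)
                $(j++) = a[i]      # assigning a field rebuilds $0 using OFS
        print
    }'
}

# Hypothetical comma-separated sample input.
printf '%s\n' 'a,b,a,c' 'x,x' | unique_fields_sep ','
```

Here a field is whatever sits between two commas, and the output lines are a,b,c and x.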

Answer