How to extract the unique values in each column

Answer 1

This should do what you need in 5 lines of code (2 of which are just tidying up):

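#!/bin/bash
# Run this with the input file as $1 (parameter 1).

# Delete any pre-existing column files from /tmp.
find /tmp -maxdepth 1 -name "column*" -delete

# Create /tmp/columnNNNNN files; each file holds one column of $1.
# The column number is zero-padded so that the glob used by paste below
# lists the files in numeric order (otherwise column10 sorts before column2).
# The file name is built in a variable first, because an unparenthesised
# concatenation after >> is ambiguous in some awk implementations.
awk '{for (f=1; f<=NF; f++) {out = sprintf("/tmp/column%05d", f); print $f >> out}}' "$1"

# Sort each column file in place, removing duplicates.
find /tmp -maxdepth 1 -name "column*" -execdir sort -o \{\} -u \{\} \;

# Re-combine the columns and write them to stdout.
paste /tmp/column*

# Delete the column files from /tmp.
find /tmp -maxdepth 1 -name "column*" -delete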

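With a large number of columns (as you have), the paste command may fail if the shell cannot fully expand /tmp/column*, that is, if the resulting argument list is too long.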

The output differs from your example in that every column is sorted, whereas in your original example the second column was not sorted.
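If the original order matters, or if the temporary files and the /tmp/column* expansion are a concern, a single awk pass is one possible alternative. The following is only a sketch, not part of the script above: the function name unique_columns is made up for illustration, it assumes whitespace-separated fields, and it keeps every unique value in memory, so it suits inputs whose unique values fit in RAM. Each column lists its values in the order they first appear, and shorter columns are padded with empty cells, as paste would do.

```shell
#!/bin/bash
# Sketch: print the unique values of every column in first-seen order,
# without temporary files. unique_columns is a hypothetical helper name.
unique_columns() {
    awk '
    {
        for (f = 1; f <= NF; f++) {
            # Record a value only the first time it appears in column f.
            if (!seen[f, $f]++) {
                count[f]++
                value[f, count[f]] = $f
            }
        }
        # Remember the widest row seen so far.
        if (NF > ncols) ncols = NF
    }
    END {
        # The longest list of unique values decides the number of output rows.
        for (f = 1; f <= ncols; f++)
            if (count[f] > rows) rows = count[f]
        for (r = 1; r <= rows; r++) {
            line = ""
            for (f = 1; f <= ncols; f++) {
                # Columns that have run out of values contribute an empty cell.
                cell = (r <= count[f]) ? value[f, r] : ""
                line = line (f > 1 ? "\t" : "") cell
            }
            print line
        }
    }' "$@"
}
```

After sourcing the function it can be called as unique_columns input.txt (input.txt being a placeholder name), or it can read from a pipe. The output is tab-separated, matching the default delimiter of paste.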
