Bash - sort by a character other than the first

I want to sort a file by its first column, but the sort must start from the 5th character. How can I do this?

My file:

"TTTTCTTACA"            1       1
"TTTTCTTACC"                    1
"TTTTCTTACT"    1       1
"TTTTCTTAGC"    1
"TTTTCTTATT"                    2
"TTTTCTTCAA"    1               1       1
"TTTTCTTCAG"    1               2       1
"TTTTCTTCAT"            1       2       2
"TTTTCTTCCT"                            2
"TTTTCTTCGG"                    2       2
"TTTTCTTCTA"                            1
"TTTTCTTCTG"            1
"TTTTCTTCTT"    1                       2
"TTTTCTTGAA"            1
"TTTTCTTGCT"    1               1       1
"TTTTCTTTAA"    1
"TTTTCTTTAG"            1       1
"TTTTCTTTCT"    1
"TTTTCTTTGC"    1
"TTTTCTTTGG"            1       1
"TTTTCTTTGT"    1       1       2       1
"TTTTCTTTTA"    1

I am trying:

sort -k1,1 file | uniq -s 6 -w 5 

Of course, it doesn't work. Maybe sort has some flags for this, but I couldn't find them. Do you have any ideas?

Answer 1

Summary

sort -k1.5 file | uniq -s 6 -w 5


Explanation

My sort is GNU coreutils 8.22. Its man page says:

KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a field number and  C
       a  character  position  in  the  field;  both are origin 1, and the stop position defaults to the
       line's end.

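So your current command, sort -k1,1 file, uses a key that runs from the start of the first field to the end of the first field.

What you want (for the sort part, at least) is: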

sort -k1.5 file | uniq -s 6 -w 5

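This sorts on the first field starting at its fifth character, which is what you want. Note that the opening quotation mark counts as the first character of the field, and that without a stop position the key extends to the end of the line; write -k1.5,1 to restrict the key to the first field.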
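
To make the character counting concrete, here is a minimal sketch using three made-up lines (the sample data is hypothetical, not taken from the question's file). It assumes a sort that supports the F.C key syntax quoted above, and sets LC_ALL=C so the ordering does not depend on the locale:

```shell
# Hypothetical sample lines; the opening quote is character 1 of field 1,
# so character 5 is the fourth letter (B, C and A respectively).
sorted=$(printf '%s\n' '"TTTB" 1' '"AAAC" 2' '"GGGA" 3' | LC_ALL=C sort -k1.5,1)

# The key covers character 5 through the end of field 1: B", C" and A".
printf '%s\n' "$sorted"
```

A plain sort would put the "AAAC" line first; with the key starting at character 5, the "GGGA" line comes first instead, followed by "TTTB" and then "AAAC".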

Answer 2

$ sort -k2 file

"TTTTCTTCTA"                            1
"TTTTCTTCCT"                            2
"TTTTCTTACC"                    1
"TTTTCTTATT"                    2
"TTTTCTTCGG"                    2       2
"TTTTCTTCTG"            1
"TTTTCTTGAA"            1
"TTTTCTTACA"            1       1
"TTTTCTTTAG"            1       1
"TTTTCTTTGG"            1       1
"TTTTCTTCAT"            1       2       2
"TTTTCTTAGC"    1
"TTTTCTTTAA"    1
"TTTTCTTTCT"    1
"TTTTCTTTGC"    1
"TTTTCTTTTA"    1
"TTTTCTTCTT"    1                       2
"TTTTCTTCAA"    1               1       1
"TTTTCTTGCT"    1               1       1
"TTTTCTTCAG"    1               2       1
"TTTTCTTACT"    1       1
"TTTTCTTTGT"    1       1       2       1

$ sort -k2 file | uniq -f 1

"TTTTCTTCTA"                            1
"TTTTCTTCCT"                            2
"TTTTCTTACC"                    1
"TTTTCTTATT"                    2
"TTTTCTTCGG"                    2       2
"TTTTCTTCTG"            1
"TTTTCTTACA"            1       1
"TTTTCTTCAT"            1       2       2
"TTTTCTTAGC"    1
"TTTTCTTCTT"    1                       2
"TTTTCTTCAA"    1               1       1
"TTTTCTTCAG"    1               2       1
"TTTTCTTACT"    1       1
"TTTTCTTTGT"    1       1       2       1
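
As a small illustration of what uniq -f 1 does in the pipeline above, here is a sketch on made-up input (the three lines are hypothetical). The -f 1 option makes uniq skip the first field when comparing adjacent lines, so lines that differ only in the first column collapse into the first one seen:

```shell
# Hypothetical input: the first two lines differ only in field 1.
deduped=$(printf '%s\n' '"AAA" 1' '"BBB" 1' '"CCC" 2' | uniq -f 1)

# Only the first line of each run of "equal after field 1" lines is kept.
printf '%s\n' "$deduped"
```

Because uniq only compares adjacent lines, the input has to be sorted on the compared part first, which is why the command above sorts with -k2 before piping to uniq.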