I want to sort a file by its first column, but the sort must start at the 5th character. How can I do that?
My file:
"TTTTCTTACA" 1 1
"TTTTCTTACC" 1
"TTTTCTTACT" 1 1
"TTTTCTTAGC" 1
"TTTTCTTATT" 2
"TTTTCTTCAA" 1 1 1
"TTTTCTTCAG" 1 2 1
"TTTTCTTCAT" 1 2 2
"TTTTCTTCCT" 2
"TTTTCTTCGG" 2 2
"TTTTCTTCTA" 1
"TTTTCTTCTG" 1
"TTTTCTTCTT" 1 2
"TTTTCTTGAA" 1
"TTTTCTTGCT" 1 1 1
"TTTTCTTTAA" 1
"TTTTCTTTAG" 1 1
"TTTTCTTTCT" 1
"TTTTCTTTGC" 1
"TTTTCTTTGG" 1 1
"TTTTCTTTGT" 1 1 2 1
"TTTTCTTTTA" 1
I am trying:
sort -k1,1 file | uniq -s 6 -w 5
Of course, it doesn't work. Maybe sort has some flags for this, but I couldn't find them. Any ideas?
Answer 1
Summary
sort -k1.5 file | uniq -s 6 -w 5
Explanation
My sort is from GNU coreutils 8.22. Its man page says:
KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a field number and C
a character position in the field; both are origin 1, and the stop position defaults to the
line's end.
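So your current sort command, sort -k1,1 file,
uses the whole first field as the sort key: the key starts at field 1 and stops at the end of field 1.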
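To make the start and stop positions from the man page concrete, here is a minimal sketch. The two-line sample is hypothetical, and the `-s` (stable) option is assumed to be available, as it is in GNU sort; it turns off the last-resort whole-line comparison so that only the key decides the order.

```shell
# Hypothetical two-line sample: identical first field, different second field.
sample=$(printf '%s\n' '"AAAA" 2' '"AAAA" 1')

# Key stops at the end of field 1: both keys are equal, so with -s the
# input order is preserved.
field_only=$(printf '%s\n' "$sample" | LC_ALL=C sort -s -k1,1)

# No stop position: the key runs to the end of the line, so the line
# ending in 1 sorts before the line ending in 2.
to_line_end=$(printf '%s\n' "$sample" | LC_ALL=C sort -s -k1)

printf 'with -k1,1:\n%s\n' "$field_only"
printf 'with -k1:\n%s\n' "$to_line_end"
```

LC_ALL=C is set only so that the comparison is byte-wise and does not depend on the locale.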
What you want (for the sort part, anyway) is:
sort -k1.5 file | uniq -s 6 -w 5
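This starts the sort key at the fifth character of the first field, which is what you asked for. Because no stop position is given, the key runs from there to the end of the line; write -k1.5,1 if the key should end with the first field.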
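A small sketch of the character offset in action, on a hypothetical three-line sample rather than the file from the question. One thing worth keeping in mind with quoted data like this: the opening double quote is character 1 of the field, so character 5 is the fourth letter. If the intent is to skip four letters, the offset would need to be 6 instead.

```shell
# Hypothetical sample. In "ZZZAAAA" the characters of field 1 are:
# 1 = opening quote, 2-4 = ZZZ, 5 onward = AAAA plus the closing quote.
sample=$(printf '%s\n' '"AAACCCC" 3' '"MMMBBBB" 2' '"ZZZAAAA" 1')

# Sort on field 1 from character 5 up to the end of field 1.
# LC_ALL=C forces byte-wise comparison, independent of the locale.
sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort -k1.5,1)

printf '%s\n' "$sorted"
```

The keys compared are `CCCC"`, `BBBB"` and `AAAA"`, so the lines come out in the reverse of their plain alphabetical order.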
Answer 2
$ sort -k2 file
"TTTTCTTCTA" 1
"TTTTCTTCCT" 2
"TTTTCTTACC" 1
"TTTTCTTATT" 2
"TTTTCTTCGG" 2 2
"TTTTCTTCTG" 1
"TTTTCTTGAA" 1
"TTTTCTTACA" 1 1
"TTTTCTTTAG" 1 1
"TTTTCTTTGG" 1 1
"TTTTCTTCAT" 1 2 2
"TTTTCTTAGC" 1
"TTTTCTTTAA" 1
"TTTTCTTTCT" 1
"TTTTCTTTGC" 1
"TTTTCTTTTA" 1
"TTTTCTTCTT" 1 2
"TTTTCTTCAA" 1 1 1
"TTTTCTTGCT" 1 1 1
"TTTTCTTCAG" 1 2 1
"TTTTCTTACT" 1 1
"TTTTCTTTGT" 1 1 2 1
$ sort -k2 file | uniq -f 1
"TTTTCTTCTA" 1
"TTTTCTTCCT" 2
"TTTTCTTACC" 1
"TTTTCTTATT" 2
"TTTTCTTCGG" 2 2
"TTTTCTTCTG" 1
"TTTTCTTACA" 1 1
"TTTTCTTCAT" 1 2 2
"TTTTCTTAGC" 1
"TTTTCTTCTT" 1 2
"TTTTCTTCAA" 1 1 1
"TTTTCTTCAG" 1 2 1
"TTTTCTTACT" 1 1
"TTTTCTTTGT" 1 1 2 1
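For reference, a minimal sketch of what uniq -f 1 does, using a hypothetical four-line sample: it ignores the first field when comparing and drops a line only when it matches the line directly before it.

```shell
# Hypothetical sample: lines 1, 2 and 4 share the same trailing fields.
sample=$(printf '%s\n' '"AAA" 1 1' '"BBB" 1 1' '"CCC" 2' '"DDD" 1 1')

# -f 1 skips the first field, so only the remaining fields are compared.
# Line 2 repeats line 1 and is dropped; line 4 is kept because the line
# directly before it (line 3) is different.
deduped=$(printf '%s\n' "$sample" | uniq -f 1)

printf '%s\n' "$deduped"
```

This is why the input is sorted on the later fields first: uniq only collapses duplicates that sit next to each other.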