All levels should be sorted alphabetically (but each item must stay with its parent).
Example file:
first
    apple
    orange
        train
        car
    kiwi
third
    orange
    apple
    plane
second
    lemon
Expected result:
first
    apple
    kiwi
    orange
        car
        train
second
    lemon
third
    apple
    orange
    plane
I have used the following command, but it only works when the tree in the file has just two levels:
sed '/^[^[:blank:]]/h;//!G;s/\(.*\)\n\(.*\)/\2\x02\1/' infile | sort | sed 's/.*\x02//'
How can I sort all levels correctly?
Thanks in advance.
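How the command in the question works, as far as its sed script shows: /^[^[:blank:]]/h saves each top-level line, //!G appends that saved line to every indented line, and the s command rewrites the result as parent\x02child, so sort keeps each child next to its top-level parent and the final sed strips everything up to \x02. The Python sketch below imitates those three stages. The function name sed_style_sort is made up for illustration, blank lines are assumed absent, and Python's string ordering stands in for sort run with LC_ALL=C. It shows why a third level breaks: a grandchild is keyed by its top-level ancestor plus its own indented text, not by its direct parent.

```python
def sed_style_sort(lines):
    """Decorate-sort-undecorate keyed on the top-level parent only (like the question's sed)."""
    decorated = []
    parent = ""
    for line in lines:
        if line[:1] not in (" ", "\t"):
            # /^[^[:blank:]]/h : a top-level line is remembered and kept as-is
            parent = line
            decorated.append(line)
        else:
            # //!G plus the s command: an indented line becomes "parent\x02child"
            decorated.append(parent + "\x02" + line)
    # sort: plain code-point order, comparable to LC_ALL=C sort
    decorated.sort()
    # sed 's/.*\x02//': drop the parent prefix again
    return [d.split("\x02")[-1] for d in decorated]


two_levels = ["first", "    orange", "    apple", "second", "    lemon"]
three_levels = ["first", "    orange", "        train", "        car", "    apple"]
print(sed_style_sort(two_levels))
print(sed_style_sort(three_levels))
```

With three levels, car and train sort ahead of apple (their eight leading spaces compare lower than the letter a) and end up detached from orange.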
Answer 1
Extended Python solution:
Sample infile content (4 levels):
first
    apple
    orange
        train
        car
            truck
            automobile
    kiwi
third
    orange
    apple
    plane
second
    lemon
sort_hierarchy.py script:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
import re

with open(sys.argv[1], 'rt') as f:
    pat = re.compile(r'^\s+')
    paths = []    # one dot-joined ancestor path per input line
    offsets = []  # indentation of each component of paths[-1]
    for line in f:
        item = line.strip()
        if not item:
            continue  # skip blank lines
        m = pat.match(line)
        offset = m.end() if m else 0
        # Pop siblings and deeper levels; the offsets left belong to this line's ancestors
        while offsets and offsets[-1] >= offset:
            offsets.pop()
        parent = '.'.join(paths[-1].split('.')[:len(offsets)]) if offsets else ''
        paths.append(parent + '.' + item if parent else item)
        offsets.append(offset)

paths.sort()
# Each "ancestor." prefix becomes four spaces of indentation
sub_pat = re.compile(r'[^.]+\.')
for i in paths:
    print(sub_pat.sub(' ' * 4, i))
Usage:
python sort_hierarchy.py path/to/infile
Output:
first
    apple
    kiwi
    orange
        car
            automobile
            truck
        train
second
    lemon
third
    apple
    orange
    plane
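The same path-key idea can also be sketched with tuples instead of dot-joined strings. This is a minimal variant, not the answer's script: it assumes consistent indentation (each level indented further than its parent, using the same characters) and it drops blank lines. Tuples compare element by element, so an item name containing a dot or a space cannot be confused with the separator, and the original indented line is printed as-is instead of being rebuilt. The function name sort_tree is hypothetical.

```python
def sort_tree(lines):
    """Sort an indented tree at every level, keeping each child under its parent."""
    keyed = []
    stack = []  # (indent, name) for every ancestor of the current line
    for line in lines:
        name = line.strip()
        if not name:
            continue  # assumption: blank lines carry no structure
        indent = len(line) - len(line.lstrip())
        # Siblings and deeper entries are not ancestors of this line: pop them
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, name))
        # Sort key: names from the root down to this line, as a tuple
        keyed.append((tuple(n for _, n in stack), line.rstrip("\n")))
    keyed.sort(key=lambda pair: pair[0])
    return [text for _, text in keyed]


print("\n".join(sort_tree(["second", "    lemon", "first", "    orange", "    apple"])))
```

To run it on a file, something like: with open('infile') as f: print('\n'.join(sort_tree(f))). Popping the stack until a shallower indent is found handles a dedent of any number of levels at once.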
Answer 2
Awk solution (needs GNU awk, because of asort):
Sample infile content (4 levels):
first
    apple
    orange
        train
        car
            truck
            automobile
    kiwi
third
    orange
    apple
    plane
second
    lemon
awk '{
    offset = gsub(/ /, "");   # count and strip spaces (items are assumed to contain none)
    # Drop siblings and deeper levels; depth then ends at the nearest ancestor
    while (depth > 0 && ofst[depth] >= offset) depth--;
    ofst[++depth] = offset;
    name[depth] = $1;
    # Rebuild the full dotted path from the root down to this item
    items[NR] = name[1];
    for (d = 2; d <= depth; d++) items[NR] = items[NR] "." name[d]
}
END{
    asort(items);
    for (i = 1; i <= NR; i++) {
        gsub(/[^.]+\./, "    ", items[i]);   # each ancestor becomes 4 spaces
        print items[i]
    }
}' infile
Output:
first
    apple
    kiwi
    orange
        car
            automobile
            truck
        train
second
    lemon
third
    apple
    orange
    plane
Answer 3
Works for any depth (each level must be indented with one tab, since lines are split on \t):
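#!/usr/bin/python3
with open('test_file') as f:
    lines = f.read().splitlines()

def yield_sorted_lines(lines):
    # For each line, build the list of its ancestors' names plus its own name;
    # the number of tab-separated fields is the line's depth
    sorter = []
    for l in lines:
        fields = l.split('\t')
        n = len(fields)
        sorter = sorter[:n-1] + fields[n-1:]
        yield sorter, l

prefixed_lines = yield_sorted_lines(lines)
# Sorting on the ancestor list keeps every child directly after its parent
sorted_lines = sorted(prefixed_lines, key=lambda x: x[0])
for x, y in sorted_lines:
    print(y)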
Or as a pipeline:
awk -F'\\t' '{a[NF]=$NF; for (i=1; i<=NF; ++i) printf "%s%s", a[i], i==NF? "\n": "\t"}' file|
sort | awk -F'\\t' -vOFS='\t' '{for (i=1; i<NF; ++i) $i=""; print}'
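What the pipeline does, sketched in Python under the same assumption the awk code makes (one tab per level): the first awk remembers the last item seen at each depth (a[NF]=$NF) and prints every line with all of its ancestors filled in, sort orders those full chains, and the second awk empties every field but the last, which restores the tab indentation. The helper names expand, collapse and tab_tree_sort are made up for this illustration; Python's sorted() compares code points, which corresponds to running sort with LC_ALL=C.

```python
def expand(lines):
    """First awk: write each line as its full tab-separated chain of ancestors."""
    last = {}  # depth -> most recent item at that depth, like awk's a[NF]=$NF
    out = []
    for line in lines:
        fields = line.split("\t")
        depth = len(fields)
        last[depth] = fields[-1]
        # Missing ancestors print as empty strings, as unset awk array entries do
        out.append("\t".join(last.get(d, "") for d in range(1, depth + 1)))
    return out


def collapse(lines):
    """Second awk: empty every field except the last, leaving one tab per level."""
    out = []
    for line in lines:
        fields = line.split("\t")
        out.append("\t" * (len(fields) - 1) + fields[-1])
    return out


def tab_tree_sort(lines):
    # awk ... | sort | awk ...  (sorted() plays the role of LC_ALL=C sort)
    return collapse(sorted(expand(lines)))


tree = ["first", "\torange", "\t\ttrain", "\t\tcar", "\tapple", "second", "\tlemon"]
print(expand(tree))
print("\n".join(tab_tree_sort(tree)))
```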
Answer 4
sed '/^ /{H;$!d};x;1d;s/\n/\x7/g' | sort | tr \\a \\n
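/continuation/{H;$!d};x;1d (or /firstline/! etc.) is a slurp: it only falls through when the buffer holds a complete group of lines.
If a single-line group can occur at the end, add ${p;x;/\n/d} for the double pump that case needs.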
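A Python sketch of what this sed pipeline computes, assuming the file starts with a top-level line and that continuation lines begin with a space: each top-level line is joined with the indented lines that follow it (the H/x slurp plus s/\n/\x7/g), the joined groups are sorted as single lines, and tr turns the BEL separators back into newlines. Because sort sees each group as one line, the groups are ordered by their top-level line while the indented lines inside a group keep their input order. Unlike the bare command, the sketch never drops a single-line group at the end. The function name group_sort is hypothetical.

```python
def group_sort(lines):
    """Sort top-level groups as whole units; lines inside a group keep their order."""
    groups = []
    for line in lines:
        if line.startswith(" ") and groups:
            groups[-1].append(line)  # continuation line: appended to the current group (H)
        else:
            groups.append([line])    # top-level line: starts a new group
    # s/\n/\x7/g then sort: each group is compared as one BEL-joined line
    groups.sort(key=lambda g: "\x07".join(g))
    # tr \\a \\n: split the groups back into separate lines
    return [line for g in groups for line in g]


print("\n".join(group_sort(["third", "    orange", "    apple", "first", "    kiwi"])))
```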