I have a long list of numbers like this:
1234-212-22-11153782-0114232192380
8807698823332-6756-234-14-09867378
45323-14-221-238372635363-43676256
62736373-9983-23-234-8863345637388
. . . .
. . . .
I want to do two things:
1) Sort the segments within each line by their number of digits. The output should look like this:
22-212-1234-11153782-0114232192380
14-234-6756-09867378-8807698823332
14-221-45323-43676256-238372635363
23-234-9983-62736373-8863345637388
2) Find the number of digits in each segment of every line. The output should be:
2-3-4-8-13
2-3-4-8-13
2-3-5-8-12
2-3-4-8-13
In this example, the first, second and third segments of every number have the same number of digits, but they can differ.
Answer 1
How about
$ perl -F'-' -lape '$_ = join "-", sort { length $a <=> length $b } @F' file
22-212-1234-11153782-0114232192380
14-234-6756-09867378-8807698823332
14-221-45323-43676256-238372635363
23-234-9983-62736373-8863345637388
and
$ perl -F'-' -lape '$_ = join "-", sort { $a <=> $b } map length, @F' file
2-3-4-8-13
2-3-4-8-13
2-3-5-8-12
2-3-4-8-13
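Thanks to Stéphane Chazelas for the suggested improvements.
Answer 2
GNU awk can sort, so the trickiest part is deciding how to separate the two desired outputs. This script produces both results; you can decide whether you want them written somewhere other than the hard-coded output files: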
# Comparison function: order two elements by the length of their values.
function compare_length(i1, v1, i2, v2) {
    return (length(v1) - length(v2));
}
BEGIN {
    # Make "for (... in ...)" loops visit elements in compare_length order.
    PROCINFO["sorted_in"]="compare_length"
    FS="-"
}
{
    split($0, elements);    # split the line into its "-" separated segments
    asort(elements, sorted_elements, "compare_length");
    reordered="";
    lengths="";
    for (element in sorted_elements) {
        reordered=(reordered == "" ? "" : reordered FS) sorted_elements[element];
        lengths=(lengths == "" ? "" : lengths FS) length(sorted_elements[element]);
    }
    print reordered > "reordered.out";
    print lengths > "lengths.out";
}
Answer 3
How far would this get you:
awk -F- '                                   # set "-" as the field separator
{
    for (i = 1; i <= NF; i++) {
        L = length($i)                      # for every single field, calculate its length
        T[L] = $i                           # and populate the T array with the length as index
                                            # (fields of equal length on one line overwrite each other)
        if (L > MX) { MX = L }              # keep the maximum length
    }
    $0 = ""                                 # empty the line
    for (i = 1; i <= MX; i++) {
        if (i in T) {                       # a field of this length exists
            $0 = $0 OFS T[i]                # append it to the line, separated by "-"
            C = C OFS i                     # keep the field lengths in the separate variable C
        }
    }
    print substr($0, 2) "\t" substr(C, 2)   # print the line and the field lengths, dropping each leading "-"
    C = MX = ""                             # reset the working variables
    split("", T)                            # delete the T array
}
' OFS=- file
22-212-1234-11153782-0114232192380 2-3-4-8-13
14-234-6756-09867378-8807698823332 2-3-4-8-13
14-221-45323-43676256-238372635363 2-3-5-8-12
23-234-9983-62736373-8863345637388 2-3-4-8-13
You may want to split the printed output into two result files.
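One way to do that split, as a minimal sketch: it assumes the combined output was saved to a file with a tab between the two columns, as the print statement in the script produces. The helper name split_columns and all file names are made up for illustration.

```shell
# split_columns IN OUT1 OUT2  (hypothetical helper)
# Writes the first tab-separated column of IN to OUT1 and the second to OUT2.
split_columns() {
    cut -f1 "$1" > "$2"    # reordered segments
    cut -f2 "$1" > "$3"    # segment lengths
}
```

For example, `split_columns combined.txt reordered.out lengths.out`, where combined.txt holds the saved output of the awk command.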
Answer 4
With a bash pipeline, you can write
while IFS=- read -ra words; do
    for word in "${words[@]}"; do printf "%d\t%s\n" "${#word}" "$word"; done |
        sort -k1,1n |
        cut -f2 |
        paste -sd- -
done < file
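The loop above only covers the first task. A similar pipeline could produce the segment lengths for the second task; the following is a sketch only, written as a function that reads lines on standard input. The name segment_lengths is hypothetical, and it uses tr and awk instead of a bash array so that it also runs under a plain POSIX sh.

```shell
# segment_lengths  (hypothetical helper)
# For every input line, print the lengths of its "-" separated segments,
# sorted numerically and joined with "-".
segment_lengths() {
    while IFS= read -r line; do
        printf '%s\n' "$line" |
            tr '-' '\n' |                   # one segment per line
            awk '{ print length($0) }' |    # replace each segment by its length
            sort -n |                       # order the lengths numerically
            paste -sd- -                    # join them back with "-"
    done
}
```

It would be called as `segment_lengths < file`.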