我需要对 LDIF 文件进行排序,其中有几行属于父文件。
例子
dn: 2
attr1: b
attr2: a
attr1: a
attr1: c
dn: 3
attr2: a
attr1: c
attr1: b
attr1: a
dn: 1
attr1: a
attr1: c
attr1: b
attr2: a
到这个
dn: 1
attr1: a
attr1: b
attr1: c
attr2: a
dn: 2
attr1: a
attr1: b
attr1: c
attr2: a
dn: 3
attr1: a
attr1: b
attr1: c
attr2: a
因此,所有以 dn 开头的父行都被排序,下面所有 attrx 都被排序,如果 attrx 有多个值,那么也会被排序。我已经使用读取行完成了此操作,但这在大文件上需要几个小时。有没有更快的方法可以对 bash 命令执行相同的操作?
属性值始终只占一行。如果有多个值,每个值占一行。没有行进行 Base64 编码。
答案1
使用您的示例文件
awk 'BEGIN {RS="\n\n\n";FS="\n\n";OFS="*";ORS=""} {print $1,$2,$3,$4,$5}' file |awk -F'*' '{for(i=1; i<=NF; i++) c[i]=$i; n=asort(c); for (i=1; i<=n; i++) printf "%s%s*", c[i], (i<n?OFS:RS); delete c}' |sed 's/^*//' |awk -F'*' '{print $5"*"$1"*"$2"*"$3"*"$4}' |sort |awk -F'*' 'BEGIN{OFS="\n\n";ORS="\n\n\n"} {print $1,$2,$3,$4,$5;}'
将每个文本块转换为行并使用“*”分隔字段
awk 'BEGIN {RS="\n\n\n";FS="\n\n";OFS="*";ORS=""} {print $1,$2,$3,$4,$5}' file dn: 2*attr1: b*attr2: a*attr1: a*attr1: c dn: 3*attr2: a*attr1: c*attr1: b*attr1: a dn: 1*attr1: a*attr1: c*attr1: b*attr2: a
对行内的字段进行排序并使用“*”分隔字段
awk 'BEGIN {RS="\n\n\n";FS="\n\n";OFS="*";ORS=""} {print $1,$2,$3,$4,$5}' file |awk -F'*' '{for(i=1; i<=NF; i++) c[i]=$i; n=asort(c); for (i=1; i<=n; i++) printf "%s%s*", c[i], (i<n?OFS:RS); delete c}' |sed 's/^*//'
attr1: a *attr1: b *attr1: c *attr2: a *dn: 2 attr1: a *attr1: b *attr1: c *attr2: a *dn: 3 attr1: a *attr1: b *attr1: c *attr2: a *dn: 1
重新排列行中的字段以将“print dn: x”放在第一位
awk 'BEGIN {RS="\n\n\n";FS="\n\n";OFS="*";ORS=""} {print $1,$2,$3,$4,$5}' file |awk -F'*' '{for(i=1; i<=NF; i++) c[i]=$i; n=asort(c); for (i=1; i<=n; i++) printf "%s%s*", c[i], (i<n?OFS:RS); delete c}' |sed 's/^*//' |awk -F'*' '{print $5"*"$1"*"$2"*"$3"*"$4}'
dn: 2*attr1: a *attr1: b *attr1: c *attr2: a dn: 3*attr1: a *attr1: b *attr1: c *attr2: a dn: 1*attr1: a *attr1: b *attr1: c *attr2: a
按第一列或字段对行进行排序
awk 'BEGIN {RS="\n\n\n";FS="\n\n";OFS="*";ORS=""} {print $1,$2,$3,$4,$5}' file |awk -F'*' '{for(i=1; i<=NF; i++) c[i]=$i; n=asort(c); for (i=1; i<=n; i++) printf "%s%s*", c[i], (i<n?OFS:RS); delete c}' |sed 's/^*//' |awk -F'*' '{print $5"*"$1"*"$2"*"$3"*"$4}' |sort
dn: 1*attr1: a *attr1: b *attr1: c *attr2: a dn: 2*attr1: a *attr1: b *attr1: c *attr2: a dn: 3*attr1: a *attr1: b *attr1: c *attr2: a
将行转换为一列并插入空行
awk 'BEGIN {RS="\n\n\n";FS="\n\n";OFS="*";ORS=""} {print $1,$2,$3,$4,$5}' file |awk -F'*' '{for(i=1; i<=NF; i++) c[i]=$i; n=asort(c); for (i=1; i<=n; i++) printf "%s%s*", c[i], (i<n?OFS:RS); delete c}' |sed 's/^*//' |awk -F'*' '{print $5"*"$1"*"$2"*"$3"*"$4}' |sort |awk -F'*' 'BEGIN{OFS="\n\n";ORS="\n\n\n"} {print $1,$2,$3,$4,$5;}'
dn: 1
attr1: a
attr1: b
attr1: c
attr2: a
dn: 2
attr1: a
attr1: b
attr1: c
attr2: a
dn: 3
attr1: a
attr1: b
attr1: c
attr2: a
我知道我使用了太多步骤。