My raw data is:
id=ABC name=Banana DB Connection type=FruitMarket
XYZ_1 ABC.xml
XYZ_2 ABC.xml
XYZ_3 ABC.xml
"Fruits/Mango/#Common"
"Fruits/Mango/#Bizzare"
"Fruits/Mango/#Common"
id=EFG name=FruitHouse type=jms
XYZ_4 EFG.xml
"Fruits/Plum Orange"
id=JKL name=JMSWriteConnect type=jms
XYZ_4 JKL.xml
"Fruits/Plum Orange"
id=TMZ name=Banana DB Connection type=FruitMarket
XYZ_5 TMZ.xml
"Fruits/Mango/Backup/Apple"
id=LDL name=Banana DB Market-Connect type=FruitMarket
XYZ_6 LDL.xml
XYZ_7 LDL.xml
XYZ_8 LDL.xml
XYZ_9 LDL.xml
XYZ_10 LDL.xml
XYZ_11 LDL.xml
"Fruits/Mango/#Common"
"Fruits/Mango/#Common"
"VEG/Mango/#NOT"
"Fruits/Mango/#Common"
"Fruits/Mango/#NOT"
"Fruits/Mango/#Common"
Using shell scripting (awk, sed, bash), I would like to align it as follows (final output):
id=ABC name=Banana DB Connection type=FruitMarket
XYZ_1 ABC.xml "Fruits/Mango/#Common"
XYZ_2 ABC.xml "Fruits/Mango/#Bizzare"
XYZ_3 ABC.xml "Fruits/Mango/#Common"
id=EFG name=FruitHouse type=jms
XYZ_4 EFG.xml "Fruits/Plum Orange"
id=JKL name=JMSWriteConnect type=jms
XYZ_4 JKL.xml "Fruits/Plum Orange"
id=TMZ name=Banana DB Connection type=FruitMarket
XYZ_5 TMZ.xml "Fruits/Mango/Backup/Apple"
id=LDL name=Banana DB Market-Connect type=FruitMarket
XYZ_6 LDL.xml "Fruits/Mango/#Common"
XYZ_7 LDL.xml "Fruits/Mango/#Common"
XYZ_8 LDL.xml "VEG/Mango/#NOT"
XYZ_9 LDL.xml "Fruits/Mango/#Common"
XYZ_10 LDL.xml "Fruits/Mango/#NOT"
XYZ_11 LDL.xml "Fruits/Mango/#Common"
Whitespace within the lines is not important. Any pointers would be helpful.
Answer 1
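Assuming each record always has exactly one header line (id/name/type), and the record body consists of an equal number of "XYZ_n LDL.xml" lines and category (fruit/vegetable) lines, you can use GNU awk (gawk) in paragraph mode, with the getline/variable/coprocess form, to communicate with the pr command, which lays the body out in two columns: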
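gawk '
BEGIN {
    RS = ""; FS = "\n"          # paragraph mode: one record per block, one field per line
    cmd = "pr -T -s -2"         # two columns, tab separator, no page headers
}
{
    print $1                    # header line (id/name/type)
    for (i = 2; i <= NF; i++)
        print $i |& cmd         # send the body lines to the coprocess
    close(cmd, "to")            # close the write end so pr sees end of input
    while ((cmd |& getline line) > 0)
        print line              # read back the paired-up lines
    close(cmd)
    print ""
}
' file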
id=ABC name=Banana DB Connection type=FruitMarket
XYZ_1 ABC.xml "Fruits/Mango/#Common"
XYZ_2 ABC.xml "Fruits/Mango/#Bizzare"
XYZ_3 ABC.xml "Fruits/Mango/#Common"
id=EFG name=FruitHouse type=jms
XYZ_4 EFG.xml "Fruits/Plum Orange"
id=JKL name=JMSWriteConnect type=jms
XYZ_4 JKL.xml "Fruits/Plum Orange"
id=TMZ name=Banana DB Connection type=FruitMarket
XYZ_5 TMZ.xml "Fruits/Mango/Backup/Apple"
id=LDL name=Banana DB Market-Connect type=FruitMarket
XYZ_6 LDL.xml "Fruits/Mango/#Common"
XYZ_7 LDL.xml "Fruits/Mango/#Common"
XYZ_8 LDL.xml "VEG/Mango/#NOT"
XYZ_9 LDL.xml "Fruits/Mango/#Common"
XYZ_10 LDL.xml "Fruits/Mango/#NOT"
XYZ_11 LDL.xml "Fruits/Mango/#Common"
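To see why the coprocess approach pairs the lines correctly, here is a minimal sketch of what pr does with the body of a single record. It assumes GNU pr (for the -T option); the function name pair_halves is made up for this illustration.

```shell
# Hypothetical helper: lay the input out in two columns, so that the
# first half of the lines ends up next to the second half (tab-separated).
pair_halves() {
    pr -T -s -2
}

# Body of one record: two XML lines followed by two category lines.
printf '%s\n' \
    'XYZ_1 ABC.xml' \
    'XYZ_2 ABC.xml' \
    '"Fruits/Mango/#Common"' \
    '"Fruits/Mango/#Bizzare"' | pair_halves
```

Since pr fills the columns top to bottom, the pairing only lines up when the record body has as many category lines as XML lines, which is the assumption stated at the top of this answer.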
Answer 2
Perl:
perl -00 -F'\n' -anE '
    $n = $#F / 2;                          # number of XML/category pairs after the header
    say $F[0];
    say $F[$_], " ", $F[$_+$n] for (1..$n);
    say "";
' raw
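-00       splits the file by paragraphs
-F'\n'    uses the newline as the field separator
-a        "autosplits" each record into fields stored in the @F array
-n        loops over the records in the file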
id=ABC name=Banana DB Connection type=FruitMarket
XYZ_1 ABC.xml "Fruits/Mango/#Common"
XYZ_2 ABC.xml "Fruits/Mango/#Bizzare"
XYZ_3 ABC.xml "Fruits/Mango/#Common"
id=EFG name=FruitHouse type=jms
XYZ_4 EFG.xml "Fruits/Plum Orange"
id=JKL name=JMSWriteConnect type=jms
XYZ_4 JKL.xml "Fruits/Plum Orange"
id=TMZ name=Banana DB Connection type=FruitMarket
XYZ_5 TMZ.xml "Fruits/Mango/Backup/Apple"
id=LDL name=Banana DB Market-Connect type=FruitMarket
XYZ_6 LDL.xml "Fruits/Mango/#Common"
XYZ_7 LDL.xml "Fruits/Mango/#Common"
XYZ_8 LDL.xml "VEG/Mango/#NOT"
XYZ_9 LDL.xml "Fruits/Mango/#Common"
XYZ_10 LDL.xml "Fruits/Mango/#NOT"
XYZ_11 LDL.xml "Fruits/Mango/#Common"
Answer 3
Using paste, grep and sed:
paste -d ' ' \
    <(grep -v '"' file) \
    <(grep -v '\.xml' file | sed 's/^[[:blank:]]*//;s/id=.*//')
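The first grep gets all lines that do not contain a double quote: the empty lines and the lines containing the IDs and the XML file names. The second grep gets all lines that do not contain an XML file name. From these, sed removes leading space/tab characters and any string starting with id=, so each header becomes an empty line. paste then joins the two results line by line, using a space character as the delimiter.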
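The two intermediate streams described above can also be written to files and inspected separately, which avoids the bash-only process substitution. This is only a sketch: the temporary directory and the one-record sample file are made up for the illustration.

```shell
# Hypothetical scratch directory and sample input (a single record).
tmp=$(mktemp -d)
printf '%s\n' \
    'id=EFG name=FruitHouse type=jms' \
    'XYZ_4 EFG.xml' \
    '"Fruits/Plum Orange"' > "$tmp/file"

# Stream 1: header and XML lines (everything without a double quote).
grep -v '"' "$tmp/file" > "$tmp/left"

# Stream 2: category lines, with the header reduced to an empty line.
grep -v '\.xml' "$tmp/file" | sed 's/^[[:blank:]]*//;s/id=.*//' > "$tmp/right"

# Join the two streams line by line with a single space.
joined=$(paste -d ' ' "$tmp/left" "$tmp/right")
printf '%s\n' "$joined"

rm -rf "$tmp"
```

Because the header is paired with an empty line, it comes out with a trailing space, which should be harmless given that whitespace is not important here.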
Answer 4
An awk version:
awk -v RS="" -v FS="\n" '{
    print $1
    half = (NF + 1) / 2                 # index of the last XML line
    for (i = 2; i <= half; i++)
        print $i, $(half + i - 1)       # pair each XML line with its category line
    print ""
}' file
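The paragraph-mode versions rely on blank lines between records. If the input has none, a variant that keys on the line contents might look like the sketch below. It assumes that header lines start with id=, that category lines always contain a double quote, and that every other non-blank line is an XML line; the function name align_records is made up for this illustration.

```shell
# Hypothetical helper: buffer the XML and category lines of each record
# and print them as pairs whenever the next header (or end of input) arrives.
align_records() {
    awk '
    function emit(   i) {
        for (i = 1; i <= nx; i++)
            print xml[i], tag[i]
        nx = nt = 0
    }
    /^[ \t]*$/   { next }                        # skip blank lines
    /^[ \t]*id=/ { emit(); print; next }         # header: flush previous record
    /"/          { sub(/^[ \t]+/, ""); tag[++nt] = $0; next }   # category line
                 { xml[++nx] = $0 }              # anything else: XML line
    END          { emit() }
    ' "$@"
}

printf '%s\n' \
    'id=EFG name=FruitHouse type=jms' \
    'XYZ_4 EFG.xml' \
    '"Fruits/Plum Orange"' | align_records
```

It reads from the named files, or from standard input when no file is given, and like the other answers it expects each record to hold as many category lines as XML lines.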