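I have a file produced by MapReduce in the format below, and I need to convert it to CSV with a shell script. Five values are dynamic: the transaction ID and the four fields that follow it (2000, ABC corp, .., BE900000075000027). They change with each transaction ID. Only the other 17 values (25-MAY-15, 04:20, ... up to Standard Life) are constant.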
25-MAY-15
04:20
Client
0000000010
127.0.0.1
PAY
ISO20022
PAIN000
100
1
CUST
API
ABF07
ABC03_LIFE.xml
AFF07/LIFE
100000
Standard Life
================================================
================================================
AFF07-B000001
2000
ABC Corp
..
BE900000075000027
AFF07-B000002
2000
XYZ corp
..
BE900000075000027
AFF07-B000003
2000
3MM corp
..
BE900000075000027
I need the output in the following format:
25-MAY-15,04:20,Client,0000000010,127.0.0.1,PAY,ISO20022,PAIN000,100,1,CUST,API,ABF07,ABC03_LIFE.xml,AFF07/LIFE,100000,Standard Life,AFF07-B000001,2000,ABC Corp,..,BE900000075000027
25-MAY-15,04:20,Client,0000000010,127.0.0.1,PAY,ISO20022,PAIN000,100,1,CUST,API,ABF07,ABC03_LIFE.xml,AFF07/LIFE,100000,Standard Life,AFF07-B000002,2000,XYZ corp,..,BE900000075000027
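I need the values above the two dashed lines repeated on every line, followed by the remaining fields for each transaction ID (AFF07-B000001, AFF07-B000002, AFF07-B000003). The actual file contains no dashed lines; I added them only to make the input easier to follow.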
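A minimal sketch of one way to do this from a shell script, assuming (this is not stated in the question) that the first 17 non-empty lines are always the constant block and every following group of 5 non-empty lines is one transaction. Because it counts lines instead of looking for a separator, it needs neither the dashed lines nor a blank line. The function name `to_csv` and the file names are hypothetical, and no CSV quoting is done, so it also assumes no value contains a comma or double quote.

```shell
# to_csv: sketch that turns the MapReduce output into one CSV line per transaction.
# Assumptions (hypothetical, not from the original post):
#   - the first 17 non-empty lines are the constant header (25-MAY-15 ... Standard Life);
#   - every following group of 5 non-empty lines is one transaction;
#   - no value contains a comma or double quote, so no CSV quoting is done.
# Empty lines are skipped, so a blank separator line may or may not be present.
to_csv() {
  awk -v hdr_n=17 -v rec_n=5 '
    { sub(/\r$/, "") }                  # tolerate CRLF line endings
    length($0) == 0 { next }            # skip blank lines
    {
      n++
      if (n <= hdr_n) {                 # still inside the constant header
        if (n == 1) hdr = $0; else hdr = hdr "," $0
        next
      }
      k = (n - hdr_n - 1) % rec_n       # position 0..4 inside the current transaction
      if (k == 0) rec = $0; else rec = rec "," $0
      if (k == rec_n - 1)               # transaction complete: emit header + record
        print hdr "," rec
    }
  ' "$@"
}

# Usage (hypothetical file names): to_csv mapreduce_output.txt > output.csv
```

Keeping the two counts in `-v` variables means only `hdr_n` or `rec_n` has to change if the size of the constant block or of a transaction ever changes. A trailing, incomplete transaction (fewer than 5 lines) is silently dropped.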
Answer 1
Assuming each repeating record has 5 fields, use the following awk script:
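```awk
# Lines before the first empty line are the constant header;
# every 5 non-empty lines after it form one transaction record.
# Usage (example file names): awk -f to_csv.awk input.txt > output.csv
BEGIN { header = 1 }
length($0) == 0 { header = 0 }          # an empty line ends the header
length($0) > 0 {
    if (header) {
        str_h = str_h "," $0             # accumulate header values
    } else {
        str_f = str_f "," $0             # accumulate the current record
        c++
        if (c == 5) {                    # record complete: print header + record
            printf "%s%s\n", substr(str_h, 2), str_f
            c = 0
            str_f = ""
        }
    }
}
```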
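The script above switches from the constant values to the transaction records only when it sees an empty line, so if the real file has no blank line there (the question says the dashed lines are not in the actual file), it prints nothing at all. A small check on the generated CSV can catch that, as well as a value that itself contains a comma. This is a sketch: `check_csv` is a hypothetical helper, and its default of 22 fields is simply the 17 constant values plus the 5 transaction fields described in the question.

```shell
# check_csv: hypothetical helper that reads CSV on stdin and verifies every
# line has the expected number of comma-separated fields (default 22 =
# 17 constant values + 5 transaction fields). Exits non-zero on a mismatch
# or when the input is empty (e.g. the awk script never saw an empty line).
check_csv() {
  awk -F, -v want="${1:-22}" '
    BEGIN { bad = 0 }
    NF != want { printf "line %d has %d fields, expected %d\n", NR, NF, want; bad = 1 }
    END {
      if (NR == 0) { print "no output lines"; exit 1 }
      exit bad
    }
  '
}

# Usage (hypothetical file names):
#   awk -f to_csv.awk input.txt > output.csv && check_csv < output.csv
```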