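I have loosely structured records in a file. Each record consists of 3 or 4 lines of text, and the records are (mostly) separated by blank lines. Not every record is followed by a blank separator line, but the last line of every record starts with the word "Added". I want to produce a csv file with one record per line, preceded by a line number. So far I have only managed to produce a concatenation of all the records, separated by an arbitrary number of spaces and a redundant comma.
Logically, I am trying to achieve the following:
read a line; if the line starts with "Added", keep the newline at the end,
else replace the newline with ","
or, if the line is blank, delete it
endif
Sample data: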
Peter Green
Space Monkey at Area 51
Joined
Added by SF 3 weeks ago
Will Rossiter
Joined
Added by SF 3 weeks ago
Dean Matthews
Guitarist at Blues
Joined
Added by SF 3 weeks ago
Hobbit Mak
Farnborough, United Kingdom
Joined
Added by SF 3 weeks ago
Keneth W Moorfield
THE STOREMAN
Joined
Added by SF 3 weeks ago
Mick Georgious
Software Engineer
Joined
Added by SF 3 weeks ago
Answer 1
Try:
awk '/./{ printf "%s%s", $0, (/Added/?"\n":",") }' data
Using your sample input data:
$ awk '/./{printf "%s%s",$0,(/Added/?"\n":",")}' data
Peter Green,Space Monkey at Area 51,Joined,Added by SF 3 weeks ago
Will Rossiter,Joined,Added by SF 3 weeks ago
Dean Matthews,Guitarist at Blues,Joined,Added by SF 3 weeks ago
Hobbit Mak,Farnborough, United Kingdom,Joined,Added by SF 3 weeks ago
Keneth W Moorfield,THE STOREMAN,Joined,Added by SF 3 weeks ago
Mick Georgious,Software Engineer,Joined,Added by SF 3 weeks ago
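How it works:
/./{...}
The commands in curly braces are executed only if the line contains at least one character. In other words, this skips empty lines.
printf "%s%s",$0,(/Added/?"\n":",")
This prints the line, denoted $0, followed by either a comma or a newline, depending on whether the line matches the regular expression Added. Note that the regular expression is not anchored, so it matches Added anywhere in the line; use /^Added/ to match only lines that start with it.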
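The command above joins the records but does not add the leading line number the question asks for. A minimal sketch of one way to add it, assuming every record ends with a line that starts with "Added"; the function name join_records and the awk variables n and inrec are made up for this illustration:

```shell
# Hypothetical helper: print each record on one line, prefixed by a record number.
# Assumption: the last line of every record starts with "Added".
join_records() {
    awk '
        /./ {                      # skip empty lines
            if (!inrec) {          # first line of a new record
                printf "%d,", ++n  # emit the record number
                inrec = 1
            }
            if (/^Added/) {        # last line of the record
                print              # line followed by a newline
                inrec = 0
            } else {
                printf "%s,", $0   # line followed by a comma
            }
        }
    ' "$@"
}
```

It can be called as join_records data, or fed from a pipe. The anchored /^Added/ is used here so that a name or job title that merely contains "Added" does not end a record early.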
Answer 2
Here is a possible sed solution (using awk to do the line numbering):
$ sed -n -e :a -e '$!{/^$/!N}; /,Added/ {P;D}; s/\n/,/; ta' data | awk '{print NR","$0}'
1,Peter Green,Space Monkey at Area 51,Joined,Added by SF 3 weeks ago
2,Will Rossiter,Joined,Added by SF 3 weeks ago
3,Dean Matthews,Guitarist at Blues,Joined,Added by SF 3 weeks ago
4,Hobbit Mak,Farnborough, United Kingdom,Joined,Added by SF 3 weeks ago
5,Keneth W Moorfield,THE STOREMAN,Joined,Added by SF 3 weeks ago
6,Mick Georgious,Software Engineer,Joined,Added by SF 3 weeks ago
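Basically, we keep appending non-empty input lines and replacing their newlines with commas, but on each iteration we check whether we have a complete record and, if so, spit it out.
- Set a label for the loop (:a)
- If we are not at the end of the file ($!), append the next input line to the pattern space, unless the pattern space is currently empty ({/^$/!N})
- If we are at the end of a record (/,Added/), print it (P) and delete it from the pattern space (D)
- Substitute a comma for the newline (s/\n/,/) and, on success, branch back to a (ta)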
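The steps above can also be written as a multi-line sed script with one command per line, which some may find easier to follow. This is only a sketch of the same commands laid out differently; the wrapper function join_with_sed is a name invented for this example:

```shell
# Hypothetical wrapper around the same sed commands, one per line.
join_with_sed() {
    sed -n '
        # label marking the top of the loop
        :a
        # not on the last line: append the next line unless the pattern space is empty
        $!{
            /^$/!N
        }
        # a complete record: print up to the first newline, then delete that part
        /,Added/{
            P
            D
        }
        # join two lines with a comma and loop again if a newline was replaced
        s/\n/,/
        ta
    ' "$@" | awk '{ print NR "," $0 }'   # awk adds the line numbers
}
```

Called as join_with_sed data, it should behave like the one-liner above.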
Answer 3
For what it's worth, here is a perl option:
$ perl -lne '
push @rec, $_ unless /^$/; if (/^Added/) {print join ",", ++$n, @rec; undef @rec;}
' data
1,Peter Green,Space Monkey at Area 51,Joined,Added by SF 3 weeks ago
2,Will Rossiter,Joined,Added by SF 3 weeks ago
3,Dean Matthews,Guitarist at Blues,Joined,Added by SF 3 weeks ago
4,Hobbit Mak,Farnborough, United Kingdom,Joined,Added by SF 3 weeks ago
5,Keneth W Moorfield,THE STOREMAN,Joined,Added by SF 3 weeks ago
6,Mick Georgious,Software Engineer,Joined,Added by SF 3 weeks ago
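One thing to watch in the output: the line Farnborough, United Kingdom contains a comma of its own, so a CSV reader would split it into two fields. If that matters, a possible workaround is to wrap every field in double quotes. The following is a rough sketch in awk, assuming the usual CSV convention of doubling any embedded double quote; the function name join_records_quoted and the variables field, rec and n are invented for this example:

```shell
# Hypothetical helper: numbered records with every text field double-quoted.
# Assumption: the last line of every record starts with "Added".
join_records_quoted() {
    awk '
        /./ {                                  # skip empty lines
            field = $0
            gsub(/"/, "\"\"", field)           # double any embedded double quotes
            rec = rec "," "\"" field "\""      # append the quoted field
            if (/^Added/) {                    # last line of the record
                n++
                print n rec                    # record number, then the fields
                rec = ""
            }
        }
    ' "$@"
}
```

With the sample data, the fourth record would then come out with "Farnborough, United Kingdom" as a single quoted field.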