Creating a csv from an inconsistent text file

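I have a file of loosely structured records, each made up of 3 or 4 lines of text and (mostly) separated by blank lines. Not every record has a blank-line separator, but the last line of every record starts with the word "Added". I want to produce a csv file in which each record occupies one line, preceded by a line number. So far, all I have managed to produce is a concatenation of all the records, separated by an arbitrary number of spaces and a redundant comma.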

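Logically, I am trying to achieve the following:

read a line; if the line starts with "Added", keep the newline at its end,
otherwise replace the newline with ","
or, if the line is blank, delete it
endif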

Sample data:

Peter Green  
Space Monkey at Area 51  
Joined  
Added by SF 3 weeks ago  
Will Rossiter  
Joined  
Added by SF 3 weeks ago

Dean Matthews  
Guitarist at Blues  
Joined  
Added by SF 3 weeks ago  
Hobbit Mak  
Farnborough, United Kingdom  
Joined  
Added by SF 3 weeks ago  

Keneth W Moorfield  
THE STOREMAN  
Joined  
Added by SF 3 weeks ago  
Mick Georgious  
Software Engineer  
Joined  
Added by SF 3 weeks ago

Answer 1

Try:

awk '/./{ printf "%s%s", $0, (/Added/?"\n":",") }' data

With your sample input data:

$ awk '/./{printf "%s%s",$0,(/Added/?"\n":",")}' data
Peter Green,Space Monkey at Area 51,Joined,Added by SF 3 weeks ago
Will Rossiter,Joined,Added by SF 3 weeks ago
Dean Matthews,Guitarist at Blues,Joined,Added by SF 3 weeks ago
Hobbit Mak,Farnborough, United Kingdom,Joined,Added by SF 3 weeks ago
Keneth W Moorfield,THE STOREMAN,Joined,Added by SF 3 weeks ago
Mick Georgious,Software Engineer,Joined,Added by SF 3 weeks ago

How it works:

  • /./{...}

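    The commands in curly braces are executed only if the line contains at least one character. In other words, this ignores empty lines.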

  • printf "%s%s",$0,(/Added/?"\n":",")

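    This prints the line, denoted by $0, followed by either a comma or a newline, depending on whether the line matches the regular expression Added. Because the regex is not anchored, it matches Added anywhere in the line, not only at the start.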
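
The question also asked for each record to be preceded by its number, which the one-liner above leaves out. A minimal sketch of one way to add it, not part of the original answer: the function name number_records and the variables n and inrec are introduced here for illustration, and the pattern is anchored as ^Added on the assumption that only lines beginning with that word end a record.

```shell
# Sketch: join each record onto one line and prefix it with a running number.
number_records() {
  awk '
    /./ {
      # First non-empty line of a record: emit the record number.
      if (!inrec) { printf "%d,", ++n; inrec = 1 }
      if (/^Added/) {
        printf "%s\n", $0   # last line of the record: end the CSV row
        inrec = 0
      } else {
        printf "%s,", $0    # any other line: follow it with a comma
      }
    }
  ' "$@"
}
```

It could be called as number_records data, or fed from a pipe, since awk reads standard input when no file name is given.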

Answer 2

Here is a possible sed solution (using awk to do the line numbering):

$ sed -n -e :a -e '$!{/^$/!N}; /,Added/ {P;D}; s/\n/,/; ta' data | awk '{print NR","$0}'
1,Peter Green,Space Monkey at Area 51,Joined,Added by SF 3 weeks ago
2,Will Rossiter,Joined,Added by SF 3 weeks ago
3,Dean Matthews,Guitarist at Blues,Joined,Added by SF 3 weeks ago
4,Hobbit Mak,Farnborough, United Kingdom,Joined,Added by SF 3 weeks ago
5,Keneth W Moorfield,THE STOREMAN,Joined,Added by SF 3 weeks ago
6,Mick Georgious,Software Engineer,Joined,Added by SF 3 weeks ago 

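Basically, we keep appending non-empty input lines and replacing their newlines with commas, but on every iteration we check whether we have a complete record and, if so, spit it out.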

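  • Set up a label for the loop: :a
  • If not at the end of the file ($!), append the next input line to the pattern space unless the pattern space is empty: {/^$/!N}
  • If we are at the end of a record (/,Added/), print it (P) and delete it from the pattern space (D)
  • Replace the newline with a comma (s/\n/,/) and, if the substitution succeeded, branch back to a (ta)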
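
The same accumulate-and-flush idea can be sketched in awk alone. In the output shown above, a field such as "Farnborough, United Kingdom" contains a comma and would be read as two CSV columns, so this sketch also wraps such fields in double quotes. It is an illustration under stated assumptions, not the answer's method: the names records_to_csv, csv, rec and n are hypothetical, quoting follows the common convention of doubling embedded double quotes, and trailing spaces or tabs on a line are assumed to be unwanted.

```shell
# Sketch: accumulate lines into a record, flush it when a line starts with "Added".
records_to_csv() {
  awk '
    # Quote a field if it contains a comma or a double quote.
    function csv(s) {
      if (s ~ /[",]/) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
      return s
    }
    /^[ \t]*$/ { next }                  # skip blank lines
    {
      sub(/[ \t]+$/, "")                 # drop trailing spaces and tabs
      rec = rec "," csv($0)              # append this line as one field
    }
    /^Added/ {                           # record complete: emit and reset
      printf "%d%s\n", ++n, rec
      rec = ""
    }
  ' "$@"
}
```

Running records_to_csv data > data.csv would write one numbered row per record.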

Answer 3

For what it's worth, here is a perl option:

$ perl -lne '
    push @rec, $_ unless /^$/; if (/^Added/) {print join ",", ++$n, @rec; undef @rec;}
' data
1,Peter Green,Space Monkey at Area 51,Joined,Added by SF 3 weeks ago
2,Will Rossiter,Joined,Added by SF 3 weeks ago
3,Dean Matthews,Guitarist at Blues,Joined,Added by SF 3 weeks ago
4,Hobbit Mak,Farnborough, United Kingdom,Joined,Added by SF 3 weeks ago
5,Keneth W Moorfield,THE STOREMAN,Joined,Added by SF 3 weeks ago
6,Mick Georgious,Software Engineer,Joined,Added by SF 3 weeks ago 