Linux CSV column with multiple comma separators - sed regex match and replace with quotes

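I'm having some difficulty working out a solution.

One column of a SQL table exported to CSV contains strings that are themselves separated by commas, as in the data below. Some rows hold just a plain string without any separators, but other rows look like this -

The requirement is to wrap the whole string in double quotes by matching it with a regex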
SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1
SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1
SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L1-TDD,AntennaNearUnit=1,RetSubUnit=1
SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1
SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1

This is where I'm stuck.

A sample of the file looks like

export_pl,XYSS04S,58.38,800,39544,0,6,NULL,NULL,0,0,SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1,4,0,28169.8,0,NULL,0.0604592,NULL,NULL,NULL,NULL,0,NULL,0.666667,0,0,NULL,NULL,NULL,NULL,1703.12,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,250,39544,0,6,NULL,NULL,0,0,SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1,4,0,28169.8,0,0,NULL,NULL,99.9459,3.88318,0,NULL,25.7521,0.666667,888,50848,43,1966,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,340,39544,0,6,NULL,NULL,0,0,SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L1-TDD,AntennaNearUnit=1,RetSubUnit=1,6,0,28169.8,0,0,NULL,NULL,100,100,0.000163336,NULL,0.0388318,1,43,1966,888,50848,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK

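The format below is the end goal: the complete string enclosed in double quotes, found by grepping with a regex and adding a double quote at both ends. Example below, an initial attempt that matches with a regex and then applies sed: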

grep -Eo "\w{10}\=\w{12},\w{10}=\w{3}.*,\w{10}=[[:alnum:]]" export.csv | sed 's/^/"/;s/$/"/'
"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1"
"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1"
"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L1-TDD,AntennaNearUnit=1,RetSubUnit=1"

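However, when I move the same regex from grep into sed (with the aim of using sed -i), it does not modify the matched text in place -

For example -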

tail export.csv | sed -e 's/"\w{10}\=\w{12},\w{10}=\w{3}.*,\w{10}=\d$"/"/;s/$/"/'
export_pl,XYSS04S,58.38,800,39544,0,6,NULL,NULL,0,0,SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1,4,0,28169.8,0,NULL,0.0604592,NULL,NULL,NULL,NULL,0,NULL,0.666667,0,0,NULL,NULL,NULL,NULL,1703.12,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK"
export_pl,XYSS04S,58.38,250,39544,0,6,NULL,NULL,0,0,SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1,4,0,28169.8,0,0,NULL,NULL,99.9459,3.88318,0,NULL,25.7521,0.666667,888,50848,43,1966,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK"
export_pl,XYSS04S,58.38,340,39544,0,6,NULL,NULL,0,0,SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L1-TDD,AntennaNearUnit=1,RetSubUnit=1,6,0,28169.8,0,0,NULL,NULL,100,100,0.000163336,NULL,0.0388318,1,43,1966,888,50848,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK"

It only adds a " at the end of the line, after the last column.

Answer 1

If the point is to put quotes around sequences of ,-separated fields that match \w+=\w+, then with perl:

$ perl -lpe 's/(?<![^,])(\w+=\w+)(,(?1))*(?![^,])/"$&"/g' your-file
export_pl,XYSS04S,58.38,800,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1",AntennaUnitGroup=L2-TDD,"AntennaNearUnit=1,RetSubUnit=1",4,0,28169.8,0,NULL,0.0604592,NULL,NULL,NULL,NULL,0,NULL,0.666667,0,0,NULL,NULL,NULL,NULL,1703.12,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,250,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1",AntennaUnitGroup=L3-TDD,"AntennaNearUnit=1,RetSubUnit=1",4,0,28169.8,0,0,NULL,NULL,99.9459,3.88318,0,NULL,25.7521,0.666667,888,50848,43,1966,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,340,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1",AntennaUnitGroup=L1-TDD,"AntennaNearUnit=1,RetSubUnit=1",6,0,28169.8,0,0,NULL,NULL,100,100,0.000163336,NULL,0.0388318,1,43,1966,888,50848,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK

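Note how AntennaUnitGroup=L2-TDD ends up outside the quotes, because - is not a \w (word) character. To have it included, replace \w with [\w-], or with [^,"=] to match any character other than , " and =.

Add the -i option to have the file modified in-place.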

Answer 2

Using any awk:

$ awk 'BEGIN{FS=OFS=","} {$12="\""$12; $19=$19"\""} 1' file
export_pl,XYSS04S,58.38,800,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1",4,0,28169.8,0,NULL,0.0604592,NULL,NULL,NULL,NULL,0,NULL,0.666667,0,0,NULL,NULL,NULL,NULL,1703.12,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,250,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1",4,0,28169.8,0,0,NULL,NULL,99.9459,3.88318,0,NULL,25.7521,0.666667,888,50848,43,1966,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,340,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L1-TDD,AntennaNearUnit=1,RetSubUnit=1",6,0,28169.8,0,0,NULL,NULL,100,100,0.000163336,NULL,0.0388318,1,43,1966,888,50848,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
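The awk command above relies on the comma-separated string always occupying fields 12 through 19. If the positions differ in another export, the same idea can be parameterized. The following is only a sketch: quote_field_range and its two arguments are hypothetical names, and it assumes the first and last field numbers are known in advance.

```shell
# Sketch: put an opening quote before field $1 and a closing quote after
# field $2, on lines that have at least $2 comma-separated fields.
# quote_field_range is a hypothetical helper; remaining arguments are files.
quote_field_range() {
  first=$1 last=$2
  shift 2
  awk -v first="$first" -v last="$last" '
    BEGIN { FS = OFS = "," }
    NF >= last { $first = "\"" $first; $last = $last "\"" }
    1
  ' "$@"
}

# Made-up five-field line: quote fields 2 through 4.
printf 'a,b,c,d,e\n' | quote_field_range 2 4
```

For the sample file in the question this would be called as quote_field_range 12 19 file. Lines with fewer fields than the last index are printed unchanged.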

Or using a sed that has -E to enable EREs, e.g. GNU or BSD sed:

$ sed -E 's/(([^,]*,){11})(([^,]*,){7}[^,]*)/\1"\3"/' file
export_pl,XYSS04S,58.38,800,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1",4,0,28169.8,0,NULL,0.0604592,NULL,NULL,NULL,NULL,0,NULL,0.666667,0,0,NULL,NULL,NULL,NULL,1703.12,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,250,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1",4,0,28169.8,0,0,NULL,NULL,99.9459,3.88318,0,NULL,25.7521,0.666667,888,50848,43,1966,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,340,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L1-TDD,AntennaNearUnit=1,RetSubUnit=1",6,0,28169.8,0,0,NULL,NULL,100,100,0.000163336,NULL,0.0388318,1,43,1966,888,50848,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
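Since the question asks about editing the file in place, here is a minimal sketch of the same sed -E substitution run with -i.bak, which keeps a backup copy next to the file. The temporary file and its contents are invented for the demonstration; the attached-suffix form -i.bak is used because both GNU and BSD sed accept it.

```shell
# Sketch: run the substitution in place, keeping a .bak copy of the original.
# The temp file below stands in for the real export file.
f=$(mktemp)
printf '%s\n' \
  'f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,A=1,B=2,C=3,D=4,E=5,F=6,G=7,H=8,tail,OK' > "$f"

# Skip the first 11 fields (\1), then wrap the next 8 fields (\3) in quotes.
sed -E -i.bak 's/(([^,]*,){11})(([^,]*,){7}[^,]*)/\1"\3"/' "$f"

cat "$f"
# Remove "$f" and "$f.bak" once the result has been checked.
```

As with the commands above, this assumes the string always spans exactly fields 12 to 19.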

Answer 3

Using awk

$ awk '{ sub(/[_[:alpha:]]{10}=[_[:alpha:]]{12},[_[:alpha:]]{10}=[_[:alpha:]]{3}.*,[_[:alpha:]]{10}=[[:alnum:]]+/, "\"&\"")}1'

# Using GNU awk
$ awk '{sub(/\w{10}=\w{12},\w{10}=\w{3}.*,\w{10}=[[:alnum:]]{1,}/, "\"&\"" ) }1'

Or using the GNU awk gensub() function

$ awk '{print gensub(/(^.*)(\w{10}=\w{12},\w{10}=\w{3}.*,\w{10}=[[:alnum:]]{1,})(.*$)/, "\\1\"\\2\"\\3", "g"); }'
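For comparison, the sed attempt in the question most likely fails for three reasons: its pattern contains literal " characters that are not in the data, it uses {10} without -E (in a basic regex the braces are literal), and \d is not a digit class in sed. The following is a sketch of the same pattern written for sed -E, using the portable class [[:alnum:]_] in place of \w. The function name quote_dn and the shortened sample line are assumptions made for this example.

```shell
# Sketch: the pattern from this answer as a sed -E substitution.
# & in the replacement stands for the whole match, so "&" wraps it in quotes.
# quote_dn is a hypothetical helper name.
quote_dn() {
  sed -E 's/[[:alnum:]_]{10}=[[:alnum:]_]{12},[[:alnum:]_]{10}=[[:alnum:]_]{3}.*,[[:alnum:]_]{10}=[[:alnum:]]+/"&"/' "$@"
}

# Shortened, made-up sample line in the same shape as the export file.
printf '%s\n' \
  'x,0,SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,AntennaUnitGroup=L2-TDD,RetSubUnit=1,4,OK' |
  quote_dn
```

Like the awk versions, this depends on the exact key and value lengths seen in the sample (10, 12, 10 and 3 characters), so it will not match rows whose names have other lengths.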
