I'm having some trouble working out a solution.
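A column of a SQL table exported to CSV contains strings separated by commas. The requirement is to wrap each such string in double quotes by matching it with a regular expression. Some rows hold just a single string with no separator, but others look like this: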
SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1
SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1
SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L1-TDD,AntennaNearUnit=1,RetSubUnit=1
SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1
SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1
This is where I'm stuck.
A sample of the file looks like this:
export_pl,XYSS04S,58.38,800,39544,0,6,NULL,NULL,0,0,SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1,4,0,28169.8,0,NULL,0.0604592,NULL,NULL,NULL,NULL,0,NULL,0.666667,0,0,NULL,NULL,NULL,NULL,1703.12,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,250,39544,0,6,NULL,NULL,0,0,SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1,4,0,28169.8,0,0,NULL,NULL,99.9459,3.88318,0,NULL,25.7521,0.666667,888,50848,43,1966,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,340,39544,0,6,NULL,NULL,0,0,SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L1-TDD,AntennaNearUnit=1,RetSubUnit=1,6,0,28169.8,0,0,NULL,NULL,100,100,0.000163336,NULL,0.0388318,1,43,1966,888,50848,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
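The format below is the end goal: the complete string enclosed in double quotes. My initial attempt was to grep for the string with a regular expression and then use sed to add a double quote at both ends: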
grep -Eo "\w{10}\=\w{12},\w{10}=\w{3}.*,\w{10}=[[:alnum:]]" export.csv | sed 's/^/"/;s/$/"/'
"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1"
"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1"
"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L1-TDD,AntennaNearUnit=1,RetSubUnit=1"
However, when I try to do the same with sed alone (so that I can use sed -i), it does not modify the matched text in place. For example:
tail export.csv | sed -e 's/"\w{10}\=\w{12},\w{10}=\w{3}.*,\w{10}=\d$"/"/;s/$/"/'
export_pl,XYSS04S,58.38,800,39544,0,6,NULL,NULL,0,0,SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1,4,0,28169.8,0,NULL,0.0604592,NULL,NULL,NULL,NULL,0,NULL,0.666667,0,0,NULL,NULL,NULL,NULL,1703.12,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK"
export_pl,XYSS04S,58.38,250,39544,0,6,NULL,NULL,0,0,SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1,4,0,28169.8,0,0,NULL,NULL,99.9459,3.88318,0,NULL,25.7521,0.666667,888,50848,43,1966,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK"
export_pl,XYSS04S,58.38,340,39544,0,6,NULL,NULL,0,0,SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L1-TDD,AntennaNearUnit=1,RetSubUnit=1,6,0,28169.8,0,0,NULL,NULL,100,100,0.000163336,NULL,0.0388318,1,43,1966,888,50848,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK"
It only adds a " at the end of each line, after the last column.
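For reference, the sed attempt fails for several independent reasons: without -E, sed reads \w{10} as \w followed by literal braces (BRE intervals are written \{10\}); sed has no \d digit shorthand ([0-9] or [[:digit:]] work instead); the double quotes inside the pattern only match if they are already present in the input; and s/$/"/ appends a quote to every line regardless. Even a pattern that did match would be replaced by a lone ", deleting the text, because the replacement needs & to re-insert the match. A minimal sketch of a working substitution, assuming the column always begins with SubNetwork= and that no component value contains a comma (quote_mo is a hypothetical helper name):

```shell
# Hypothetical helper: wrap the run of Name=value components that starts at
# "SubNetwork=" in double quotes; & in the replacement re-inserts the match.
# Assumption: no component value contains a comma.
quote_mo() {
  sed -E 's/SubNetwork=[^,]*(,[A-Za-z]+=[^,]*)*/"&"/' "$@"
}

# Print the result for the sample file:
# quote_mo export.csv
# Edit in place (GNU sed; BSD sed needs -i '' instead of -i):
# sed -i -E 's/SubNetwork=[^,]*(,[A-Za-z]+=[^,]*)*/"&"/' export.csv
```

When the column holds a single SubNetwork=... string with no commas, the repeated group matches nothing and that lone string is quoted as well.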
Answer 1
If the point is to put quotes around sequences of comma-separated fields that match \w+=\w+, then with perl:
$ perl -lpe 's/(?<![^,])(\w+=\w+)(,(?1))*(?![^,])/"$&"/g' your-file
export_pl,XYSS04S,58.38,800,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1",AntennaUnitGroup=L2-TDD,"AntennaNearUnit=1,RetSubUnit=1",4,0,28169.8,0,NULL,0.0604592,NULL,NULL,NULL,NULL,0,NULL,0.666667,0,0,NULL,NULL,NULL,NULL,1703.12,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,250,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1",AntennaUnitGroup=L3-TDD,"AntennaNearUnit=1,RetSubUnit=1",4,0,28169.8,0,0,NULL,NULL,99.9459,3.88318,0,NULL,25.7521,0.666667,888,50848,43,1966,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,340,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1",AntennaUnitGroup=L1-TDD,"AntennaNearUnit=1,RetSubUnit=1",6,0,28169.8,0,0,NULL,NULL,100,100,0.000163336,NULL,0.0388318,1,43,1966,888,50848,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
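Note how AntennaUnitGroup=L2-TDD is left outside the quotes, because - is not a \w (word) character. To include it, replace \w with [\w-], or with [^,"=] to match any character other than a comma, a double quote or =.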
Add the -i option to modify the file in place.
Answer 2
Using any awk:
$ awk 'BEGIN{FS=OFS=","} {$12="\""$12; $19=$19"\""} 1' file
export_pl,XYSS04S,58.38,800,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1",4,0,28169.8,0,NULL,0.0604592,NULL,NULL,NULL,NULL,0,NULL,0.666667,0,0,NULL,NULL,NULL,NULL,1703.12,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,250,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1",4,0,28169.8,0,0,NULL,NULL,99.9459,3.88318,0,NULL,25.7521,0.666667,888,50848,43,1966,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,340,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L1-TDD,AntennaNearUnit=1,RetSubUnit=1",6,0,28169.8,0,0,NULL,NULL,100,100,0.000163336,NULL,0.0388318,1,43,1966,888,50848,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
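The command above relies on the column always spanning exactly fields 12 to 19. Since the question says some rows hold a single string with no commas, the field count may vary. A sketch that locates the run instead of hard-coding it, assuming the column is the only place in a row where fields contain = (quote_eq_run is a hypothetical name):

```shell
# Hypothetical helper: find the first field containing "=", extend to the end
# of the consecutive run of such fields, and quote the whole run.
# Assumption: only the target column has fields containing "=".
quote_eq_run() {
  awk 'BEGIN { FS = OFS = "," }
  {
    s = 1
    while (s <= NF && $s !~ /=/) s++          # first field with "="
    if (s <= NF) {
      e = s
      while (e < NF && $(e + 1) ~ /=/) e++    # last field of the run
      $s = "\"" $s                            # opening quote
      $e = $e "\""                            # closing quote (same field if s == e)
    }
    print
  }' "$@"
}

# quote_eq_run file
```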
Or with any sed that supports -E to enable EREs, e.g. GNU or BSD sed:
$ sed -E 's/(([^,]*,){11})(([^,]*,){7}[^,]*)/\1"\3"/' file
export_pl,XYSS04S,58.38,800,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1",4,0,28169.8,0,NULL,0.0604592,NULL,NULL,NULL,NULL,0,NULL,0.666667,0,0,NULL,NULL,NULL,NULL,1703.12,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,250,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1",4,0,28169.8,0,0,NULL,NULL,99.9459,3.88318,0,NULL,25.7521,0.666667,888,50848,43,1966,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,340,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L1-TDD,AntennaNearUnit=1,RetSubUnit=1",6,0,28169.8,0,0,NULL,NULL,100,100,0.000163336,NULL,0.0388318,1,43,1966,888,50848,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
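In that expression, ([^,]*,){11} skips the 11 fields before the column, and ([^,]*,){7}[^,]* takes the column's 8 parts (7 followed by a comma, plus the last one), so \1 is the prefix and \3 is the column; groups 2 and 4 are only the repeated inner groups. A hypothetical wrapper that takes those two counts as arguments, under the same fixed-layout assumption:

```shell
# Hypothetical wrapper: quote_fields SKIP COUNT [file...]
#   SKIP  = number of fields before the column (11 in the sample)
#   COUNT = number of comma-separated parts in the column (8 in the sample)
quote_fields() {
  skip=$1
  rest=$(( $2 - 1 ))   # all parts but the last are followed by a comma
  shift 2
  # Inside double quotes the shell turns \\1 into \1 and \" into "
  sed -E "s/(([^,]*,){$skip})(([^,]*,){$rest}[^,]*)/\\1\"\\3\"/" "$@"
}

# Equivalent to the command above:
# quote_fields 11 8 file
```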
Answer 3
Using awk:
$ awk '{ sub(/[_[:alpha:]]{10}=[_[:alpha:]]{12},[_[:alpha:]]{10}=[_[:alpha:]]{3}.*,[_[:alpha:]]{10}=[[:alnum:]]+/, "\"&\"")}1' file
# Using GNU awk
$ awk '{sub(/\w{10}=\w{12},\w{10}=\w{3}.*,\w{10}=[[:alnum:]]{1,}/, "\"&\"" ) }1' file
Or using the gensub() function of GNU awk:
$ awk '{print gensub(/(^.*)(\w{10}=\w{12},\w{10}=\w{3}.*,\w{10}=[[:alnum:]]{1,})(.*$)/, "\\1\"\\2\"\\3", "g"); }' file