Replacing only certain double quotes in a data file

Answer 1

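This is a bit tricky with regular expressions alone, but it can be done in a few steps. Here is the perl script I used for it (sed will not work, because the pattern uses a lookahead):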

perl -pe 's/(?:(?:\||^)[ ]*"(.*?)"[ ]*(?=\||$))/~~$1~~/gm;s/"/""/g;s/~~(.*?)~~/~~"$1"~~/g;s/~~~~/|/g;s/~~//g' inputfile.txt

(Use perl -pi -e if you want to edit the input file in place.)

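The script performs the following steps:

1. Find all text inside |"{...}"|, (start of line)"{...}"| or |"{...}"(end of line), ignoring any spaces outside the quotes. Replace the outer parts with ~~ (I picked something known not to occur inside the text).
2. Replace every remaining quote with a doubled quote (" becomes "").
3. Replace every inner ~~{...}~~ sequence with ~~"{...}"~~.
4. Replace every ~~~~ sequence (these are all boundaries between fields) with |.
5. Remove all remaining ~~ sequences (the ones at the start and end of each line).

Take the following test text: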

"164829" | "collection 1" | "wood plank 2" x 4" long" | "23.5"
"939485"|"collect "name""|"more items with | " and ""|"294.5""

Running the script one step at a time gives the following output after each step:

$ perl -pe 's/(?:(?:\||^)[ ]*"(.*?)"[ ]*(?=\||$))/~~$1~~/gm;' testinput.txt
~~164829~~~~collection 1~~~~wood plank 2" x 4" long~~~~23.5~~
~~939485~~~~collect "name"~~~~more items with | " and "~~~~294.5"~~

$ perl -pe 's/(?:(?:\||^)[ ]*"(.*?)"[ ]*(?=\||$))/~~$1~~/gm;s/"/""/g;' testinput.txt
~~164829~~~~collection 1~~~~wood plank 2"" x 4"" long~~~~23.5~~
~~939485~~~~collect ""name""~~~~more items with | "" and ""~~~~294.5""~~

$ perl -pe 's/(?:(?:\||^)[ ]*"(.*?)"[ ]*(?=\||$))/~~$1~~/gm;s/"/""/g;s/~~(.*?)~~/~~"$1"~~/g;' testinput.txt
~~"164829"~~~~"collection 1"~~~~"wood plank 2"" x 4"" long"~~~~"23.5"~~
~~"939485"~~~~"collect ""name"""~~~~"more items with | "" and """~~~~"294.5"""~~

$ perl -pe 's/(?:(?:\||^)[ ]*"(.*?)"[ ]*(?=\||$))/~~$1~~/gm;s/"/""/g;s/~~(.*?)~~/~~"$1"~~/g;s/~~~~/|/g;' testinput.txt
~~"164829"|"collection 1"|"wood plank 2"" x 4"" long"|"23.5"~~
~~"939485"|"collect ""name"""|"more items with | "" and """|"294.5"""~~

$ perl -pe 's/(?:(?:\||^)[ ]*"(.*?)"[ ]*(?=\||$))/~~$1~~/gm;s/"/""/g;s/~~(.*?)~~/~~"$1"~~/g;s/~~~~/|/g;s/~~//g' testinput.txt
"164829"|"collection 1"|"wood plank 2"" x 4"" long"|"23.5"
"939485"|"collect ""name"""|"more items with | "" and """|"294.5"""
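
If chaining substitutions on the command line is inconvenient, the same five steps can be sketched in Python. This is only an illustration under assumptions, not part of the original answer: the function name escape_inner_quotes and its marker parameter are made up here, and the function expects one line at a time with the trailing newline already removed.

```python
import re

# One quoted field: a leading | or the line start, optional spaces, the
# opening quote, the field body, the closing quote, optional spaces, and
# then (as a lookahead only) the next | or the end of the line.
FIELD = re.compile(r'(?:\||^) *"(.*?)" *(?=\||$)')


def escape_inner_quotes(line: str, marker: str = "~~") -> str:
    """Double the quotes inside each field of one pipe-delimited line.

    Assumption: `marker` never occurs in the data itself.
    """
    # Step 1: swap each field's outer quotes (and its separator) for markers.
    line = FIELD.sub(lambda m: marker + m.group(1) + marker, line)
    # Step 2: every quote still present is an inner quote, so double it.
    line = line.replace('"', '""')
    # Step 3: put the outer quotes back, just inside each marker pair.
    pair = re.escape(marker) + r"(.*?)" + re.escape(marker)
    line = re.sub(pair, lambda m: marker + '"' + m.group(1) + '"' + marker, line)
    # Step 4: two adjacent markers are a boundary between fields.
    line = line.replace(marker * 2, "|")
    # Step 5: the markers left over sit at the start and end of the line.
    return line.replace(marker, "")
```

Calling it on each of the two test lines should reproduce the final output of the last perl command above. As with the perl version, spaces around the separators are dropped.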

Answer 2

Your double quotes are not real double quotes (“ vs ").
With real double quotes ("), you can try this sed command (assuming your data contains no @):

sed 's/" | "/@/g;s/"/""/g;s/^"//;s/"$//;s/@/" | "/g' infile
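
The placeholder trick in that sed command can be written out step by step in Python, which may make the order of the substitutions easier to follow. This is a rough sketch, not the answer's own code: the name escape_with_placeholder is hypothetical, the explicit check for the placeholder is an addition, and it assumes fields are separated by exactly `" | "` as in the sed pattern.

```python
def escape_with_placeholder(line: str, placeholder: str = "@") -> str:
    """Mirror the five sed substitutions on one line (no trailing newline)."""
    # Added safeguard (not in the sed command): refuse data that already
    # contains the placeholder, since it would be turned into a separator.
    if placeholder in line:
        raise ValueError("placeholder already occurs in the data")
    sep = '" | "'
    line = line.replace(sep, placeholder)   # s/" | "/@/g  hide field boundaries
    line = line.replace('"', '""')          # s/"/""/g     double remaining quotes
    if line.startswith('"'):                # s/^"//       undo doubling at the start
        line = line[1:]
    if line.endswith('"'):                  # s/"$//       undo doubling at the end
        line = line[:-1]
    return line.replace(placeholder, sep)   # s/@/" | "/g  restore boundaries
```

Because the boundaries are hidden before the quotes are doubled, only quotes inside a field end up doubled. A field that itself contains `" | "` would be misread as a boundary, in the sed command and in this sketch alike.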

Answer 3

awk -v FS='|' -v OFS='|' '{for(i=1;i<=NF;i++){gsub(/"/,"\"\"",$i);sub(/"/,"",$i);sub(/"[^"]*$/,"",$i)}print}' myfile > myfile3

Answer 4

Input:

“2017”|“S”|“221318”|“US”|“20170118”|“Someone's name”|“20170215”|“1785”|“009”|“20170215”|“182339”|“99536”|“ 00090"|"LOCAL00"|"930N"|"2017"|"6100"|"0000880"|1.000|0.000|"EA"|""|""|""|""|"005"|"00000000" |" "|" "|"1785"|"50228"|"R"|"2017"|"NMT NOTE| 5" X 3" NAT ON BLK"|" "|" "|"USD"|"7444" |" "|"000"|"COIN"|"04"|35.00|"00"

Command (this needs GNU awk, because it relies on RT):

awk -v RS='[[:blank:]]*[|][[:blank:]]*|[[:blank:]]*[\n][[:blank:]]*' '{ if ($0 !~ /^"([^"]|"")*"$/) { gsub(/"/,"\"\""); sub(/^"/,""); sub(/"$/,"") } printf "%s%s", $0, RT }' file.txt

Output:

“2017”|“S”|“221318”|“US”|“20170118”|“Someone's name”|“20170215”|“1785”|“009”|“20170215”|“182339”|“99536”|“ 00090"|"LOCAL00"|"930N"|"2017"|"6100"|"0000880"|1.000|0.000|"EA"|""|""|""|""|"005"|"00000000" |" "|" "|"1785"|"50228"|"R"|"2017"|"NMT NOTE| 5"" X 3"" NAT ON BLK"|" "|" "|"USD"|" 7444"|""|"000"|"COIN"|"04"|35.00|"00"
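
The awk command treats every pipe-separated piece as its own record, leaves a piece alone when it is already a well-formed quoted value, and otherwise doubles its quotes and strips one quote from each end. A Python sketch of that logic follows. It is an approximation under assumptions: fix_piece and fix_line are hypothetical names, blanks are taken to be spaces and tabs, and input is processed one line at a time without the trailing newline.

```python
import re

# A separator is a pipe plus any blanks around it. The capture group makes
# re.split keep the separators, playing the role of gawk's RT.
SEPARATOR = re.compile(r'([ \t]*\|[ \t]*)')
# A piece that is already correctly quoted: inner quotes only appear doubled.
WELL_FORMED = re.compile(r'"(?:[^"]|"")*"')


def fix_piece(piece: str) -> str:
    """Escape the quotes of one pipe-delimited piece unless it is well formed."""
    if WELL_FORMED.fullmatch(piece):
        return piece
    piece = piece.replace('"', '""')    # gsub(/"/, "\"\"")
    if piece.startswith('"'):           # sub(/^"/, "")
        piece = piece[1:]
    if piece.endswith('"'):             # sub(/"$/, "")
        piece = piece[:-1]
    return piece


def fix_line(line: str) -> str:
    """Apply fix_piece to every piece and re-join with the original separators."""
    parts = SEPARATOR.split(line)       # pieces at even indexes, separators at odd
    parts[0::2] = [fix_piece(p) for p in parts[0::2]]
    return "".join(parts)
```

A field with an embedded pipe is split into two pieces, but each half keeps its single outer quote after the fix, so the re-joined field still comes out correctly quoted. Unquoted numeric fields pass through unchanged because they contain no quotes.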
