sed

Question

TL;DR

In ksh, bash, or zsh:

sed -e $'s,"title":,\1,g' -e $'s,"url":,\2,g' -e $'s,^[^\1]*,,' -e $'
         s,\1\\([^\2]*\\)\2[^\1]*,\\1\\\n,g' infile

sed

One-character delimiters.

The canonical solution for one-character delimiters, assuming they are @ and #, is:

sed 's,^[^@]*,,;s,@\([^#]*\)#[^@]*,\1 ,g' infile

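For each line of the input file infile, this will:

- remove from the start every character that is not an @
- extract the characters between each @ and the first # that follows it.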
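
As a quick check of the one-character case, here is a minimal sketch on a made-up input line (the sample text and the variable names are assumptions, not taken from the original file):

```shell
# Hypothetical sample line: two fields wrapped in @ ... #
line='junk@first#noise@second#tail'

# Drop everything before the first @, then keep only what sits
# between each @ and the next #, followed by a space.
result=$(printf '%s\n' "$line" | sed 's,^[^@]*,,;s,@\([^#]*\)#[^@]*,\1 ,g')

# Brackets make the trailing space visible.
printf '[%s]\n' "$result"
```

Each extracted field is followed by the space from the replacement, so the result ends with a trailing space.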

Generic delimiters.

Any other delimiter can be reduced to the answer above by first converting each delimiter string to a single character.

sed -e 's,"title":,@,g' -e 's,"url":,#,g' -e 's/^[^@]*//;s/@\([^#]*\)#[^@]*/\1 /g' infile

In your case, you could use a newline instead of the space ( \1 ), which is simple to write for GNU sed ( \1\n ):

sed -e 's,"title":,@,g' -e 's,"url":,#,g' -e 's/^[^@]*//;s/@\([^#]*\)#[^@]*/\1\n/g' infile

For other (older) seds, add an explicit newline:

sed -e 's,"title":,@,g' -e 's,"url":,#,g' -e 's/^[^@]*//;s/@\([^#]*\)#[^@]*/\1\
/g' infile
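
To see the multi-character delimiters at work, the following sketch runs the explicit-newline form on an invented JSON-like line (the field values and the variable names json and titles are assumptions for illustration only):

```shell
# Hypothetical JSON-like line; it must not contain @ or # itself.
json='{"id":1,"title":"First post","url":"http://a.example/1","id":2,"title":"Second post","url":"http://a.example/2"}'

# Turn each tag into a single character, then extract as before,
# ending every extracted field with a literal newline.
titles=$(printf '%s\n' "$json" |
  sed -e 's,"title":,@,g' -e 's,"url":,#,g' \
      -e 's/^[^@]*//;s/@\([^#]*\)#[^@]*/\1\
/g')

printf '%s\n' "$titles"
```

Each title comes out on its own line, still followed by the comma that preceded "url": in the input.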

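If there is a risk that the delimiters used above might occur inside the file, choose other delimiters that are not present in it. If that is still a problem, the start and end delimiters can be control characters such as Ctrl-A (written ^A, hex 0x01, or octal \001). You can enter it in a shell console by typing Ctrl-V Ctrl-A; you will see ^A on the command line: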

sed -e 's,"title":,^A,g' -e 's,"url":,^B,g' -e 's,^[^^A]*,,;s,^A\([^^B]*\)^B[^^A]*,\1\n,g' infile
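
If typing the control characters is inconvenient, one option is to build them with printf and interpolate them into the sed scripts. This is a sketch under assumptions: the variable names A, B, line, and out are made up here, and the sample input is invented to show that @ and # inside the data no longer matter.

```shell
# Build the two control characters without typing them.
A=$(printf '\001')   # Ctrl-A, octal 001
B=$(printf '\002')   # Ctrl-B, octal 002

# Hypothetical input whose title already contains @ and #.
line='{"title":"mail me @ home #1","url":"http://a.example/x"}'

# Double quotes let the shell expand ${A} and ${B}; the backslashes
# that sed must see are doubled for that reason.
out=$(printf '%s\n' "$line" |
  sed -e "s,\"title\":,${A},g" -e "s,\"url\":,${B},g" \
      -e "s,^[^${A}]*,,;s,${A}\\([^${B}]*\\)${B}[^${A}]*,\\1,g")

printf '%s\n' "$out"
```

The braces around the variable names keep the shell from reading the following [ as part of the expansion.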

Or, if that is too troublesome to type, use (ksh, bash, zsh):

sed -e $'s,"title":,\1,g' -e $'s,"url":,\2,g' -e $'s,^[^\1]*,,' -e $'s,\1\\([^\2]*\\)\2[^\1]*,\\1\\\n,g' infile

Or, if your sed supports it (the \oNNN octal escape is a GNU sed extension):

sed -e 's,"title":,\o001,g' -e 's,"url":,\o002,g' -e 's,^[^\o001]*,,' -e 's,\o001\([^\o002]*\)\o002[^\o001]*,\1\o012,g' infile

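If the delimiter is "description":

If the start tag is actually "description": (from your output example), just use that instead of "title":.

The output from the above (from the file you linked previously in the question):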

"Black Friday deal: Palm companion phone is $150 off at Verizon, but there's a catch","description":"",
"LG trademarks potential names for its foldable phone, one fits a crazy concept found in patents","description":"",
"Blackview's Black Friday promo discounts the BV9500 Pro and other rugged phones on Amazon","description":"Advertorial by Blackview: the opinions expressed in this story may not reflect the positions of PhoneArena! disclaimer   amzn_assoc_tracking_id = 'phone0e0d-20';amzn_assoc_ad_mode = 'manual';amzn_assoc_ad_type ...",

If you need the lines numbered, pipe the output through sed again, using sed -n '=;p;g;p':

| sed -n '=;p;g;p'
1
"Black Friday deal: Palm companion phone is $150 off at Verizon, but there's a catch","description":"",

2
"LG trademarks potential names for its foldable phone, one fits a crazy concept found in patents","description":"",

3
"Blackview's Black Friday promo discounts the BV9500 Pro and other rugged phones on Amazon","description":"Advertorial by Blackview: the opinions expressed in this story may not reflect the positions of PhoneArena! disclaimer   amzn_assoc_tracking_id = 'phone0e0d-20';amzn_assoc_ad_mode = 'manual';amzn_assoc_ad_type ...",

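The numbering script can be tried in isolation. In this sketch (the two input lines and the variable name numbered are made up), = prints the line number, p prints the line, g replaces the pattern space with the empty hold space, and the second p prints that empty line as a separator:

```shell
# Two hypothetical lines stand in for the extracted titles.
numbered=$(printf '%s\n' 'first title' 'second title' | sed -n '=;p;g;p')

printf '%s\n' "$numbered"
```

The command substitution strips the blank line after the last entry, so the captured text ends with the second title.
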
AWK

Similar logic implemented in awk:

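awk -v one=$'\1' -v two=$'\2' '{
            gsub(/"title":/,one)              # mark each start tag with \001
            gsub(/"url":/,two)                # mark each end tag with \002
            sub("^[^"one"]*"one,"")           # drop everything up to the first start mark
            gsub(two"[^"one"]*"one,ORS)       # turn each gap between fields into a newline
            sub(two"[^"two"]*$","")           # drop the last end mark and what follows it
           } 1' infile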
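
A small run of the same awk logic on invented input may help to follow the steps. The sample line and the shell variables one, two, json, and out are assumptions of this sketch, and printf is used to build the control characters so that it does not depend on $'...' quoting:

```shell
one=$(printf '\001')   # stands in for the start tag
two=$(printf '\002')   # stands in for the end tag

# Hypothetical JSON-like line with two title/url pairs.
json='{"id":1,"title":"First post","url":"u1","id":2,"title":"Second post","url":"u2"}'

out=$(printf '%s\n' "$json" | awk -v one="$one" -v two="$two" '{
    gsub(/"title":/, one)
    gsub(/"url":/, two)
    sub("^[^" one "]*" one, "")         # strip the prefix and first start mark
    gsub(two "[^" one "]*" one, ORS)    # gaps between fields become newlines
    sub(two "[^" two "]*$", "")         # strip the trailing end mark and rest
} 1')

printf '%s\n' "$out"
```

Unlike the sed versions, the awk variant consumes the start mark of the next field together with the gap, so no capture group is needed.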
