Get the first line of a string with newlines

Question

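You don't need to run sed multiple times in a pipeline. sed can take multiple -e options, each containing one statement. You can also use a single -e option and separate the statements with semicolons (;). You can even combine the two: multiple -e options, each containing several ;-separated statements.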
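As a minimal sketch of that equivalence (the input string and the variable names a, b and c are made up for illustration), all three forms below should produce the same result, but only the last one starts two sed processes:

```shell
# Hypothetical sample input, not taken from the question.
input='foo bar'

# One sed process, one statement per -e option.
a=$(printf '%s\n' "$input" | sed -e 's/foo/FOO/' -e 's/bar/BAR/')

# One sed process, one -e option, statements separated by a semicolon.
b=$(printf '%s\n' "$input" | sed -e 's/foo/FOO/; s/bar/BAR/')

# The pipeline that the two forms above replace: two sed processes.
c=$(printf '%s\n' "$input" | sed 's/foo/FOO/' | sed 's/bar/BAR/')

printf '%s\n' "$a" "$b" "$c"
```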

Your sed command would be better written as:

sed -E -e 's/^[ ]?[0-9]* //g; s/^“[ ]?[0-9]?[ ]?//g; s/”$//g; s/^(Excerpt From).*//g'

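Note that I added ^ to the second statement to "anchor" the regex pattern to the start of the line, just as the third statement uses $ to anchor its pattern to the end of the line.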
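To see what the anchor changes, here is a small sketch using a simplified pattern and made-up input lines (neither comes from the question). Without ^ the pattern may match anywhere in the line; with ^ it can only match at the start:

```shell
# Hypothetical input lines for illustration only.
mid='Room 101 is free'
start='42 Room 101'

# Unanchored: the first run of digits followed by a space is removed,
# even though it sits in the middle of the line.
unanchored=$(printf '%s\n' "$mid" | sed -E 's/[0-9]+ //')

# Anchored: the line does not start with digits, so nothing changes.
anchored=$(printf '%s\n' "$mid" | sed -E 's/^[0-9]+ //')

# Anchored, on a line that does start with digits: only the leading number goes.
anchored_start=$(printf '%s\n' "$start" | sed -E 's/^[0-9]+ //')

printf '%s\n' "$unanchored" "$anchored" "$anchored_start"
```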

However, sed does not handle multi-line strings well.
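A minimal sketch of the limitation, assuming a made-up four-line input: by default sed applies the script to one line at a time, so .* stops at the end of the current line and the lines after "Excerpt From" are left in place.

```shell
# Hypothetical input: a quote followed by a three-line attribution block.
text=$(printf 'quote text\nExcerpt From\nSome Title\nSome Author')

# sed sees each line separately, so .* cannot run past the newline.
# Only the "Excerpt From" line is emptied; the title and author lines survive.
sed_out=$(printf '%s\n' "$text" | sed -E 's/^(Excerpt From).*//')

printf '%s\n' "$sed_out"
```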

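Perl, however, does handle multi-line strings well, and with its -p option it can serve as a drop-in replacement for sed (at least for simple sed scripts like this one; more complex sed scripts are better rewritten entirely as perl scripts):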

$ cat /tmp/book.txt 
“When there is no data to guide intuition, scientists impose a “compatibility” criterion: any new theory attempting to extrapolate beyond tested ground should, in the proper limit, reproduce current knowledge.”

Excerpt From
The Island of Knowledge
Marcelo Gleiser
This material may be protected by copyright.

$ perl -0777 -p -e 's/^[ ]?[0-9]* //msg;
                    s/^“[ ]?[0-9]?[ ]?//msg;
                    s/”$//msg;
                    s/^(Excerpt From).*//msg;
                    s/^\s*$//msg' /tmp/book.txt 
When there is no data to guide intuition, scientists impose a “compatibility” criterion: any new theory attempting to extrapolate beyond tested ground should, in the proper limit, reproduce current knowledge.

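The spaces after the semicolons (in my sed example) and the newlines (in the perl example) are optional. They are only there to improve readability and have no effect on how the sed and perl scripts run.
The perl version adds one more statement, s/^\s*$//msg, to remove the empty lines.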

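If you want to convert the remaining "smart" quotes to normal double-quote characters, add another statement, s/“|”/"/g;, before the s/^\s*$//msg statement. The output will then be: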

When there is no data to guide intuition, scientists impose a "compatibility" criterion: any new theory attempting to extrapolate beyond tested ground should, in the proper limit, reproduce current knowledge.

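These s/// statements could probably be optimised, but without more samples to test against I'm reluctant to try, in case the result doesn't work with different input.
-0777 tells perl to read the entire file at once, as one long string.
-p tells perl to iterate over its input, run the statements in the -e script, and then print the input after the script has modified it, i.e. it works very much like sed.
As with sed, the -e option indicates that the next argument is a script.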
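The m and s regex modifiers change how perl regexps handle multi-line strings. From man perlre:

"m" Treat the string being matched against as multiple lines. That is, change "^" and "$" from matching the start of the string's first line and the end of its last line to matching the start and end of each line within the string.

"s" Treat the string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.

Used together, as /ms, they let the "." match any character whatsoever, while still allowing "^" and "$" to match, respectively, just after and just before newlines within the string.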

Answer 1