How to put sentences on separate lines on Linux

Answer 1

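I can't be sure without seeing an actual example of your data, but what you are probably looking for is to add a newline after each occurrence of ., ! and ?. I don't know how you want to deal with semicolons (;), since they don't really mark the end of a sentence. That's up to you.

Anyway, you could try sed: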

$ echo 'This is a sentence! And so is this. And this one?' | 
    sed 's/[.!?]  */&\n/g' 
This is a sentence! 
And so is this. 
And this one?

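The s/// is the substitution operator. Its general format is s/pat/replacement/ and it replaces pat with replacement. The g at the end makes it run the substitution on all occurrences of pat; without it, it would stop at the first one. The & is a special sed construct that means "whatever was matched". So here we are replacing any of ., ! or ? (together with the spaces that follow it) with whatever was matched plus a newline.

If your text can include abbreviations such as e.g., you might want to replace only when the next letter is a capital: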
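To make the role of g and & concrete, here is a small sketch that runs the same substitution with and without the g flag. The function names split_first and split_all are made up for this illustration, and the \n in the replacement assumes GNU sed, as in the command above.

```shell
# Hypothetical helper names; \n in the replacement assumes GNU sed.

# Without g: only the first match on each input line gets a newline.
split_first() { sed 's/[.!?]  */&\n/'; }

# With g: every match on the line gets a newline.
split_all() { sed 's/[.!?]  */&\n/g'; }

printf '%s\n' 'One. Two. Three.' | split_first
printf '%s\n' 'One. Two. Three.' | split_all
```

Because & re-inserts everything that matched, the spaces after the punctuation stay at the end of each output line. The last sentence is left alone because nothing follows its final period, and the pattern requires at least one space.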

$ echo 'This is a sentence! And so is this. And this one? Negative, i.e. no.' | sed 's/\([.!?]\) \([[:upper:]]\)/\1\n\2/g' 
This is a sentence!
And so is this.
And this one?
Negative, i.e. no.

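Note that this will not deal correctly with sentences like Dr. Jones said hello., since it will assume that the . after Dr ends a sentence because the next letter is capitalized. However, we are now reaching a level of complexity that is well beyond the simple Q&A format and actually requires a full-fledged natural language parser.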
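Short of a real parser, one partial workaround is to split first and then undo the split after a short list of known abbreviations. The sketch below is only an illustration: the function name split_sentences and the abbreviation list (Dr, Mr, Mrs) are assumptions, and it relies on GNU sed extensions (\n, \| and \b).

```shell
# Hypothetical helper; the abbreviation list is an example, not exhaustive.
# Relies on GNU sed for \n, \| and \b.
split_sentences() {
  # 1st expression: break after . ! ? when a capital letter follows.
  # 2nd expression: rejoin lines that were broken after a known abbreviation.
  sed -e 's/\([.!?]\) \([[:upper:]]\)/\1\n\2/g' \
      -e 's/\b\(Dr\|Mr\|Mrs\)\.\n/\1. /g'
}

printf '%s\n' 'Dr. Jones said hello. Then he left.' | split_sentences
```

This still gets it wrong when a sentence really does end with one of the listed abbreviations, so it narrows the problem rather than solving it.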

Answer 2

Try:

sed -e :1 -e 's/\([.?!]\)[[:blank:]]\{1,\}\([^[:blank:]]\)/\1\
\2/;t1'

On an input like:

Sentence 1. Sentence 1.2? Sentence 2!? Sentence 3.
Sentence 4... Sentence 5.

it gives:

Sentence 1.
Sentence 1.2?
Sentence 2!?
Sentence 3.
Sentence 4...
Sentence 5.

(and it is POSIX).
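The answer does not say why it loops with :1 and t1 instead of using the g flag. One likely reason is that the pattern consumes the first character of the next sentence, so with g two adjacent matches cannot overlap and one of them is skipped. The sketch below compares both forms on a contrived input; the function names split_loop and split_global are made up for this illustration.

```shell
# Hypothetical helper names, used only to compare the two approaches.

# Loop form from the answer: replace one match, then branch back to
# label 1 for as long as a substitution was made.
split_loop() {
  sed -e :1 -e 's/\([.?!]\)[[:blank:]]\{1,\}\([^[:blank:]]\)/\1\
\2/;t1'
}

# Same pattern with g instead of the loop: matches may not overlap.
split_global() {
  sed -e 's/\([.?!]\)[[:blank:]]\{1,\}\([^[:blank:]]\)/\1\
\2/g'
}

printf '%s\n' 'Wait. . . Go' | split_loop    # 4 output lines
printf '%s\n' 'Wait. . . Go' | split_global  # 3 output lines
```

In the g form the second period is consumed as the "next character" of the first match, so the blank after it never gets replaced. The loop rescans the whole line after each substitution and therefore catches it.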

Answer 3

There is life beyond one-liners...

A sentence splitter is never finished, there is always one more detail to fix: a Perl multi-liner!

#!/usr/bin/perl

use strict;
my $pont=qr{[.!?]+};                   ## punctuation
my $abrev=qr{\b(?:Pr|Dr|Mr|[A-Z])\.};  ## abbreviations

$/="";                                 ## paragraph mode: blank lines separate records

while(<>){ chomp;                      ## for each paragraph,

  s/\h*\n\h*/ /g;                      ## join lines: remove \n
  s/($pont)\h+(\S)/$1\n$2/g;           ## punctuation + space: insert \n
  s/($abrev)\n/$1 /g;                  ## undo \n after abbreviations

  print "$_\n\n";
}

So, with:

A single ‘-’ operand is not really an option ! It stands for
standard input. Or for standard output ? For example:
‘smth -’ reads from stdin; and is equal
to plain ‘smth’... Could it appear as any operand that
requires a file name ? Certainly !

Robert L. Stevenson wrote  Dr. Jekyll and Mr. Hyde. Back in 12.12.1886

the end

the output is:

A single ‘-’ operand is not really an option !
It stands for standard input.
Or for standard output ?
For example: ‘smth -’ reads from stdin; and is equal to plain ‘smth’...
Could it appear as any operand that requires a file name ?
Certainly !

Robert L. Stevenson wrote  Dr. Jekyll and Mr. Hyde.
Back in 12.12.1886

the end

Answer 4

There are a few pitfalls with this task. One option could be:

sed 's/\([.?!;]\) */\1\n/g' file.txt

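This replaces a character from the given set ([.?!;]; add a colon or remove the semicolon as needed), followed by optional spaces ( *), with the matched character (\1 expands to the match between \( and \)) and a newline (\n).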
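One pitfall worth seeing in action: because the spaces are optional, the command also breaks after periods inside numbers and after the final period of the line. The sketch below contrasts it with a variant that requires at least one space. The function names split_any and split_spaced are made up, and \n in the replacement assumes GNU sed.

```shell
# Hypothetical helper names; \n in the replacement assumes GNU sed.

# As in the answer: zero or more spaces after the punctuation.
split_any() { sed 's/\([.?!;]\) */\1\n/g'; }

# Variant: at least one space required, so "3.14" stays in one piece.
split_spaced() { sed 's/\([.?!;]\)  */\1\n/g'; }

printf '%s\n' 'Pi is 3.14. Done.' | split_any
printf '%s\n' 'Pi is 3.14. Done.' | split_spaced
```

The first call splits 3.14 in two and emits a trailing empty line; the second leaves the number intact but, like any purely textual rule, still cannot tell abbreviations from sentence ends.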
