How to remove newlines within paragraphs using sed or awk

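I'd like to know how to remove the line breaks within paragraphs in this book, and in others, for use on a Kindle. The desired effect is to turn each block separated by blank lines into a single continuous line of text. I did this for this book with a convoluted series of vim substitution commands, but I'd rather find a better way for future work.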

My hope is to get a vim, perl, sed or awk script that can be used for this, but I'm open to your ideas.

A solution has been found, but here is a sample input and output for future Googlers.

Input with line breaks:

Letter 1

_To Mrs. Saville, England._


St. Petersburgh, Dec. 11th, 17—.


You will rejoice to hear that no disaster has accompanied the
commencement of an enterprise which you have regarded with such evil
forebodings. I arrived here yesterday, and my first task is to assure
my dear sister of my welfare and increasing confidence in the success
of my undertaking.

I am already far north of London, and as I walk in the streets of
Petersburgh, I feel a cold northern breeze play upon my cheeks, which
braces my nerves and fills me with delight. Do you understand this
feeling? This breeze, which has travelled from the regions towards
which I am advancing, gives me a foretaste of those icy climes.
Inspirited by this wind of promise, my daydreams become more fervent
and vivid. I try in vain to be persuaded that the pole is the seat of
frost and desolation; it ever presents itself to my imagination as the
region of beauty and delight. There, Margaret, the sun is for ever
visible, its broad disk just skirting the horizon and diffusing a...

Output without line breaks within paragraphs:

_To Mrs. Saville, England._


St. Petersburgh, Dec. 11th, 17--.


You will rejoice to hear that no disaster has accompanied the commencement of an enterprise which you have regarded with such evil forebodings. I arrived here yesterday; and my first task is to assure my dear sister of my welfare, and increasing confidence in the success of my undertaking.

I am already far north of London; and as I walk in the streets of Petersburgh, I feel a cold northern breeze play upon my cheeks, which braces my nerves, and fills me with delight. Do you understand this feeling? This breeze, which has travelled from the regions towards which I am advancing, gives me a foretaste of those icy climes. Inspirited by this wind of promise, my day dreams become more fervent and vivid. I try in vain to be persuaded that the pole is the seat of frost and desolation; it ever presents itself to my imagination as the region of beauty and delight. There, Margaret, the sun is for ever visible; its broad disk just skirting the horizon, and diffusing a... 

And, for the curious, the vim commands I originally used:

ggVG:norm A<space>   -- adds a space to the end of each line
:%s/\v^\s*$/<++>     -- swaps all blank lines with a unique temporary string
ggVGgJ               -- joins all lines without adding a space
:%s/<++>/\r\r/g      -- replaces all occurrences of my unique string with two newline characters 
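The question also asks about sed. For comparison with the vim steps above, here is a minimal sed sketch of the same paragraph join. It assumes GNU sed (see the comments for why), and join_paras is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Sketch: join the lines of each paragraph with single spaces using sed.
# Assumes GNU sed: when N finds no further input, GNU sed prints the
# pattern space before exiting, so a final paragraph that is not followed
# by a blank line is kept. Other seds may drop it.
# "Blank" means truly empty; a line holding only spaces counts as text.
join_paras() {
  # /./{ ... }  only non-empty lines start a join loop
  # :a N        append the next input line to the pattern space
  # s/\n\(.\)/  if that line is non-empty, replace the newline with a space
  # ta          and loop; otherwise stop, so the blank line is printed as-is
  sed -e '/./{' -e ':a' -e 'N' -e 's/\n\(.\)/ \1/' -e 'ta' -e '}' "$@"
}

# Example: two paragraphs separated by two blank lines.
printf 'one\ntwo\nthree\n\n\nfour\nfive\n' | join_paras
```

Runs of blank lines are passed through unchanged. Under the same assumptions, join_paras pg42324.txt > pg42324-new.txt would process the whole book.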

Answer 1

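If paragraphs are already separated by two or more newlines, and you just want to remove the newlines within each paragraph (or, better, replace them with spaces), then: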

perl -00 -lpe 's/\n/ /g' pg42324.txt > pg42324-new.txt
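  • -00 tells Perl to read and process the input one paragraph at a time (paragraph boundaries are two or more consecutive newlines, i.e. one or more blank lines)

  • -l turns on Perl's automatic handling of line endings (or, in this case, paragraph endings): they are stripped from each record on input and a separator is added back on output

  • -p makes perl behave like sed, i.e. it reads the input and prints each record after the script has made any modifications

  • -e tells perl that the next argument is the script to run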

See man perlrun for more details on these options.

Or, to edit in place (first making a backup with a .bak extension):

perl -i.bak -00 -lpe 's/\n/ /g' pg42324.txt 

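If any lines within a paragraph have leading or trailing spaces, you may want to replace runs of multiple spaces with a single space. Add ; s/  +/ /g (two spaces before the +) to the perl script:

perl -00 -lpe 's/\n/ /g; s/  +/ /g' pg42324.txt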

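Still, it seems to me you'd be better off treating the whole file as markdown (maybe even adding markdown formatting for bold, italics, chapter headings, etc.) and using pandoc or something similar to convert it from markdown to epub. After all, markdown is just plain text with optional formatting characters. For example: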

pandoc pg42324.txt -o pg42324.epub

The minimal edit is just to open the file in vim (or whatever editor you like) and make sure there's a blank line between paragraphs.

BTW, Creating an ebook with pandoc is a short but good general introduction to creating .epub books from text or markdown files.

Or, better yet, just download the .epub or .mobi version of the book instead of the plain-text version. Project Gutenberg offers its books in several formats.

Here are the download links for Mary Shelley's Frankenstein in various formats:

https://www.gutenberg.org/ebooks/42324

Answer 2

Note that awk provides a so-called "paragraph mode", enabled by setting RS to the empty string, which can come in handy in cases like this.

GNU awk's RT automatic variable captures the actual record separator text found between paragraphs, which keeps this neat and compact:

gawk '{$1=$1; print $0 RT}' RS= ORS= pg42324.txt

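RS is set to empty to enable paragraph mode.

ORS is set to empty so that separators are printed only explicitly, via the RT variable.

$1=$1 forces awk to rebuild the record from its fields joined by OFS (a single space by default). Because newline always acts as a field separator in paragraph mode, this turns the line breaks into spaces and also squeezes runs of spaces.

Or, as a more formally correct equivalent, set RS and ORS through the dedicated -v option, since arguments placed after the script are normally reserved for input file names or for arguments to the script itself: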

gawk -v RS='' -v ORS='' '{$1=$1; print $0 RT}' pg42324.txt
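RT is a gawk extension, so the one-liners above need GNU awk. Below is a minimal sketch for other POSIX awks such as mawk. It drops RT and fixes ORS at one blank line. join_paras_awk is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch for awks without GNU's RT variable:
#   RS = ""        paragraph mode (one or more blank lines end a record)
#   $1 = $1        rebuild the record, joining fields with OFS (a space)
#   ORS = "\n\n"   always print exactly one blank line after each paragraph
# Unlike the gawk version, runs of several blank lines collapse to one.
join_paras_awk() {
  awk 'BEGIN { RS = ""; ORS = "\n\n" } { $1 = $1; print }' "$@"
}

printf 'one\ntwo  three\n\n\nfour\n' | join_paras_awk
```

This trades exact blank-line preservation for portability. Collapsing the blank lines is usually harmless for ebook conversion, but the gawk version is the one to use if the spacing between sections must survive.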

Answer 3

If you want to normalize the line breaks (CRLF to LF, then runs of blank lines down to a single one):

wget https://www.gutenberg.org/cache/epub/42324/pg42324.txt
dos2unix pg42324.txt
perl -0777 -pe 's/\n{3,}/\n\n/g' pg42324.txt | less

If you want to edit in place:

perl -0777 -i -pe 's/\n{3,}/\n\n/g' pg42324.txt

Answer 4

Using any awk:

$ cat tst.awk
NF { buf=buf $0 OFS; next }
{ prtBuf(); print }
END { prtBuf() }

function prtBuf() {
    sub(OFS"$",ORS,buf)
    printf "%s", buf
    buf = ""
}

$ awk -f tst.awk letter
_To Mrs. Saville, England._


St. Petersburgh, Dec. 11th, 17—.


You will rejoice to hear that no disaster has accompanied the commencement of an enterprise which you have regarded with such evil forebodings. I arrived here yesterday, and my first task is to assure my dear sister of my welfare and increasing confidence in the success of my undertaking.

I am already far north of London, and as I walk in the streets of Petersburgh, I feel a cold northern breeze play upon my cheeks, which braces my nerves and fills me with delight. Do you understand this feeling? This breeze, which has travelled from the regions towards which I am advancing, gives me a foretaste of those icy climes. Inspirited by this wind of promise, my daydreams become more fervent and vivid. I try in vain to be persuaded that the pole is the seat of frost and desolation; it ever presents itself to my imagination as the region of beauty and delight. There, Margaret, the sun is for ever visible, its broad disk just skirting the horizon and diffusing a...
