Extracting and comparing text blocks

Question 1

$ perl -00 -F'\n' -n -e '
  $file = shift @F;
  pop @F;
  if (@F < 4) {
    $file =~ s/^---- | ----//g;
    open(OUT, ">", $file.".txt");
    print OUT join("\n", @F), "\n"
  }' input.txt

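This Perl one-liner uses -00 to read the input in paragraph mode (records separated by two or more newlines) and -F to split each record on newlines into the array @F. On Perl 5.20 and later, -F also turns on -a (autosplit) and -n; on older perls add -a explicitly. -n loops over the input without printing it automatically (similar to sed -n).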

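First, it uses shift to move the first element of @F (the header line) into $file. Then pop @F discards the last element (the ------------------- footer). If fewer than 4 elements remain, it strips the leading "---- " and trailing " ----" from $file, opens "$file.txt" for writing, and prints the rest of the array to that file, one element per line.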

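If you don't like those file names, you can generate them some other way, for example with an incrementing counter: inside the if block, use $file = sprintf "file%04i.txt", ++$counter instead of the s/// operator, and open $file rather than $file.".txt", since the format already ends in .txt.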

By the way, if you want to keep the ---- BLOCK ... header and the ------... footer lines, replace the shift and pop lines with $file = $F[0]; and change the if test to if (@F < 6).

Sample output (tail prints each file's name as a header):

$ tail BLOCK*.txt
==> BLOCK THREE.txt <==
some data
another data
more data

==> BLOCK two.txt <==
some data
another data

The same thing as a standalone script, but using a counter to generate the file names:

$ cat split-blocks.pl
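#!/usr/bin/perl

use strict;
use warnings;

my $counter = 0;

$/ = '';    # paragraph mode: one blank-line-separated block per record

while (<<>>) {    # <<>> needs Perl 5.22 or later
  my @lines = split /\n/;
  shift @lines;    # drop the header line
  pop @lines;      # drop the footer line

  if (@lines < 4) {
    my $file = sprintf 'file%04i.txt', ++$counter;
    open(my $out, '>', $file) or die "couldn't open $file for write: $!\n";
    print $out join("\n", @lines), "\n";
    close($out);
  }
}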

$ ./split-blocks.pl input.txt

$ tail file*
==> file0001.txt <==
some data
another data

==> file0002.txt <==
some data
another data
more data

Question 2

Here's a simple one-liner:

$ perl -00 -lne '@k=(/\n/mg); print if $#k < 4 ' file
---- BLOCK two ----
some data
another data
-------------------

---- BLOCK THREE ----
some data
another data
more data
-------------------

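-00 turns on "paragraph mode", which treats each block of lines separated by blank lines as a single "line". -l removes the trailing newlines from each input "line" and appends an output record separator to every print; because -00 comes before it, that separator is "\n\n", so the printed blocks stay separated by a blank line. -n means "run the script given by -e on each input line".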

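The script itself collects every \n character in the input "line" (the paragraph) into the array @k, then prints the paragraph if the array's highest index, $#k, is less than 4. Since -l has already removed the trailing newlines, a block made of a header, N data lines and a footer contains N+1 newlines; arrays are indexed from 0, so $#k is exactly N. The test $#k < 4 therefore prints only the blocks with fewer than 4 data lines.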

Question 3

Using any awk:

awk 'NF && !/^-/{ buf= buf (data?ORS:"") $0; data++ }
          /^-+$/{ if(data<4) { print (ors?ORS:"") buf; ors=ORS }; buf=data="" }' infile

Output:

some data
another data

some data
another data
more data

If you want to keep the block headers and footers:

awk 'NF{ buf= buf (data?ORS:"") $0; data++ }
 /^-+$/{ if(data<=5) { print (ors?ORS:"") buf; ors=ORS }; buf=data="" }' infile

Output:

---- BLOCK two ----
some data
another data
-------------------

---- BLOCK THREE ----
some data
another data
more data
-------------------

Question 4

perl -lne '
    if (/^-+$/) {
        print join "\n", @M, $_ if $#M < 4;
        @M = ();
        next;
    }

    if (/^-/) {
        $M[0] = $_;
        next;
    }

    push @M, $_ if $#M >= 0;
' file

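perl reads the file line by line (-n, which disables automatic printing); -l strips each line's trailing newline and adds one back on every print.

If we are inside a section, the line is pushed onto the array (push @M, $_ if $#M >= 0).

If we reach the closing line of a section (/^-+$/) and $#M < 4, the saved lines are printed followed by the footer, and the array is emptied (@M = ()). Because the header sits at index 0, $#M equals the number of data lines, so only blocks with fewer than 4 data lines are printed.

Otherwise, a line starting with a dash (/^-/) opens a section: it is saved as the first element of the array, where it serves as the header. The footer test has to come first because a line of dashes also matches /^-/.

I've dropped the blank lines between blocks, but you can keep them if you like.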
