ack version 2

Answer 1

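I am no Perl expert, but here is a possible hack. Looking at the all-in-one source file, ack appears to handle only a single character after the $ in the output string. Changing this to accept multiple characters is certainly feasible, but to keep things simple you can go beyond 0..9 by using a, b, c, and so on. For example, I made these changes to accept $a and $b as $10 and $11 (shown as diff -u):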

@@ -188,7 +188,7 @@
         $opt_output =~ s/\\r/\r/g;
         $opt_output =~ s/\\t/\t/g;
 
-        my @supported_special_variables = ( 1..9, qw( _ . ` & ' +  f ) );
+        my @supported_special_variables = ( 1..9, qw( a b _ . ` & ' +  f ) );
         @special_vars_used_by_opt_output = grep { $opt_output =~ /\$$_/ } @supported_special_variables;
 
         # If the $opt_output contains $&, $` or $', those vars won't be
@@ -924,6 +924,8 @@
                 # on them not changing in the process of doing the s///.
 
                 my %keep = map { ($_ => ${$_} // '') } @special_vars_used_by_opt_output;
+                $keep{a} = $10;
+                $keep{b} = $11;
                 $keep{_} = $line if exists $keep{_}; # Manually set it because $_ gets reset in a map.
                 $keep{f} = $filename if exists $keep{f};
                 my $special_vars_used_by_opt_output = join( '', @special_vars_used_by_opt_output );

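However, if you only want the 10th match, you can use $+, which the Perl documentation describes as "the text matched by the last bracket of the last successful search pattern".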

Answer 2

Three alternative solutions:

ack version 2

It appears that in ack version 2 the variables $10, $11, etc. are valid:

$ echo 'abcdefghijklmn' | 
  ack '(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)' \
  --output '$1 $2 $3 $11'

a b c k

$ ack --version
ack 2.24
Running under Perl 5.28.1 at /usr/bin/perl

With that, getting the overlapping strings would be:

echo 'abcdefghijklmn' |
    ack '(.)(?=(.)(.)(.)(.)(.)(.)(.)(.)(.)(.))' \
    --output '$1 $2 $3 $11'
a b c k
b c d l
c d e m
d e f n

Perl5

However, the same can be done directly in Perl:

echo 'abcdefghijklmn' | 
    perl -ne 'while($_ =~ /(.)(?=(.)(.)(.)(.)(.)(.)(.)(.)(.)(.))/g ){
        print $1," ",$2," ",$11," ","\n" }'
a b k
b c l
c d m
d e n

So, to find and print words (separated by one or more spaces):

echo "word1 word2 word3 word4 word5 word6" |
    perl -ne 'while($_ =~ /(\S+) +(?=(\S+) +(\S+) +(\S+))/g ){$,=" ";print $1,$2,$3,$4,"\n" }'

word1 word2 word3 word4 
word2 word3 word4 word5 
word3 word4 word5 word6

The printed lines have a trailing space (hopefully you don't mind).

Perl6

Or you could try Perl6 (Raku) with the :ov (overlap) modifier:

echo "one two three four five" | 
    perl6 -ne 'my @var = $_.match(/ <|w> \w+ [" "+ \w+]**2 <|w> /, :ov); say @var.join("\n") ;'

one two three
two three four
three four five

Changing a single number matches other counts:

echo "one two three four five" | 
perl6 -ne 'my @var = $_.match(/ <|w> \w+ [" "+ \w+]**3 <|w> /, :ov); say @var.join("\n") ;'

one two three four
two three four five

Results

Using perl5, the result would be:

perl -ne 'while($_ =~ /(\S+) +(?=(\S+) +(\S+) +(\S+) +(\S+) +(\S+) +(\S+) +(\S+) +(\S+) +(\S+))/g ){
 $,=" ";
 print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,"\n" 
}' TWAIN_Mark_complete_parsed.txt | 
    sort | 
    uniq -c | 
    sort -rn >Twain_10grams5.txt

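Note that Perl6 could not finish on such a large test text (it used too much memory; Perl6 is still too new). Using ack is much slower than perl5, but the resulting file is identical.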

head -n 10 Twain_10grams5.txt
     17 to mrs jane clemens and mrs moffett in st louis 
      8 ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 
      7 in his home had been wounded and bruised almost to 
      7 his home had been wounded and bruised almost to death 
      7 happiness in his home had been wounded and bruised almost 
      6 shelley's happiness in his home had been wounded and bruised 
      5 was by the social fireside in the time of the 
      5 thing indeed if you would like to listen to it 
      5 laughable thing indeed if you would like to listen to 
      5 it was in this way that he found out that
