Extract the string from one line that lies between the positions given by a pattern in another line

Answer 1

Using awk:

$ awk '!seen{match($0, /A.*B/);seen=1;next} {print substr($0,RSTART,RLENGTH);seen=0}' infile
7890MNOP
34567890MNOPQRST

Explanation, quoting from man awk:

RSTART
          The index of the first character matched by match(); 0 if no
          match.  (This implies that character indices start at one.)

RLENGTH
          The length of the string matched by match(); -1 if no match.

match(s, r [, a])  
          Return the position in s where the regular expression r occurs, 
          or 0 if r is not present, and set the values of RSTART and RLENGTH. (...)

substr(s, i [, n])
          Return the at most n-character substring of s starting at i.
          If n is omitted, use the rest of s.
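
For readers who want to trace the same logic outside awk, here is a minimal Python sketch of the approach: match A.*B on the pattern line, then cut the same span out of the following line. The function name extract_pairs and the inline sample list are illustrative assumptions, not part of the original answer, and skipping pairs without a match is a choice made here (awk would print an empty line instead).

```python
import re

def extract_pairs(lines):
    """Yield the part of each content line spanned by A...B in the preceding pattern line."""
    it = iter(lines)
    for pattern_line in it:
        content_line = next(it, None)
        if content_line is None:
            break  # odd number of lines: the last pattern line has no content line
        m = re.search(r"A.*B", pattern_line)
        if m is None:
            continue  # assumption: skip pairs whose first line has no A...B span
        # m.start() is 0-based, whereas awk's RSTART is 1-based;
        # m.end() - m.start() plays the role of RLENGTH
        yield content_line[m.start():m.end()]

sample = [
    "xxxxxxAxxxxxxBxxxxxx",
    "1234567890MNOPQRSTUV",
    "xxAxxxxxxxxxxxxxxBxxxxxx",
    "1234567890MNOPQRSTUVWXYZ",
]
for piece in extract_pairs(sample):
    print(piece)
```

With the two sample pairs this prints 7890MNOP and 34567890MNOPQRST, the same output as the awk command.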

Answer 2

Since you mention sed, you can also do this with a sed script:

/^x*Ax*Bx*$/{              # If an index line is matched, then
  N                        # append the next (content) line into the pattern buffer
  :a                       # label a
  s/^x(.*\n).(.*)/\1\2/    # remove "x" from the index line start and a char from the content line start
  ta                       # if a substitution happened in the previous line then jump back to a
  :b                       # label b
  s/(.*)x(\n.*).$/\1\2/    # remove "x" from the index line end and a char from the content line end
  tb                       # if a substitution happened in the previous line then jump back to b
  s/.*\n//                 # remove the index line
}

If you put all of this on one command line, it looks like this:

$ sed -r '/^x*Ax*Bx*$/{N;:a;s/^x(.*\n).(.*)/\1\2/;ta;:b;s/(.*)x(\n.*).$/\1\2/;tb;s/.*\n//;}' example-file.txt
7890MNOP
34567890MNOPQRST
$

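The -r option is needed so that sed understands the regex grouping parentheses without extra escaping.

For what it's worth, I don't think this can be done with grep alone, although I'd be happy to be proven wrong.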
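
The two sed loops can be hard to follow in regex form, so here is a rough Python sketch of the same idea: peel one "x" off the index line and one character off the content line, first from the front, then from the back. The function name trim_like_sed is a hypothetical helper introduced for illustration. Like the sed script, it trims the tail of the content line, so it gives the intended result only when both lines have the same length, as they do in the examples.

```python
def trim_like_sed(index_line, content_line):
    """Mimic the two sed loops by peeling characters off both ends in step."""
    # loop a: drop one leading "x" from the index line
    # and one leading character from the content line
    while index_line.startswith("x") and content_line:
        index_line, content_line = index_line[1:], content_line[1:]
    # loop b: drop one trailing "x" from the index line
    # and one trailing character from the content line
    while index_line.endswith("x") and content_line:
        index_line, content_line = index_line[:-1], content_line[:-1]
    # the final s/.*\n// step: discard the index line, keep the content
    return content_line

print(trim_like_sed("xxxxxxAxxxxxxBxxxxxx", "1234567890MNOPQRSTUV"))
print(trim_like_sed("xxAxxxxxxxxxxxxxxBxxxxxx", "1234567890MNOPQRSTUVWXYZ"))
```

This prints 7890MNOP and 34567890MNOPQRST for the two example pairs.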

Answer 3

Although you can do this with AWK, I'd suggest Perl. Here's a script:

#!/usr/bin/env perl

use strict;
use warnings;

while (my $pattern = <>) {
    my $text = <>;
    my $start = index $pattern, 'A';
    my $stop = index $pattern, 'B', $start;
    print substr($text, $start, $stop - $start + 1), "\n";
}

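You can name the script file whatever you like. If you name it interval and put it in the current directory, you can mark it executable with chmod +x interval. Then you can run: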

./interval paths...

Replace paths... with the actual path names of the files you want to parse. For example:

$ ./interval interval-example.txt
7890MNOP
34567890MNOPQRST

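The way the script works is that, until the end of input is reached (i.e., there are no more lines), it:

Reads a line, $pattern, which is the string with A and B, and another line, $text, which is the string that will be sliced.
Finds the index of the first A in $pattern, and the index of the first B that is not before that first A, and stores them in the $start and $stop variables, respectively.
Slices out just the part of $text whose indices range from $start to $stop. Perl's substr function takes offset and length arguments, which is the reason for the subtraction, and you want to include the character directly under B, which is the reason for adding 1.
Prints just that part, followed by a line break.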
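
The offset-and-length arithmetic in the steps above can be checked by hand with a small Python sketch. It uses the first example pair and assumes both A and B are present in the pattern line; str.find stands in for Perl's index here, and the variable names simply mirror the Perl script.

```python
pattern = "xxxxxxAxxxxxxBxxxxxx"
text = "1234567890MNOPQRSTUV"

start = pattern.find("A")        # like Perl's: index $pattern, 'A'
stop = pattern.find("B", start)  # the search for B begins at the position of A

# substr wants a length, not an end index: stop - start covers A up to
# (but not including) B, and the + 1 includes the character under B
length = stop - start + 1

print(start, stop, length)
print(text[start:start + length])
```

Here start is 6, stop is 13 and length is 8, so the slice is 7890MNOP.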

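If for some reason you prefer a short one-liner that does the same thing and is easy to paste, but is also harder to understand and maintain, then you can use this: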

perl -wple '$i=index $_,"A"; $_=substr <>,$i,index($_,"B",$i)-$i+1' paths...

(As before, you have to replace paths... with actual path names.)

Answer 4

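We are not sure if...

there might be lines between or before the pairs that do not belong to any pair; headers? explanations? comments?
the first line of a pair starts with x by definition
the second line of a pair might start with an x

To catch all of these cases, we can use set() to test for lines that consist solely of (all of) x, A and B. We can be sure that these are the first lines of our pairs.

So in Python we get: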

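#!/usr/bin/env python3

f = "/path/to/file"

# True when the previous line was a pattern line, so the current line is its content line
printresult = False

with open(f) as lines:
    for line in lines:
        if printresult:
            # cut the content line at the positions taken from the pattern line
            print(line[start:stop])
            printresult = False
        elif set(line.strip()) == {"A", "x", "B"}:
            # a pattern line consists only of x, A and B; remember where A and B are
            start, stop = line.index("A"), line.index("B") + 1
            printresult = True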

So this input:

Some results of whatever test
-----------------------------
xxxxxxAxxxxxxBxxxxxx
1234567890MNOPQRSTUV
blub or blublub
xxAxxxxxxxxxxxxxxBxxxxxx
1234567890MNOPQRSTUVWXYZ
peanutbutter
AxxxxxxxxxxxxxxBxxxxxx
x234567890MNOPQRSTUVWXYZ

becomes:

7890MNOP
34567890MNOPQRST
x234567890MNOPQR
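
To see why the set() test singles out the pattern lines, it may help to run it on the sample lines in isolation. The sketch below is only an illustration; is_pattern_line is a hypothetical helper name, and the list is a subset of the sample input shown above.

```python
sample = [
    "Some results of whatever test",
    "-----------------------------",
    "xxxxxxAxxxxxxBxxxxxx",
    "1234567890MNOPQRSTUV",
    "blub or blublub",
    "AxxxxxxxxxxxxxxBxxxxxx",
    "x234567890MNOPQRSTUVWXYZ",
]

def is_pattern_line(line):
    # set() reduces the line to its distinct characters, so the comparison
    # holds only when the line contains x, A and B and nothing else
    return set(line.strip()) == {"A", "x", "B"}

for line in sample:
    print(is_pattern_line(line), line)
```

Only the two lines made up purely of x, A and B report True. The content line that starts with x reports False because it also contains digits and other letters, and the pattern line that starts with A is still recognized.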
