Experiments and examples

Question

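This question was raised on the Austin Group mailing list in March 2012. Here is the final message on the matter, from Geoff Clare of the Austin Group (the body that maintains POSIX), who was also the one who raised the issue in the first place. It is copied here from the gmane NNTP interface: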

Date: Fri, 16 Mar 2012 17:09:42 +0000
From: Geoff Clare <gwc-7882/[email protected]>
To: austin-group-l-7882/[email protected]
Newsgroups: gmane.comp.standards.posix.austin.general
Subject: Re: Strange addressing issue in sed

Stephane Chazelas <[email protected]> wrote, on 16 Mar 2012:
>
> 2012-03-16 15:44:35 +0000, Geoff Clare:
> > I've been alerted to an odd behaviour of sed on certified UNIX
> > systems that doesn't seem to match the requirements of the
> > standard.  It concerns an interaction between the 'n' command
> > and address matching.
> > 
> > According to the standard, this command:
> > 
> > printf 'A\nB\nC\nD\n' | sed '1,3s/A/B/;1,3n;1,3s/B/C/'
> > 
> > should produce the output:
> > 
> > B
> > C
> > C
> > D
> > 
> > GNU sed does produce this, but certified UNIX systems produce this:
> > 
> > B
> > B
> > C
> > D
> > 
> > However, if I change the 1,3s/B/C/ to 2,3s/B/C/ then they produce
> > the expected output (tested on Solaris and HP-UX).
> > 
> > Is this just an obscure bug from common ancestor code, or is there
> > some legitimate reason why this address change alters the behaviour?
> [...]
> 
> I suppose the idea is that for the second 1,3cmd, line "1" has
> not been seen, so the 1,3 range is not entered.

Ah yes, now it makes sense, and it looks like the standard does
require this slightly strange behaviour, given how the processing
of the "two addresses" case is specified:

    An editing command with two addresses shall select the inclusive
    range from the first pattern space that matches the first address
    through the next pattern space that matches the second.  (If the
    second address is a number less than or equal to the line number
    first selected, only one line shall be selected.) Starting at the
    first line following the selected range, sed shall look again for
    the first address. Thereafter, the process shall be repeated.

It's specified this way because the addresses can be BREs, but if
the same matching process is applied to the line numbers (even though
they can only match at most once), then the 1,3 range on that last
command is never entered.

-- 
Geoff Clare <g.clare-7882/[email protected]>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England

And here is the relevant part of the rest of the message (mine) that Geoff was quoting:

I suppose the idea is that for the second 1,3cmd, line "1" has
not been seen, so the 1,3 range is not entered.

Same idea as in

printf '%s\n' A B C | sed -n '1d;1,2p'

whose behavior differ in traditional (heirloom toolchest at
least) and GNU.

It's unclear to me whether POSIX wants one behavior or the
other.

So, according to Geoff, POSIX is clear that the GNU behaviour is non-compliant.

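It's true that the GNU behaviour is less consistent (compare seq 10 | sed -n '1d;1,2p' with seq 10 | sed -n '1d;/^1$/,2p'), even if it is potentially less surprising to people who don't realise how ranges are processed (even Geoff initially found the conforming behaviour "strange").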

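Nobody bothered to report it as a bug to the GNU folks. I'm not sure I would qualify it as a bug. Probably the best option would be to update the POSIX specification to allow both behaviours, making it clear that one cannot rely on either.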

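Edit. I have now had a look at the original sed implementation in Unix V7 from the late 70s, and it looks very much like that behaviour for numeric addresses was not intended, or at least not thought through completely.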

With Geoff's reading of the spec (and my original interpretation of why it happens), the converse should also hold. In:

seq 5 | sed -n '3d;1,3p'

lines 1, 2, 4 and 5 should be output, because this time it is the end address that the 1,3p range command never encounters, as in seq 5 | sed -n '3d;/1/,/3/p'.
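The strict reading can be made concrete with a small Python model. This is only a sketch under stated assumptions, not any real sed: the name select_strict and its deleted parameter are made up for illustration, the earlier d command is reduced to a set of line numbers that never reach the range command, and Python's re module stands in for sed's BREs. A numeric address matches only its own line number, so a deleted line can never open or close a range.

```python
import re

def select_strict(lines, deleted, start, end):
    """Return what a `start,end p` command prints under the strict reading.

    `deleted` holds the 1-based numbers of lines removed by an earlier `d`
    command (hypothetical simplification), so the range never sees them.
    """
    def matches(addr, lineno, text):
        # A number matches only its own line; a string is treated as a regex.
        if isinstance(addr, int):
            return lineno == addr
        return re.search(addr, text) is not None

    printed, in_range = [], False
    for lineno, text in enumerate(lines, start=1):
        if lineno in deleted:
            continue  # the cycle ended at `d`; the range command is not evaluated
        if in_range:
            printed.append(text)
            if matches(end, lineno, text):
                in_range = False
        elif matches(start, lineno, text):
            printed.append(text)
            # Numeric end at or before the first selected line: one line only.
            in_range = not (isinstance(end, int) and end <= lineno)
    return printed

seq5 = [str(n) for n in range(1, 6)]
print(select_strict(seq5, {1}, 1, 2))      # models sed -n '1d;1,2p'      -> []
print(select_strict(seq5, {3}, 1, 3))      # models sed -n '3d;1,3p'      -> ['1', '2', '4', '5']
print(select_strict(seq5, {3}, "1", "3"))  # models sed -n '3d;/1/,/3/p'  -> ['1', '2', '4', '5']
```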

Yet that doesn't happen in the original implementation, nor in any other implementation I tried (busybox sed returns lines 1, 2 and 4, which looks more like a bug).

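If you look at the UNIX v7 code, it does check for the case where the current line number is greater than the (numeric) end address, and gets out of the range then. The fact that it doesn't do the same for the start address looks more like an oversight than an intentional design.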
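That asymmetry can be sketched in Python as follows. This is a toy model of the check described in the previous paragraph, not the V7 source: select_with_end_check and its deleted parameter are hypothetical names, both addresses are assumed to be line numbers, and the model assumes that the line on which the range is abandoned is not itself selected.

```python
def select_with_end_check(lines, deleted, start, end):
    """Model a numeric `start,end p` range that is left once the line number passes `end`."""
    printed, in_range = [], False
    for lineno, text in enumerate(lines, start=1):
        if lineno in deleted:
            continue  # removed by an earlier `d`; never reaches the range command
        if in_range:
            if lineno > end:
                in_range = False  # already past the end address: leave, select nothing
                continue
            printed.append(text)
            if lineno == end:
                in_range = False
        elif lineno == start:     # the start address still needs an exact match
            printed.append(text)
            in_range = True
    return printed

seq5 = [str(n) for n in range(1, 6)]
print(select_with_end_check(seq5, {3}, 1, 3))  # models sed -n '3d;1,3p' -> ['1', '2']
print(select_with_end_check(seq5, {1}, 1, 2))  # models sed -n '1d;1,2p' -> []
```

In this model a skipped end line is recovered from on the next line, while a skipped start line means the range is never entered at all.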

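What that means is that, at the moment, no implementation actually complies with that interpretation of the POSIX spec in this regard.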

Another confusing behaviour of the GNU implementation is:

$ seq 5 | sed -n '2d;2,/3/p'
3
4
5

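Since line 2 was skipped, the 2,/3/ range is entered on line 3 (the first line whose number is >= 2). But as that is the line that made us enter the range, it is not checked against the end address. It gets worse with busybox sed: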

$ seq 10 | busybox sed -n '2,7d; 2,3p'
8

Since lines 2 to 7 were deleted, line 8 is the first one whose number is >= 2, so the 2,3 range is entered then!
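Both transcripts are consistent with a "first line numbered >= start" rule, which the Python sketch below models. It is an illustration only: select_lenient_start, its deleted parameter and the three-state machine are assumptions of this sketch, Python's re stands in for sed's BREs, and nothing here is taken from the GNU or busybox sources or claimed to match them beyond the two cases shown.

```python
import re

def select_lenient_start(lines, deleted, start, end):
    """Model `start,end p` where numeric `start` fires on the first line numbered >= start."""
    def end_matches(lineno, text):
        if isinstance(end, int):
            return lineno >= end
        return re.search(end, text) is not None

    printed, state = [], "waiting"  # waiting -> active -> closed
    for lineno, text in enumerate(lines, start=1):
        if lineno in deleted:
            continue  # removed by an earlier `d`; never reaches the range command
        if state == "waiting" and lineno >= start:
            printed.append(text)
            # A regex end is not tested on the entering line; a numeric end
            # at or before this line limits the range to this single line.
            one_line = isinstance(end, int) and end <= lineno
            state = "closed" if one_line else "active"
        elif state == "active":
            printed.append(text)
            if end_matches(lineno, text):
                state = "closed"
    return printed

seq5 = [str(n) for n in range(1, 6)]
seq10 = [str(n) for n in range(1, 11)]
print(select_lenient_start(seq5, {2}, 2, "3"))              # models sed -n '2d;2,/3/p'   -> ['3', '4', '5']
print(select_lenient_start(seq10, set(range(2, 8)), 2, 3))  # models sed -n '2,7d; 2,3p'  -> ['8']
```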
