The most compact way: a "fake" unnamed pipe

Question

The error in your command occurs because fd 4 is simply not open.

In fact, you get two "Bad file descriptor" messages: one from wc -l and the other from cat <&4 (or xargs -a /dev/fd/4).
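The failure mode is easy to reproduce in isolation. The following is a minimal sketch, assuming Bash; it closes fd 4 explicitly in a subshell so the result does not depend on what the calling environment has open, and the variable names err and status are only illustrative.

```shell
#!/usr/bin/env bash
# Write to fd 4 in a subshell where fd 4 is guaranteed to be closed,
# capturing the shell's own error message (forced to the C locale).
err=$( ( LC_ALL=C; exec 4>&-; echo hi >&4 ) 2>&1 )
status=$?

# The redirection fails before echo runs, so the status is non-zero
# and the message mentions the bad descriptor.
printf 'exit status: %s\nmessage: %s\n' "$status" "$err"
```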

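You need an unnamed pipe to have fd 4 open, but the only official way to get an unnamed pipe in Bash is actually through the coproc command.

However, for your specific use case there is a nice shortcut.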

The most compact way: a "fake" unnamed pipe

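This trick is not documented anywhere as of Bash v5, but it works at least on v4.3 (I have not been able to test v5 yet).

It leverages a few standard idioms that, put together, let you obtain arbitrary "unnamed" pipes on systems that support them. By "unnamed pipe" I mean "a FIFO that does not require a file of type p to be created on the filesystem first, via mkfifo or an equivalent command". (This is not the proper definition of an unnamed pipe, but I dare say it is the one that really makes sense when working in a command shell.)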
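Before the full command, the idiom can be tried on its own. This is a minimal sketch, assuming Bash on a system such as Linux where /dev/fd entries can be reopened read-write; the variable names pipe and line are arbitrary.

```shell
#!/usr/bin/env bash
# <(:) creates a pipe and expands to its /dev/fd path; reopening that
# path with <> yields one read-write fd, whose number Bash stores in $pipe.
: {pipe}<> <(:)

# The same fd serves as both ends of the pipe.
echo "hello" >&"${pipe}"
read -r line <&"${pipe}"
printf 'read back: %s\n' "$line"

# Nothing closes the fd automatically, so close it when done.
exec {pipe}<&-
```

Because the shell itself holds the write side open, a reader never sees EOF on this fd; reading exactly one line, as here, sidesteps that.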

For your case, an example use of these "unnamed" pipes boils down to the following:

cat email.txt | ( : {pipe}<> <(:) ; tee >(sed -e '1,/^$/d' | wc -l >&${pipe}) | xargs -I% -a <({ read count ; echo $count; } <&${pipe}) sed -e '1,/^$/{/^Subject:/Is/$/ (%)/}' )

The command line above should produce the expected result for your sample case.
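To try the command without the original data, a small test harness can be used. This is only a sketch: the sample message, the temporary directory and the variable names workdir and result are made up for illustration, and it assumes Bash on Linux with GNU sed and GNU xargs (for the I flag and the -a option).

```shell
#!/usr/bin/env bash
# Hypothetical sample message: two header lines, a blank line, three body lines.
workdir=$(mktemp -d)
cat > "$workdir/email.txt" <<'EOF'
From: alice@example.com
Subject: Hello

line one
line two
line three
EOF

# The command from the answer, unchanged except for the input path.
result=$(
  cat "$workdir/email.txt" | ( : {pipe}<> <(:) ; tee >(sed -e '1,/^$/d' | wc -l >&${pipe}) | xargs -I% -a <({ read count ; echo $count; } <&${pipe}) sed -e '1,/^$/{/^Subject:/Is/$/ (%)/}' )
)

# The Subject line should now carry the number of body lines in parentheses.
printf '%s\n' "$result"
rm -rf "$workdir"
```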

Breakdown with explanations (for clarity only; it does not work when copy-pasted):

cat email.txt | \  # pipe data to ...
    ( \  # a compound statement, which ...
    : {pipe}<> <(:) ; \ # ... first opens the unnamed pipe in RW mode and puts its fd into the (arbitrary) variable ${pipe}
    tee \ # then mirrors the data from main stdin to ...
        >( \ # the side processing of main input ...
            sed -e '1,/^$/d' | wc -l \ # ... which counts the body lines sending the result ...
            >&${pipe} \ # ... to the unnamed pipe
         ) \
    | \ # the tee also pipes all main input to ...
    xargs -I% -a \ # an xargs that reads iterative lines from ...
        <({ read count ; echo $count ; } <&${pipe}) \ # a compound command that reads the single line (the count provided by wc) from the ${pipe} fd, and echoes it back to xargs -a
        sed -e \ # that finally executes the sed command which looks for the Subject: line in the header part
        '1,/^$/{/^Subject:/Is/$/ (%)/}' ; \ # to append the count number to it
    )

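A few additional notes:

The unnamed pipe has to be opened RW because I found no way to open the usual pair of pipe ends, one for reading and the other for writing.
This means there is no usual EOF event to tell the reading side that no more data will come, so you have to handle that yourself some other way. Here we can exploit the fact that only one line is of interest, so a single read is enough. If you instead need to read several lines from the side channel, you need some kind of in-band EOF notification, such as a simple EOF string appended at the very end of the output, which is then filtered out while feeding xargs -a and used to stop reading. This is entirely feasible, but it makes the command line considerably longer to type. Doing without an in-band EOF string is also possible, but more complicated.
Managing these unnamed pipes is entirely up to you, so you may need to close them explicitly via exec {pipe}<&-;. In this example I did not need to, because the fd is created in a subprocess.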
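The in-band EOF idea from the notes above could be sketched as follows. The marker string __EOF__ and the names eof_marker, entry and lines are hypothetical choices, and the sketch assumes Bash on Linux; it is not the command from the answer, only an illustration of the technique.

```shell
#!/usr/bin/env bash
# Open the read-write "unnamed" pipe as before.
: {pipe}<> <(:)

# Hypothetical marker; it must never occur as a line in the real data.
eof_marker='__EOF__'

# Writer side: several lines of data, then the marker as the last line.
{ printf '%s\n' one two three; echo "$eof_marker"; } >&"${pipe}"

# Reader side: the pipe never reports EOF, so stop on the marker instead.
lines=()
while IFS= read -r -u "${pipe}" entry; do
  [[ $entry == "$eof_marker" ]] && break
  lines+=("$entry")
done

printf 'got %d lines, last one: %s\n' "${#lines[@]}" "${lines[2]}"

# Explicit cleanup, as recommended in the notes.
exec {pipe}<&-
```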

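For completeness, here is an equivalent version using coproc, which provides a real unnamed pipe through the usual pair of interconnected file descriptors.

The official way to unnamed pipes: coproc

There are many ways to use coproc, but for your case I think the best one is the following: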

cat email.txt | (coproc cat ; : {input}<&${COPROC[0]} {output}>&${COPROC[1]} ; tee >(sed -e '1,/^$/d' | wc -l >&${output}) | xargs -I% -a <(exec cat <&${input}) sed -e '1,/^$/{/^Subject:/Is/$/ (%)/}' & )

Breakdown with explanations (for clarity only; it does not work when copy-pasted):

cat email.txt | \ # pipe data to ...
    ( \ # a subcommand statement, which ...
    coproc cat ; \  # ... first spawns the coprocess, a simple cat command acting as a simple line-oriented bridge
    : {cp_output}<&${COPROC[0]} {cp_input}>&${COPROC[1]} ; \ # then copies the coproc's own fds into new ones whose numbers are put into the (arbitrary) variables ${cp_output} and ${cp_input}
    tee \ # and then mirrors the data from main stdin to ...
        >( \ # the side processing of main input ...
            sed -e '1,/^$/d' | wc -l \ # ... which counts the body lines sending the result ...
            >&${cp_input} \ # ... to the (bridging) coproc
          ) \
    | \ # the tee also pipes all main input to ...
    xargs -I% -a \ # an xargs that reads iterative lines from ...
        <(exec cat <&${cp_output}) \ # another cat that reads from the coproc bridging the count provided by wc, and echoes it back to xargs -a
        sed -e \ # that finally executes the sed command which looks for the Subject: line in the header part
        '1,/^$/{/^Subject:/Is/$/ (%)/}' ; \ # to append the count number to it
    )

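Again, a few additional notes:

Wrapping everything in a subshell is recommended so that the coproc's data (i.e. its process and fds) do not leak into the interactive bash (assuming you run this beast interactively!)
Otherwise, managing that coproc data is entirely up to you, so you may need to close the fds explicitly, e.g. via exec {cp_input}<&- or exec {COPROC[1]}<&-
You can use any command in the coproc, but I have always found that a simple cat bridging the two fds makes a handy general-purpose solution; however, if you manage to embed either of the worker processes into the coproc itself you can optimize performance; in this example that would require heavily rearranging the whole command line
According to the Bash v4 documentation, Bash supports only one coproc at a time
However, at least since v4.3 it does accept more coprocs, albeit with an explicit warning, and the Bash v5 documentation does not state any limitation
With more coprocs you must use an explicit name for each one (see the documentation for details)
The coproc's fds need to be moved/copied to arbitrary fds so that they survive the pipeline and the process substitutions used in this example, because the ${COPROC[*]} array is not exported to subprocesses and its own fds are always closed on exec
Here we can take advantage of xargs -a actively reading from both standard input and the file indicated by -a, which keeps tee from filling up the pipe's buffer; otherwise a deadlock would occur and you would need a more elaborate approach to avoid it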
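The fd handling described in these notes can be tried on a smaller scale. The following is a minimal sketch, assuming Bash 4.1 or later; the sample input and the names own_input and count are illustrative, while cp_output and cp_input follow the breakdown above.

```shell
#!/usr/bin/env bash
# Spawn the bridging coprocess and remember its own write fd number
# right away, since Bash clears COPROC once the coprocess exits.
coproc cat
own_input=${COPROC[1]}

# Duplicate the coproc's fds into new ones; these copies are the ones
# that remain usable inside pipelines and process substitutions.
: {cp_output}<&"${COPROC[0]}" {cp_input}>&"${COPROC[1]}"

# A side process, run in a process substitution, sends its result
# through the duplicated write fd.
printf 'one\ntwo\n' | tee >(wc -l >&"${cp_input}") >/dev/null

# The main shell reads the single result line from the bridge.
read -r count <&"${cp_output}"
printf 'side channel says: %s\n' "$count"

# Explicit cleanup: close the copies, then the coproc's own write end,
# which lets the bridging cat see EOF and exit.
exec {cp_input}>&- {cp_output}<&-
exec {own_input}>&-
```

Reading one line is enough here as well, so the sketch does not depend on when the bridging cat sees EOF.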
