Bash script while loop: reading standard input or arguments

Question 1

What changed is this block:

while read name; do
    efetch -db nucleotide -id $name -format gpc > $name.xml;
done < "$@"

That makes efetch run in a loop, with its standard input redirected to the file given as a parameter. So it changes how efetch is used in two ways:

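its standard input is no longer the default (the terminal)
its parameter list is no longer literally the script's command-line arguments; it comes indirectly from the file.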

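If efetch detects that its input is not a terminal, it may well just reopen the terminal (perhaps that is what you meant by "efetch takes stdin instead of id"). Or, if efetch is reading its standard input, it may read unexpected content (in a quick test, that appeared to be the script itself).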
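A minimal sketch of that second possibility, assuming the command inside the loop reads its standard input. The function `greedy` is a hypothetical stand-in (it is not efetch, whose actual behavior is not shown here), and the temporary file plays the role of the ID list. Redirecting the command's input from /dev/null keeps it away from the loop's input:

```shell
#!/bin/bash
# Hypothetical stand-in for any command that reads its standard input.
greedy() { cat > /dev/null; }

list=$(mktemp)
printf '%s\n' first second third > "$list"

# Unprotected: greedy inherits the loop's stdin and drains the rest of
# the file, so the loop body runs only once.
unprotected=0
while read -r name; do
    greedy
    unprotected=$((unprotected + 1))
done < "$list"

# Protected: the command's stdin comes from /dev/null, so the loop
# sees every line.
protected=0
while read -r name; do
    greedy < /dev/null
    protected=$((protected + 1))
done < "$list"

echo "unprotected: $unprotected, protected: $protected"
rm -f "$list"
```

If efetch does read standard input, the same `< /dev/null` redirection on the efetch line would be one way to isolate it from the loop.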

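@chepner points out that the shell (bash in this case) does not spawn a subprocess for the loop. I had a different case in mind. Consider these two scripts: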

#!/bin/bash 
LAST=...
while read name
do
    /bin/echo "** $name"
    LAST="$name"
done < "$@"
echo "...$LAST"

and

#!/bin/bash
LAST=...
cat "$@" | while read name
do
    /bin/echo "** $name"
    LAST="$name"
done
echo "...$LAST"

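The latter (pipe) will echo "......" at the end, while the former (redirection) will echo the last value assigned to LAST inside the loop. The pipe form is sometimes described as requiring a subprocess, which explains why the variable assignment does not propagate outside the loop.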
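When the input really does have to come from another command, one way to keep the assignment visible is process substitution, which feeds the loop through a redirection instead of a pipe. This is a sketch only: process substitution is available in bash, zsh and ksh93 but not in plain POSIX sh, and the temporary file stands in for the script's arguments.

```shell
#!/bin/bash
input=$(mktemp)
printf '%s\n' first second > "$input"

LAST=...
while read -r name
do
    /bin/echo "** $name"
    LAST="$name"
done < <(cat "$input")   # cat runs in a subprocess; the loop does not

echo "...$LAST"          # prints "...second"
rm -f "$input"
```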

Interestingly, for the latter (pipe) the shells differ in the number of processes they use. I tested with (Debian/testing) bash, dash (/bin/sh), zsh and ksh93, using strace -fo to capture system calls and process IDs:

#!/bin/sh
for sh in bash dash zsh ksh93
do
    echo "++ $sh"
    strace -fo $sh.log ./do-$sh ./once
    LC=$(sed -e 's/ .*//' $sh.log |sort -u |wc -l)
    WC=$(wc -l $sh.log)
    echo "-- $LC / $WC"
done

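The script shows the number of processes and the number of system calls for each shell. (The file once contains two lines, "first" and "second", to eliminate one test boundary.)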

I found that zsh and ksh93 use one process fewer than bash and dash:

$ ./testit
++ bash
** first
** second
......
-- 5 / 401 bash.log
++ dash
** first
** second
......
-- 5 / 222 dash.log
++ zsh
** first
** second
...second
-- 4 / 568 zsh.log
++ ksh93
** first
** second
...second
-- 4 / 336 ksh93.log

In this example, running the pipe takes 1 or 2 more processes than using the redirection.
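The zsh and ksh93 results are consistent with those shells running the last element of a pipeline in the current shell. Bash can be asked to do the same with the lastpipe option. The following is a sketch, assuming bash 4.2 or later and a non-interactive script (lastpipe only takes effect while job control is inactive); it was not part of the measurements above.

```shell
#!/bin/bash
# Run the last command of a pipeline in the current shell
# (bash 4.2+, effective only when job control is off, as in scripts).
shopt -s lastpipe

LAST=...
printf '%s\n' first second | while read -r name
do
    /bin/echo "** $name"
    LAST="$name"
done

echo "...$LAST"   # prints "...second" when lastpipe is in effect
```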

Answer