Generic "while read" to "parallel" conversion/replacement

Question 1

I would put the work in a bash function and call that:

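myfunc() {
   # Set a file's mtime to the author date of the last commit touching it.
   local filename="$1" unixtime touchtime
   unixtime=$(git log -1 --format="%at" -- "${filename}")
   # GNU date: convert the epoch seconds into touch -t format.
   touchtime=$(date -d "@${unixtime}" +'%Y%m%d%H%M.%S')
   touch -t "${touchtime}" "${filename}"
}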
export -f myfunc

git ls-tree -r --name-only HEAD | parallel myfunc
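As a rough illustration of the conversion named in the title, the sketch below puts a sequential while read loop next to its GNU Parallel counterpart. The names worker, run_sequential and run_parallel are made up for this example; worker stands in for myfunc above.

```shell
#!/bin/bash
# Hypothetical stand-in for the real per-item work (myfunc in the answer).
worker() { printf 'processed %s\n' "$1"; }
export -f worker   # only the parallel form needs the export

# Sequential form: one invocation at a time, in input order.
run_sequential() {
    local name
    while IFS= read -r name; do
        worker "$name"
    done
}

# Parallel form: GNU Parallel appends each input line as the argument.
# Output appears as jobs finish unless --keep-order (-k) is given.
run_parallel() {
    parallel worker
}
```

Either function reads the same input, for example git ls-tree -r --name-only HEAD | run_sequential, so switching between the two forms only changes the last stage of the pipeline.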

Use parallel -0 if you want to split on NUL (for example, when the names come from git ls-tree -z).
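To show what splitting on NUL buys you, here is a small sketch of a NUL-delimited reader. for_each_nul and show_len are hypothetical helper names; the commented lines are the NUL-separated equivalents of the pipeline above, assuming git ls-tree -z is used to produce the names.

```shell
#!/bin/bash
# Hypothetical worker: print the length of the name it was given.
show_len() { printf '%d\n' "${#1}"; }

# Call a worker once per NUL-terminated item read from stdin.
# A name containing a newline arrives as one item, not two.
for_each_nul() {
    local worker="$1" name
    while IFS= read -r -d '' name; do
        "$worker" "$name"
    done
}

# Sequential: git ls-tree -r -z --name-only HEAD | for_each_nul myfunc
# Parallel:   git ls-tree -r -z --name-only HEAD | parallel -0 myfunc
```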

If you want to run the code above without installing GNU Parallel, you can use:

parallel --embed > myscript.sh

Then append the code above to myscript.sh.
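Put together, those two steps might look like the sketch below. build_embedded is a hypothetical helper name, and GNU Parallel must be installed on the machine that builds the script (though not on the one that later runs it).

```shell
#!/bin/bash
# Write GNU Parallel's embedded copy first, then append our own code.
build_embedded() {
    local out="$1"
    parallel --embed > "$out" || return 1
    # Quoted delimiter: nothing below is expanded while building.
    cat >> "$out" <<'EOF'

myfunc() {
   local filename="$1" unixtime touchtime
   unixtime=$(git log -1 --format="%at" -- "${filename}")
   touchtime=$(date -d "@${unixtime}" +'%Y%m%d%H%M.%S')
   touch -t "${touchtime}" "${filename}"
}
export -f myfunc

git ls-tree -r --name-only HEAD | parallel myfunc
EOF
    chmod +x "$out"
}
```

Calling build_embedded myscript.sh would then leave an executable myscript.sh in the current directory.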

Question 2

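You can have bash or ksh run a series of independent commands concurrently, so that each stream starts a new command as soon as its previous one exits. The streams stay busy except at the tail end of the command list.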

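The basic method is to start several asynchronous shells that all read from the same pipe: the pipe guarantees line buffering and atomic reads (a command file can be fed in with cat file |, but not by redirection).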
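A stripped-down sketch of that idea follows, assuming bash and assuming, as the paragraph above does, that each shell takes whole lines from the shared pipe. run_streams is a simplified, hypothetical cousin of the runStreams function shown further down; the idStream variable follows the same naming.

```shell
#!/bin/bash
# run_streams N: start N background shells that all read commands
# from this function's standard input, then wait for them to finish.
run_streams() {
    local n nStreams="$1"
    for (( n = 1; n <= nStreams; ++n )); do
        # <&0 spells out the inherited stdin as a precaution: a shell
        # may otherwise give a background command /dev/null as input.
        idStream="$n" bash -s <&0 &
    done
    wait
}
```

A call such as printf '%s\n' 'sleep 3; echo "stream $idStream: A"' 'sleep 1; echo "stream $idStream: B"' 'sleep 2; echo "stream $idStream: C"' | run_streams 2 hands each line to whichever shell is free next, so the order of the output depends on timing.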

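A command can be any shell one-liner (in the correct syntax for the shell that owns the stream), but it must not depend on the results of earlier commands, because the assignment of commands to streams is arbitrary. Complex commands are best set up as external scripts, so that they can be invoked as simple commands with arguments.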
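When the one-line commands are generated rather than typed by hand, the arguments need quoting so that each job really stays on one line and survives the stream's shell. A minimal sketch using bash's printf %q, with make_job_line as a hypothetical helper name:

```shell
#!/bin/bash
# Print one shell-quoted command line built from a program name and
# its arguments, suitable for feeding to a shell that reads commands.
make_job_line() {
    local line
    printf -v line '%q ' "$@"      # quote every word for bash
    printf '%s\n' "${line% }"      # drop the trailing space
}
```

For example, make_job_line ./jobProxy 7 'my file' emits a single line in which the space in the last argument is escaped, so the receiving bash sees three words after the script name is split off.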

Here is a test run of six jobs on three streams, showing how the jobs overlap. (I also stress-tested 240 jobs across 80 streams on my laptop.)

Time now 23:53:47.328735254
Sleep until 00 seconds to make debug easier.
Starting 3 Streams
23:54:00.040   Shell   1 Job   1 Go    sleep 5
23:54:00.237   Shell   2 Job   2 Go    sleep 13
23:54:00.440   Shell   3 Job   3 Go    sleep 14
Started all Streams
23:54:05.048   Shell   1 Job   1   End sleep 5
23:54:05.059   Shell   1 Job   4 Go    sleep 3
23:54:08.069   Shell   1 Job   4   End sleep 3
23:54:08.080   Shell   1 Job   5 Go    sleep 13
23:54:13.245   Shell   2 Job   2   End sleep 13
23:54:13.255   Shell   2 Job   6 Go    sleep 3
23:54:14.449   Shell   3 Job   3   End sleep 14
23:54:16.264   Shell   2 Job   6   End sleep 3
23:54:21.089   Shell   1 Job   5   End sleep 13
All Streams Ended

This is the proxy script that provides the debug output for those jobs.

#! /bin/bash

#.. jobProxy.
#.. arg 1: Job number.
#.. arg 2: Sleep time.
#.. idStream: Exported into the Stream's shell.

    fmt='%.12s   Shell %3d Job %3d %s sleep %s\n'
    printf "${fmt}" $( date '+%T.%N' ) "${idStream}" "${1}" "Go   " "${2}"
    sleep "${2}"
    printf "${fmt}" $( date '+%T.%N' ) "${idStream}" "${1}" "  End" "${2}"

This is the stream management script. It creates the job commands that run the proxy, and starts the background shells.

#! /bin/bash

makeJobs () {

    typeset nJobs="${1}"

    typeset Awk='
BEGIN { srand( Seed % 10000000); fmt = "./jobProxy %s %3d\n"; }
{ printf (fmt, $1, 2 + int (14 * rand())); }
'
    seq 1 "${nJobs}" | awk -v Seed=$( date "+%N$$" ) "${Awk}"
}

runStreams () {

    typeset n nStreams="${1}"

    echo "Starting ${nStreams} Streams"
    for (( n = 1; n <= nStreams; ++n )); do
        idStream="${n}" bash -s &
        sleep 0.20
    done
    echo "Started all Streams"

    wait
    echo "All Streams Ended"
}

## Script Body Starts Here.

    date '+Time now %T.%N'
    echo 'Sleep until 00 seconds to make debug easier.'
    sleep $( date '+%S.%N' | awk '{ print 60 - $1; }' )

    makeJobs 6 | runStreams 3

Question 3

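Instead of running git ls-tree and then git log, date, and touch many times in a bash while read loop, the following perl script takes the output of git log --name-only HEAD and stores the most recent timestamp of every file mentioned in the commit logs in a hash called %files. It ignores file names that do not exist.

It then builds a hash of arrays ("HoA", see man perldsc) called %times, with the timestamps as the hash keys and, as values, anonymous arrays containing the file names that have that timestamp. This is an optimisation, so that the touch function only needs to run once per timestamp rather than once per file name.

Commit IDs, commit messages, author names, and blank lines in git log's output are ignored.

The script calls the unqqbackslash() function from String::Escape on each file name, to correctly handle the way git log prints file names with embedded tabs, newlines, double quotes, etc. (i.e. as double-quoted strings with backslash-escaped codes/characters).

I would expect it to run at least dozens of times faster than the bash loop.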

#!/usr/bin/perl

use strict;
use Date::Parse;
use File::Touch;
use String::Escape qw(unqqbackslash);

my %files = ();
my %times = ();
my $t;

while (<>) {
  chomp;
  next if (m/^$|^\s+|^Author: |^commit /);

  if (s/^Date:\s+//) {
    $t = str2time($_);

  } else {
    my $f = unqqbackslash($_);
    next unless -e $f;   # don't create file if it doesn't exist

    if (!defined($files{$f}) || $files{$f} < $t) {
      $files{$f} = $t;
    }

  };
};

# build the %times HoA: each timestamp maps to the list of
# files last modified at that time.
foreach my $f (sort keys %files) {
  push @{ $times{$files{$f}} }, $f;
}

# now touch the files
foreach my $t (keys %times) {
  my $tch = File::Touch->new(mtime_only => 1, time => $t);
  $tch->touch(@{ $times{$t} });
};

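The script uses the Date::Parse, File::Touch, and String::Escape perl modules.

On Debian, apt install libtimedate-perl libfile-touch-perl libstring-escape-perl. Other distros probably package them too. Otherwise, install them with cpan.

Example usage, in a git repo containing a couple of junk files (file1 and file2):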
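For readers without those modules, the same two steps (keep the newest timestamp per file, then touch once per timestamp) can be sketched in shell. This is not the script above: the function names and the '@%at' log format are assumptions chosen for the sketch. It relies on GNU touch accepting -d @SECONDS, would misread a file name that starts with @, and does not decode the quoted file names that git prints for unusual characters.

```shell
#!/bin/bash
# Step 1: read `git log --name-only --format='@%at'` output on stdin and
# print "timestamp<TAB>file", keeping the newest timestamp per file.
latest_per_file() {
    awk '
        /^@/ { t = substr($0, 2) + 0; next }   # "@<epoch>" starts a commit
        /^$/ { next }                          # blank separator lines
        !($0 in newest) || newest[$0] < t { newest[$0] = t }
        END  { for (f in newest) printf "%d\t%s\n", newest[f], f }
    '
}

# Step 2: read "timestamp<TAB>file" lines and run touch once per timestamp.
touch_grouped() {
    local ts file prev=""
    local -a batch=()
    sort -n | {
        while IFS=$'\t' read -r ts file; do
            # A new timestamp begins: flush the files collected so far.
            if [[ $ts != "$prev" && ${#batch[@]} -gt 0 ]]; then
                touch -m -d "@${prev}" -- "${batch[@]}"
                batch=()
            fi
            prev=$ts
            # Like the Perl script, never create a file that does not exist.
            if [[ -e $file ]]; then batch+=("$file"); fi
        done
        if (( ${#batch[@]} > 0 )); then
            touch -m -d "@${prev}" -- "${batch[@]}"
        fi
    }
}
```

The two functions chain together as git log --name-only --format='@%at' HEAD | latest_per_file | touch_grouped, run from the top of the working tree.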

$ git log --date=format:'%Y-%m-%d %H:%M:%S' --pretty='%H  %ad %s' file*
d10c313abb71876cfa8ad420b10f166543ba1402  2021-06-16 14:49:24 updated file2
61799d2c956db37bf56b228da28038841c5cd07d  2021-06-16 13:38:58 added file1
                                                              & file2

$ touch file*
$ ls -l file*
-rw-r--r-- 1 cas cas  5 Jun 16 19:23 file1
-rw-r--r-- 1 cas cas 29 Jun 16 19:23 file2

$ git  log  --name-only HEAD file*  | ./process-git-log.pl 
$ ls -l file*
-rw-r--r-- 1 cas cas  5 Jun 16 13:38 file1
-rw-r--r-- 1 cas cas 29 Jun 16 14:49 file2

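(Very slightly faked: I edited the commit messages to make it clear when the two files were first committed, and that file2 was then changed and committed again. Other than that, it is copy-pasted directly from my terminal.)

This is my second attempt: I originally tried using the Git::Raw module, but could not figure out how to get it to give me a list of only the file names modified in a specific commit. I am sure there is a way, but I have given up on it. I just do not know the internals of git well enough.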

Answer