Perl's `kill` unexpectedly results in `$! == Errno::EINTR`

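I wrote a network daemon that forks child processes to handle TCP connections. On `SIGINT`, the main process triggers a `kill` for each child so that it can clean up and collect some final statistics.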

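In almost all cases this works fine, and the children terminate very quickly. Sometimes, however, a child refuses to terminate within a short timeout (such as 5 seconds).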

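I had no idea what was going on, so I added some verbose output to diagnose the situation. I found that opening a connection with `netcat` and then suspending that `netcat` process sometimes triggers the effect.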

When I was able to reproduce the effect, the debug output was:

REST-server(cleanup_queue): deleting children
REST-server(cleanup_queue): deleting PID 23344 handling localhost:48114
child_delete: Killing child 23344
child_delete: killed child with PID 23344
(r1, r2) = (1, Interrupted system call)
_limited_wait(PID 23344 terminated): waiting up to 5 seconds for condition
_limited_wait(PID 23344 terminated): waiting 0.02 (of 5 remaining) seconds
(r1, r2) = (1, Interrupted system call)
_limited_wait(PID 23344 terminated): waiting 0.04 (of 4.98 remaining) seconds
(r1, r2) = (1, Interrupted system call)
_limited_wait(PID 23344 terminated): waiting 0.08 (of 4.94 remaining) seconds
(r1, r2) = (1, Interrupted system call)
_limited_wait(PID 23344 terminated): waiting 0.16 (of 4.86 remaining) seconds
(r1, r2) = (1, Interrupted system call)
_limited_wait(PID 23344 terminated): waiting 0.32 (of 4.7 remaining) seconds
(r1, r2) = (1, Interrupted system call)
_limited_wait(PID 23344 terminated): waiting 0.64 (of 4.38 remaining) seconds
(r1, r2) = (1, Interrupted system call)
_limited_wait(PID 23344 terminated): waiting 1.28 (of 3.74 remaining) seconds
(r1, r2) = (1, Interrupted system call)
_limited_wait(PID 23344 terminated): waiting 2.46 (of 2.46 remaining) seconds
(r1, r2) = (1, Interrupted system call)
child_delete: PID 23344 refused to terminate within 5s
failed to delete child PID 23344

The "condition" being waited for in this case is the result of this closure:

sub {
    my $r1 = kill(0, $child_pid);
    my $r2 = $!;
    print "(r1, r2) = ($r1, $r2)\n";
    $r1 != 1 && $r2 == Errno::ESRCH;
}

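So the expected result is that the main process is unable to "kill" the PID because it no longer exists (and not because of "permission denied").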

However, for some reason I repeatedly get "Interrupted system call".
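One detail that may matter when reading the output above: Perl documents `$!` as meaningful only immediately after a failed call, and `kill` returns the number of processes it signalled successfully. In every logged line `r1` is 1, so `kill(0, ...)` succeeded and the text in `r2` may simply be left over from an earlier interrupted call. The following is a minimal sketch of a check that consults `$!` only on failure; the helper name `pid_is_gone` is hypothetical and not part of the original code.

```perl
use strict;
use warnings;
use Errno qw(ESRCH EINTR);

# Hypothetical helper (not from the original code): true only if kill(0)
# fails AND the failure reason is "no such process".
sub pid_is_gone {
    my ($pid) = @_;
    # Success means the PID still exists (on Linux this includes an
    # unreaped zombie); $! carries no information in that case.
    return 0 if kill(0, $pid);
    # Failure: $! is meaningful now.
    return ($! == ESRCH) ? 1 : 0;
}

# Illustration: put a stale EINTR into $!, then make a successful call.
# A successful kill(0) is not expected to reset $!.
$! = EINTR;
my $r1 = kill(0, $$);               # signalling ourselves with 0 succeeds
my $r2 = $!;
print "(r1, r2) = ($r1, $r2)\n";
```

With a check shaped like this, a lingering `EINTR` cannot influence the result, and a return value of 1 from `kill(0, ...)` is read as "the PID still exists" rather than as an error.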

The main process uses a signal handler like this:

$SIG{'INT'} = $SIG{'TERM'} = sub ($) {
    my $signal = 'SIG' . $_[0];
    my $me = "signal handler[$$, $signal]";

    print "$me: cleaning up\n"
        if ($verbose > 0);
    cleanup();
    print "$me: executing default action\n"
        if ($verbose > 1);
    $SIG{$_[0]} = 'DEFAULT';
    kill($_[0], $$);                    # execute default action
};

When forking a child process, I reset the signal handlers like this:

sub child_create($)
{
    my ($child) = @_;
    my $pid;

    reaper(0);                          # disable for the child
    if ($pid = fork()) {                # parent
        reaper(1);                      # enable for the parent
    } elsif (defined($pid)) {           # child
        my ($child_fun, @child_param) = @$child;
        my $ret;

        # prevent double-cleanup
        $SIG{'INT'} = $SIG{'TERM'} = $SIG{'__DIE__'} = 'DEFAULT';
        $ret = $child_fun->(@child_param);
        exit($ret);                     # avoid returning from function call
    } else {                            # error
        print STDERR "child_create: fork(): $!\n";
    }
    return $pid;
}

`reaper()` just handles `SIGCHLD`.
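The body of `reaper()` is not shown here. Purely as an assumption for illustration, a `SIGCHLD` handler of this kind often follows the pattern from perlipc, which localizes `$!` and `$?` so the handler does not disturb the code it interrupts. The names `reap_children` and `%exit_status` below are hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

my %exit_status;    # hypothetical bookkeeping: PID => raw wait status

# Hypothetical SIGCHLD handler: reap every finished child without blocking.
sub reap_children {
    local ($!, $?);     # keep errno/status changes from leaking to the caller
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $exit_status{$pid} = $?;
    }
}

# Assumed shape of reaper($enable): install or remove the handler.
sub reaper {
    my ($enable) = @_;
    $SIG{CHLD} = $enable ? \&reap_children : 'DEFAULT';
}
```

Whether the real `reaper()` looks like this is unknown; the sketch only shows where reaping happens and why `$!` is saved around it.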

What could be causing the effect I'm seeing? The children basically execute `while (defined(my $req = $conn->get_request)) {...}` (using `HTTP::Daemon`), so they should be waiting for input from `netcat`.

Additional information

In case it matters, the OS is SLES12 SP5 running on VMware (using Perl 5.18.2).

The code in the main server loop looks like this:

while (defined(my $conn = $daemon->accept) || $! == Errno::EINTR) {
    my $errno = $!;

    if ($quit_flag != 0) {
        last;
    }
    if ($errno == Errno::EINTR) {
        next;
    }
    #... handle $req->uri->path()
}
