F5 LTM frequently killing processes with SIGKILL


We have a BIG-IP 6400 LTM appliance that is killing processes at an alarming rate. CPU usage holds steady at around 23%, so load is not the problem.

Here is a sample from /var/log/ltm:

Oct  7 08:21:55 local/pri-4600 info bigd[3471]: reap_child: child process PID = 25338 exited with signal = 9
Oct  7 08:22:15 local/pri-4600 info bigd[3471]: reap_child: child process PID = 25587 exited with signal = 9
Oct  7 08:22:34 local/pri-4600 info bigd[3471]: reap_child: child process PID = 25793 exited with signal = 9
Oct  7 08:23:10 local/pri-4600 info bigd[3471]: reap_child: child process PID = 26260 exited with signal = 9
Oct  7 08:23:36 local/pri-4600 info bigd[3471]: reap_child: child process PID = 26584 exited with signal = 9
Oct  7 08:23:40 local/pri-4600 info bigd[3471]: reap_child: child process PID = 26647 exited with signal = 9
Oct  7 08:23:45 local/pri-4600 info bigd[3471]: reap_child: child process PID = 26699 exited with signal = 9
Oct  7 08:23:55 local/pri-4600 info bigd[3471]: reap_child: child process PID = 26805 exited with signal = 9
Oct  7 08:25:36 local/pri-4600 info bigd[3471]: reap_child: child process PID = 28079 exited with signal = 9
Oct  7 08:27:15 local/pri-4600 info bigd[3471]: reap_child: child process PID = 29286 exited with signal = 9
Oct  7 08:27:16 local/pri-4600 info bigd[3471]: reap_child: child process PID = 29307 exited with signal = 9
Oct  7 08:27:56 local/pri-4600 info bigd[3471]: reap_child: child process PID = 29793 exited with signal = 9
Oct  7 08:29:20 local/pri-4600 info bigd[3471]: reap_child: child process PID = 30851 exited with signal = 9
Oct  7 08:33:00 local/pri-4600 info bigd[3471]: reap_child: child process PID = 1122 exited with signal = 9
Oct  7 08:33:16 local/pri-4600 info bigd[3471]: reap_child: child process PID = 1299 exited with signal = 9
Oct  7 08:34:15 local/pri-4600 info bigd[3471]: reap_child: child process PID = 2054 exited with signal = 9
Oct  7 08:35:16 local/pri-4600 info bigd[3471]: reap_child: child process PID = 2784 exited with signal = 9
Oct  7 08:35:16 local/pri-4600 info bigd[3471]: reap_child: child process PID = 2807 exited with signal = 9
Oct  7 08:35:35 local/pri-4600 info bigd[3471]: reap_child: child process PID = 3015 exited with signal = 9
Oct  7 08:36:15 local/pri-4600 info bigd[3471]: reap_child: child process PID = 3601 exited with signal = 9
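To put a number on "alarming rate", a small script can tally the signal-9 reaps per minute. This is a minimal sketch that assumes the log line format shown in the sample above and is meant to run with Python 3 on a copy of the log pulled off the unit (the BIG-IP's own Python may be older). `sigkill_reaps_per_minute` is a hypothetical helper, not an F5 tool.

```python
import re
from collections import Counter

# Assumed line format, taken from the sample above:
# "Oct  7 08:21:55 local/pri-4600 info bigd[3471]: reap_child: child process PID = 25338 exited with signal = 9"
REAP_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d+)\s+(?P<hh>\d{2}):(?P<mm>\d{2}):\d{2}\s+"
    r"\S+\s+\S+\s+bigd\[\d+\]: reap_child: child process PID = (?P<pid>\d+) "
    r"exited with signal = (?P<sig>\d+)"
)


def sigkill_reaps_per_minute(lines):
    """Count bigd reap_child messages that report signal 9 (SIGKILL), bucketed by minute."""
    counts = Counter()
    for line in lines:
        m = REAP_RE.match(line)
        if m and int(m.group("sig")) == 9:
            # Normalise "Oct  7" to "Oct 7" so the keys are stable.
            key = f"{m.group('month')} {int(m.group('day'))} {m.group('hh')}:{m.group('mm')}"
            counts[key] += 1
    return counts
```

For example, `sigkill_reaps_per_minute(open("ltm"))` on a copied log returns a `Counter` keyed by minute; for the sample above, the `"Oct 7 08:23"` bucket would hold 5.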

Is this normal? If not, what is causing it?

Answer 1

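bigd is the health-monitoring daemon on BIG-IP, so this looks like one of the monitors in use is crashing. You should open a case with F5 Support and upload your qkview to ihealth.f5.com. Here is a solution related to that error message: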

https://support.f5.com/kb/en-us/solutions/public/17000/000/sol17092.html

Answer 2

This is a known bug in the BIG-IP 10.2.4 software we are running.

From F5 Support:

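...you are running into a known issue tracked internally as Bug ID 539130, "bigd may deadlock while handling SIGCHLD, causing bigd heartbeat failure and SIGABRT". -=Conditions=- External monitors that run for a long time and are killed by the next iteration of the monitor can cause bigd to crash and dump core, which leaves health monitoring temporarily unavailable.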
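The condition F5 describes, a long-running external monitor being killed by the monitor's next iteration, also fits the log sample: each line is the parent reaping a child that died from SIGKILL. The following Python sketch models only that kill-and-reap pattern, assuming the scheduler kills a still-running previous instance before starting the next one. `run_monitor_cycles` and its parameters are hypothetical; this is not bigd's code, and it does not reproduce the SIGCHLD deadlock itself. It is POSIX only.

```python
import subprocess
import time


def _reap_if_running(proc, killed_with):
    """SIGKILL `proc` if it is still running, reap it, and record the signal that ended it."""
    if proc.poll() is None:
        proc.kill()  # on POSIX, Popen.kill() sends SIGKILL (signal 9)
    rc = proc.wait()  # reap the child; a negative return code means "killed by signal -rc"
    if rc < 0:
        killed_with.append(-rc)


def run_monitor_cycles(cmd, interval, cycles):
    """Hypothetical scheduler: start `cmd` every `interval` seconds, killing the
    previous instance first if it has not finished. Returns the signal numbers
    of children that were killed, analogous to bigd's reap_child log lines."""
    killed_with = []
    prev = None
    for _ in range(cycles):
        if prev is not None:
            _reap_if_running(prev, killed_with)
        prev = subprocess.Popen(cmd)
        time.sleep(interval)
    if prev is not None:
        _reap_if_running(prev, killed_with)  # clean up the final iteration the same way
    return killed_with
```

With a monitor command that runs longer than `interval` (for example `["sleep", "5"]` with a 0.2-second interval), every run is reported as killed with signal 9, matching the `exited with signal = 9` lines; with one that finishes in time (`["true"]`), nothing is killed. Under this model, frequent signal-9 reaps would suggest checking whether an external monitor's script takes longer than its interval.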

The fix is to update the software with Hotfix-BIGIP-10.2.4-HF12-866.11-ENG.
