Does /proc/[PID]/stat show cumulative CPU statistics for child processes?

Question

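Conclusion:
If you don't want to read the whole explanation, read only this:
Yes, the values in /proc/[PID]/stat let you determine the amount of CPU time used by a process and its children.
However, you cannot use it for real-time monitoring, because the children's CPU time values are only updated when a child process terminates and has been waited for.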

Explanation:
According to man time, time returns the following statistics:

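These statistics consist of (i) the elapsed real time between invocation and termination, (ii) the user CPU time (the sum of the tms_utime and tms_cutime values in a struct tms as returned by times(2)), and (iii) the system CPU time (the sum of the tms_stime and tms_cstime values in a struct tms as returned by times(2)).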

Reading man times, we learn that the structure is defined as:

struct tms {
   clock_t tms_utime;  /* user time */
   clock_t tms_stime;  /* system time */
   clock_t tms_cutime; /* user time of children */
   clock_t tms_cstime; /* system time of children */
};

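This means that the time command returns the cumulative user and system CPU time of the process and all of its waited-for children.
Now we need to know what we can extract from /proc. In man proc, the section on /proc/[PID]/stat describes the following fields: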

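(14) utime %lu
Amount of time that this process has been scheduled in user mode, measured in clock ticks (divide by sysconf(_SC_CLK_TCK)). This includes guest time, guest_time (time spent running a virtual CPU, see below), so that applications that are not aware of the guest time field do not lose that time from their calculations.
(15) stime %lu
Amount of time that this process has been scheduled in kernel mode, measured in clock ticks (divide by sysconf(_SC_CLK_TCK)).
(16) cutime %ld
Amount of time that this process's waited-for children have been scheduled in user mode, measured in clock ticks (divide by sysconf(_SC_CLK_TCK)). (See also times(2).) This includes guest time, cguest_time (time spent running a virtual CPU, see below).
(17) cstime %ld
Amount of time that this process's waited-for children have been scheduled in kernel mode, measured in clock ticks (divide by sysconf(_SC_CLK_TCK)).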
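
The field numbers above can be pulled out of a stat line with a few lines of shell. This is only a sketch: cpu_fields is a hypothetical helper name, not something from man proc. It assumes the line follows the layout quoted above, and it strips everything up to the last ") " first, because field 2 (the command name) may itself contain spaces and would otherwise shift the field numbers.

```shell
#!/bin/bash
# Print fields 14-17 (utime stime cutime cstime) of a /proc/[PID]/stat line.
# cpu_fields is a hypothetical helper name used for illustration only.
cpu_fields() {
    # Drop "pid (comm) " so that a command name containing spaces is harmless.
    # After this, the first word of rest is field 3 (state).
    local rest="${1##*) }"
    set -- $rest
    # Fields 14..17 of the full line are words 12..15 of rest.
    echo "${12} ${13} ${14} ${15}"
}

# Usage: read the shell's own stat line with a builtin, then extract the counters.
if [ -r "/proc/$$/stat" ]; then
    read -r line < "/proc/$$/stat" && cpu_fields "$line"
fi
```

For a command name without spaces this gives the same result as cut -d ' ' -f 14-17, which is what the rest of this post uses.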

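So basically the /proc/[PID]/stat file contains the values that time uses to determine the CPU time it reports in seconds; in the file they are expressed in clock ticks.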

With this knowledge, I ran my script as time load.sh and added cat /proc/$$/stat at the end of the script. The result:

9398 (load.sh) S 5379 9398 5379 34817 9398 4194304 1325449 7562836 0 0 192 520 3964 1165 20 0 1 0 814903 14422016 1154 18446744073709551615 4194304 5242124 140726473818336 0 0 0 65536 4 65538 1 0 0 17 3 0 0 818155 0 0 7341384 7388228 9928704 140726473827029 140726473827049 140726473827049 140726473830382 0

Output of the time command:

real    0m38,783s
user    0m41,576s
sys     0m16,866s

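According to man proc, we need to look at columns 14, 15, 16 and 17: 192 520 3964 1165. Adding up the time that the process and its children spent on the CPU in user mode and in system mode gives: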

192+3964 = 4156  <=>  user 0m41,576s
520+1165 = 1685  <=>  sys  0m16,866s
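
To make the comparison explicit, the sums can be converted from clock ticks to seconds. The sketch below assumes 100 ticks per second, which is what the numbers above imply; on another system the value reported by getconf CLK_TCK should be used instead. ticks_to_seconds is a hypothetical helper name.

```shell
#!/bin/bash
# Convert clock ticks to seconds with two decimals, using integer arithmetic.
# $1 = ticks, $2 = ticks per second (defaults to the value of getconf CLK_TCK).
# ticks_to_seconds is a hypothetical helper name used for illustration only.
ticks_to_seconds() {
    local ticks="$1" hz="${2:-$(getconf CLK_TCK)}"
    printf '%d.%02d\n' $(( ticks / hz )) $(( ticks % hz * 100 / hz ))
}

ticks_to_seconds $(( 192 + 3964 )) 100   # user, own + children: prints 41.56
ticks_to_seconds $(( 520 + 1165 )) 100   # sys,  own + children: prints 16.85
```

These values agree with the user and sys lines printed by time up to the resolution of one tick.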

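So the CPU time is not stored as a single cumulative value, but by adding the process's own counters to its children's counters you can calculate very precisely (to the centisecond) the CPU time used by your program and its children from /proc/[PID]/stat.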

Edit:
After further testing and talking with people, I finally got my answer. I ran a script containing only the following:

#!/bin/bash
sleep 5
time stress --cpu 4 -t 60s --vm-hang 15
sleep 5
cat /proc/$$/stat | cut -d ' ' -f 14-17
exit

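At the same time I used watch to monitor the metrics in /proc/$$/stat. As long as the child process has not finished, the counters are not updated. When stress ends, the values shown in /proc/$$/stat are updated, and the time command and columns 14 to 17 of /proc/$$/stat end up with similar results.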
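
The same behaviour can be reproduced without stress. The following is a rough sketch, assuming Linux and bash: it samples the shell's own cutime + cstime before, during and after a CPU-bound child. The busy loop stands in for stress, and children_ticks is a hypothetical helper name.

```shell
#!/bin/bash
# Store cutime + cstime (fields 16 and 17, in clock ticks) of PID $1 in TICKS.
# Only builtins are used, so taking a sample does not create a child process.
# children_ticks is a hypothetical helper name used for illustration only.
children_ticks() {
    local line rest
    read -r line < "/proc/$1/stat" || return 1
    rest="${line##*) }"        # drop "pid (comm) "; rest starts at field 3
    set -- $rest
    TICKS=$(( ${14} + ${15} )) # words 14 and 15 of rest are fields 16 and 17
}

self=${BASHPID:-$$}            # PID of the shell running this script
children_ticks "$self"; before=$TICKS

# Stand-in for stress: a child that busy-loops for roughly 2 to 3 seconds.
bash -c 'end=$((SECONDS + 3)); while (( SECONDS < end )); do :; done' &

sleep 1                        # the busy child is still running at this point
children_ticks "$self"; during=$TICKS

wait                           # the child has now terminated and been reaped
children_ticks "$self"; after=$TICKS

echo "before=$before during=$during after=$after"
```

On a typical system, during stays at or very close to before (only the short-lived sleep has been reaped by then), while after jumps by roughly the child's CPU time in ticks.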

Old edit: ~~I thought that was the end of it, but after more research I tried the same thing with the stress command~~

time stress --cpu 4 -t 60s
stress: info: [18598] dispatching hogs: 4 cpu, 0 io, 0 vm, 0 hdd
stress: info: [18598] successful run completed in 60s

real    1m0,003s
user    3m53,663s
sys     0m0,349s

During execution, I checked the output of this command twice per second:

cat /proc/11223/stat | cut -d ' ' -f 14-17
0 0 0 0

~~Even though ps faux | grep stress showed this particular PID as the parent of the four stress threads.~~

Answer 1