Which of these two ways of collecting perf statistics on a multi-threaded process is correct?

2024-6-5

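I am investigating the performance of a multi-threaded database server. One particular workload takes about 61 seconds to run on a particular machine. When I ran perf against the workload, the pid of the database process was 79894.

Besides the software threads inside the database server, there are a number of Linux-related threads that are normally asleep on an idle system but become active while my workload runs. For that reason I want to use perf's -a option together with the -p option.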

I ran perf in two ways, and each way gives somewhat different results.

In the first method, I run the following perf command in one window

perf stat -p 79894 -a

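and immediately start the database workload in another window. When the database workload finishes, I press Ctrl-C to exit perf and get the following results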

    Performance counter stats for process id '79894':

              1,842,359.55 msec cpu-clock                 #   30.061 CPUs utilized          
                 3,798,673      context-switches          #    0.002 M/sec                  
                   153,995      cpu-migrations            #    0.084 K/sec                  
                16,038,992      page-faults               #    0.009 M/sec                  
         4,939,131,149,436      cycles                    #    2.681 GHz                    
         3,924,220,386,428      stalled-cycles-frontend   #   79.45% frontend cycles idle   
         3,418,137,943,654      instructions              #    0.69  insn per cycle         
                                                          #    1.15  stalled cycles per insn
           402,389,588,237      branches                  #  218.410 M/sec                 
             5,137,510,170      branch-misses             #    1.28% of all branches  


     61.28834199 seconds time elapsed

The second method is to run

perf stat  -a  sleep 61

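and immediately start the database workload in another window. After 61 seconds both perf and the workload finish, and perf produces the following results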

 Performance counter stats for 'system wide':

      4,880,317.67 msec cpu-clock                 #   79.964 CPUs utilized          
         8,274,996      context-switches          #    0.002 M/sec                  
           202,832      cpu-migrations            #    0.042 K/sec                  
        14,605,246      page-faults               #    0.003 M/sec                  
 5,022,298,186,711      cycles                    #    1.029 GHz                    
 7,599,517,323,727      stalled-cycles-frontend   #  151.32% frontend cycles idle   
 3,421,512,233,294      instructions              #    0.68  insn per cycle         
                                                  #    2.22  stalled cycles per insn
   402,726,487,019      branches                  #   82.521 M/sec                  
     5,124,543,680      branch-misses             #    1.27% of all branches        

      61.031494851 seconds time elapsed
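
To make the two outputs easier to compare, here is a small sketch that recomputes the annotated right-hand column from the raw counters. It is my own arithmetic, not something perf prints: I am assuming that "CPUs utilized" is cpu-clock divided by elapsed time, that GHz is cycles divided by cpu-clock, and that the frontend stall percentage is stalled-cycles-frontend divided by cycles. The names `RUNS` and `derived` are made up for illustration.

```python
# Raw counters copied from the two perf stat outputs above.
RUNS = {
    "perf stat -p <pid> -a": {
        "cpu_clock_msec": 1_842_359.55,
        "cycles": 4_939_131_149_436,
        "stalled_frontend": 3_924_220_386_428,
        "instructions": 3_418_137_943_654,
        "elapsed_sec": 61.28834199,
    },
    "perf stat -a sleep 61": {
        "cpu_clock_msec": 4_880_317.67,
        "cycles": 5_022_298_186_711,
        "stalled_frontend": 7_599_517_323_727,
        "instructions": 3_421_512_233_294,
        "elapsed_sec": 61.031494851,
    },
}


def derived(counters):
    """Recompute the annotated metrics from raw counters (assumed formulas)."""
    cpu_sec = counters["cpu_clock_msec"] / 1000.0
    return {
        # CPU time accumulated per second of wall-clock time
        "cpus_utilized": cpu_sec / counters["elapsed_sec"],
        # cycles per second of accumulated CPU time, in GHz
        "ghz": counters["cycles"] / cpu_sec / 1e9,
        "insn_per_cycle": counters["instructions"] / counters["cycles"],
        "frontend_stall_pct": 100.0 * counters["stalled_frontend"] / counters["cycles"],
    }


if __name__ == "__main__":
    for name, counters in RUNS.items():
        print(name)
        for metric, value in derived(counters).items():
            print(f"  {metric:20s} {value:10.3f}")
```

Under those assumptions the recomputed values agree with what perf printed for both runs, including the 151.32% figure, so that number is at least consistent with the raw counters as reported.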

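Because I used -a in both versions, I expected to get roughly the same results.

But with the sleep version,

cpu-clock is about 2.5 times what I get with the -p version,
context-switches are about double what I get with the -p version,
and most of the other values (cycles, instructions, branches, branch-misses) are more or less the same.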
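
For reference, the ratios between the two runs can be worked out directly from the counters in the outputs above. This is only a sketch; the dictionary names `P_RUN` and `SLEEP_RUN` and the helper `ratios` are hypothetical and exist just for this comparison.

```python
# Counters from the "-p" run and the "sleep 61" run, copied from the outputs above.
P_RUN = {
    "cpu-clock (msec)": 1_842_359.55,
    "context-switches": 3_798_673,
    "cpu-migrations": 153_995,
    "page-faults": 16_038_992,
    "cycles": 4_939_131_149_436,
    "stalled-cycles-frontend": 3_924_220_386_428,
    "instructions": 3_418_137_943_654,
    "branches": 402_389_588_237,
    "branch-misses": 5_137_510_170,
}
SLEEP_RUN = {
    "cpu-clock (msec)": 4_880_317.67,
    "context-switches": 8_274_996,
    "cpu-migrations": 202_832,
    "page-faults": 14_605_246,
    "cycles": 5_022_298_186_711,
    "stalled-cycles-frontend": 7_599_517_323_727,
    "instructions": 3_421_512_233_294,
    "branches": 402_726_487_019,
    "branch-misses": 5_124_543_680,
}


def ratios(baseline, other):
    """Return other/baseline for every counter present in both runs."""
    return {name: other[name] / baseline[name] for name in baseline if name in other}


if __name__ == "__main__":
    for name, ratio in ratios(P_RUN, SLEEP_RUN).items():
        print(f"{name:26s} sleep / -p = {ratio:.3f}")
```

Running this shows cpu-clock at roughly 2.65 times and context-switches at roughly 2.18 times the -p figures, while cycles, instructions, branches and branch-misses stay within about 2% of each other.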

So, two questions:

    (1) Which set of results should I believe?
and
    (2) How can there be more stalled-cycles-frontend than cycles in the sleep version?
