我在 下运行了一个 shell 管道perf stat
,用于taskset 0x1
将整个管道固定到单个 CPU。我知道taskset 0x1
有效果,因为它使管道的吞吐量增加了一倍多。但是,perf stat
显示管道的不同进程之间有 0 次上下文切换。
那么perf stat
上下文切换到底意味着什么呢?
我想我对管道中各个任务的上下文切换数量感兴趣。有更好的方法来衡量吗?
这是在比较的背景下dd bs=1M </dev/zero
进行的dd bs=1M </dev/zero | dd bs=1M >/dev/null
。如果我可以根据需要测量上下文切换,我认为这将有助于量化为什么第一个版本比第二个版本“高效”几倍。
$ rpm -q perf
perf-4.15.0-300.fc27.x86_64
$ uname -r
4.15.17-300.fc27.x86_64
$ perf stat taskset 0x1 sh -c 'dd bs=1M </dev/zero | dd bs=1M >/dev/null'
^C18366+0 records in
18366+0 records out
19258146816 bytes (19 GB, 18 GiB) copied, 5.0566 s, 3.8 GB/s
Performance counter stats for 'taskset 0x1 sh -c dd if=/dev/zero bs=1M | dd bs=1M of=/dev/null':
5059.273255 task-clock:u (msec) # 1.000 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
414 page-faults:u # 0.082 K/sec
36,915,934 cycles:u # 0.007 GHz
9,511,905 instructions:u # 0.26 insn per cycle
2,480,746 branches:u # 0.490 M/sec
188,295 branch-misses:u # 7.59% of all branches
5.061473119 seconds time elapsed
$ perf stat sh -c 'dd bs=1M </dev/zero | dd bs=1M >/dev/null'
^C6637+0 records in
6636+0 records out
6958350336 bytes (7.0 GB, 6.5 GiB) copied, 4.04907 s, 1.7 GB/s
6636+0 records in
6636+0 records out
6958350336 bytes (7.0 GB, 6.5 GiB) copied, 4.0492 s, 1.7 GB/s
sh: Interrupt
Performance counter stats for 'sh -c dd if=/dev/zero bs=1M | dd bs=1M of=/dev/null':
3560.269345 task-clock:u (msec) # 0.878 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
355 page-faults:u # 0.100 K/sec
32,302,387 cycles:u # 0.009 GHz
4,823,855 instructions:u # 0.15 insn per cycle
1,167,126 branches:u # 0.328 M/sec
88,982 branch-misses:u # 7.62% of all branches
4.052844128 seconds time elapsed
答案1
perf 默默地无法计算上下文切换,因为您不是 root。
(Linux 有 64k 管道缓冲区。在任何一种情况下,您都可以看到每 64k 传输有非常接近 2 个上下文切换。不太确定它是如何工作的,但我怀疑它只是计算上下文切换离开从dd
,到另一个dd
,或到该 cpu 的空闲任务)。
$ sudo perf stat taskset 0x1 sh -c 'dd bs=1M </dev/zero|dd bs=1M >/dev/null'
^C14508+0 records in
14507+0 records out
15211692032 bytes (15 GB, 14 GiB) copied, 3.87098 s, 3.9 GB/s
14508+0 records in
14508+0 records out
15212740608 bytes (15 GB, 14 GiB) copied, 3.87044 s, 3.9 GB/s
taskset: Interrupt
Performance counter stats for 'taskset 0x1 sh -c dd bs=1M </dev/zero|dd bs=1M >/dev/null':
3872.597645 task-clock (msec) # 1.000 CPUs utilized
464,325 context-switches # 0.120 M/sec
0 cpu-migrations # 0.000 K/sec
928 page-faults # 0.240 K/sec
11,099,016,844 cycles # 2.866 GHz
13,765,220,898 instructions # 1.24 insn per cycle
3,053,464,009 branches # 788.480 M/sec
15,462,959 branch-misses # 0.51% of all branches
3.874121023 seconds time elapsed
$ echo $((15212740608 / 464325))
32763
$ sudo perf stat sh -c 'dd bs=1M </dev/zero|dd bs=1M >/dev/null'
^C7031+0 records in
7031+0 records out
7032+0 records in
7031+0 records out
7372537856 bytes (7.4 GB, 6.9 GiB) copied, 4.27436 s, 1.7 GB/s7372537856 bytes (7.4 GB, 6.9 GiB) copied, 4.27414 s, 1.7 GB/s
sh: Interrupt
Performance counter stats for 'sh -c dd bs=1M </dev/zero|dd bs=1M >/dev/null':
3736.056509 task-clock (msec) # 0.873 CPUs utilized
218,047 context-switches # 0.058 M/sec
206 cpu-migrations # 0.055 K/sec
877 page-faults # 0.235 K/sec
8,328,413,541 cycles # 2.229 GHz
7,617,859,285 instructions # 0.91 insn per cycle
1,671,904,009 branches # 447.505 M/sec
13,827,669 branch-misses # 0.83% of all branches
4.277591869 seconds time elapsed
$ echo $((7372537856 / 218047))
33811