Many tools can monitor disk IO, dstat for example.
Is there a tool that can monitor DRAM IO, for example how many MB per second are read from DRAM?
Answer 1
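Since you are using an Intel CPU, you should be able to use Processor Counter Monitor, an Intel tool that is now open source. If I read it correctly, building it on Linux requires only g++ and make.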
Before running it, make sure the msr kernel module is either loaded (sudo modprobe msr) or built into the kernel.
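As a quick way to check the msr step above, here is a minimal sketch. It assumes that the msr driver, whether loaded as a module or built in, exposes the device node /dev/cpu/0/msr for CPU 0. The function name msr_available is made up for illustration and is not part of PCM.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCM): succeeds if the MSR device node exists.
# Assumption: the msr driver exposes /dev/cpu/<N>/msr; CPU 0 is checked by default.
msr_available() {
    [ -e "${1:-/dev/cpu/0/msr}" ]
}

if msr_available; then
    echo "msr interface present"
else
    echo "msr interface missing; try: sudo modprobe msr"
fi
```

Checking the device node rather than the lsmod output covers both cases from the answer, since a built-in driver does not appear in the module list.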
With your CPU, you should be able to use the pcm-memory.x utility. I cannot use it on my own machine, so I do not know what its output looks like.
Even if your CPU does not support pcm-memory.x, you can still get overall memory bandwidth statistics from pcm.x. Its output looks like this:
$ sudo ./pcm.x -i=1 -nc
Processor Counter Monitor ($Format:%ci ID=%h$)
IBRS and IBPB supported : no
STIBP supported : no
Spec arch caps supported : no
Number of physical cores: 4
Number of logical cores: 8
Number of online logical cores: 8
Threads (logical cores) per physical core: 2
Num sockets: 1
Physical cores per socket: 4
Core PMU (perfmon) version: 4
Number of core PMU generic (programmable) counters: 4
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 3600000000 Hz
Package thermal spec power: 65 Watt; Package minimum power: 0 Watt; Package maximum power: 0 Watt;
Trying to use Linux perf events...
Successfully programmed on-core PMU using Linux perf
Detected Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz "Intel(r) microarchitecture codename Kabylake" stepping 9 microcode level 0x5e
EXEC : instructions per nominal CPU cycle
IPC : instructions per CPU cycle
FREQ : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state' (includes Intel Turbo Boost)
L3MISS: L3 (read) cache misses
L2MISS: L2 (read) cache misses (including other core's L2 cache *hits*)
L3HIT : L3 (read) cache hit ratio (0.00-1.00)
L2HIT : L2 cache hit ratio (0.00-1.00)
L3MPI : number of L3 (read) cache misses per instruction
L2MPI : number of L2 (read) cache misses per instruction
READ : bytes read from main memory controller (in GBytes)
WRITE : bytes written to main memory controller (in GBytes)
IO : bytes read/written due to IO requests to memory controller (in GBytes); this may be an over estimate due to same-cache-line partial requests
TEMP : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature
energy: Energy in Joules
Core (SKT) | EXEC | IPC | FREQ | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3MPI | L2MPI | TEMP
---------------------------------------------------------------------------------------------------------------
SKT 0 0.02 1.05 0.02 0.39 402 K 1770 K 0.76 0.53 0.00 0.00 67
---------------------------------------------------------------------------------------------------------------
TOTAL * 0.02 1.05 0.02 0.39 402 K 1770 K 0.76 0.53 0.00 0.00 N/A
Instructions retired: 487 M ; Active cycles: 462 M ; Time (TSC): 3602 Mticks ; C0 (active,non-halted) core residency: 4.12 %
C1 core residency: 9.26 %; C3 core residency: 0.59 %; C6 core residency: 2.14 %; C7 core residency: 83.89 %;
C0 package residency: 36.94 %; C2 package residency: 63.06 %; C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %; C8 package residency: 0.00 %; C9 package residency: 0.00 %; C10 package residency: 0.00 %;
┌───────────────────────────────────────────────────────────────────────────────┐
Core C-state distribution│0001111111667777777777777777777777777777777777777777777777777777777777777777777│
└───────────────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────────────┐
Package C-state distribution│00000000000000000000000000000022222222222222222222222222222222222222222222222222│
└────────────────────────────────────────────────────────────────────────────────┘
PHYSICAL CORE IPC : 2.11 => corresponds to 52.65 % utilization for cores in active state
Instructions per nominal CPU cycle: 0.03 => corresponds to 0.85 % core utilization over time interval
SMI count: 0
---------------------------------------------------------------------------------------------------------------
MEM (GB)->| READ | WRITE | IO | CPU energy |
---------------------------------------------------------------------------------------------------------------
SKT 0 0.24 0.03 0.00 1.88
---------------------------------------------------------------------------------------------------------------
Cleaning up
Zeroed uncore PMU registers
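Unless you specify -i=1, the output repeats periodically. If you omit -nc, you get execution statistics for each core rather than just the totals.
The memory statistics are at the bottom, in the READ, WRITE and IO columns of the MEM (GB) table.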
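The question asks for MB per second. In the sample output above, Time (TSC) is 3602 Mticks at a nominal 3600000000 Hz, so the interval is about one second and the 0.24 GB in the READ column is roughly 240 MB/s. The sketch below pulls those numbers out of saved pcm.x text output. The function name pcm_mem_mb is hypothetical. It assumes the table layout shown above and 1 GB = 1000 MB, which may not match the unit PCM uses internally.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCM): reads pcm.x text output on stdin and
# prints READ and WRITE for each socket row of the "MEM (GB)" table, in MB.
# Assumptions: the column layout shown above, and 1 GB = 1000 MB.
pcm_mem_mb() {
    awk '
        /MEM \(GB\)/ { in_mem = 1; next }      # header row of the memory table
        in_mem && /^Cleaning up/ { in_mem = 0 } # end of the report
        in_mem && $1 == "SKT" {
            # fields: SKT <socket id> READ WRITE IO energy
            printf "socket %s read %.0f MB write %.0f MB\n", $2, $3 * 1000, $4 * 1000
        }
    '
}
```

For example, `sudo ./pcm.x -i=1 -nc | pcm_mem_mb` should print one line per socket. The value is MB per measurement interval, so it equals MB/s only when the interval is one second, as it is in the sample.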