I submitted 64 Java jobs to a cluster using Slurm. I requested 4 CPUs for each job, but I suspect each process is running on only one CPU. I ran myq to find the node each job is running on:
Running:
ID NAME PART. QOS CPU WALLTIME REMAIN NODES
14378773_13 firstrun batch normal 4 4-00:00:00 3-12:27:31 node1104
14378773_14 firstrun batch normal 4 4-00:00:00 3-12:55:20 node1163
14378773_15 firstrun batch normal 4 4-00:00:00 3-14:26:26 node1123
Pending:
ID NAME PART. QOS CPU WALLTIME EST.START REASON
14378773_[16-64%6] firstrun batch normal 4 4-00:00:00 N/A (Priority)
Then I used ssh to check the first one:
ssh node1104
top -u mfariasv -H
Threads: 1754 total, 13 running, 1740 sleeping, 0 stopped, 1 zombie
%Cpu(s): 55.9 us, 0.2 sy, 0.0 ni, 43.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 19665398+total, 15198475+free, 34527148 used, 10142080 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 15188001+avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
41027 mfariasv 20 0 36.424g 0.012t 53720 R 99.9 6.5 692:18.40 java
41064 mfariasv 20 0 36.424g 0.012t 53720 S 28.7 6.5 224:21.85 java
41065 mfariasv 20 0 36.424g 0.012t 53720 S 28.7 6.5 224:20.59 java
41066 mfariasv 20 0 36.424g 0.012t 53720 S 28.7 6.5 224:21.12 java
58000 mfariasv 20 0 157900 3972 1484 R 1.0 0.0 0:00.58 top
41018 mfariasv 20 0 113124 1500 1236 S 0.0 0.0 0:00.00 slurm_script
41025 mfariasv 20 0 170856 7276 3300 S 0.0 0.0 0:00.02 python
41026 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:00.00 java
41028 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:20.45 java
41029 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:20.22 java
41030 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:20.18 java
41031 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:20.21 java
41032 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:20.12 java
41033 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:20.03 java
41034 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:19.78 java
41035 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:20.12 java
41036 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:19.96 java
41037 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:20.09 java
41038 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:20.36 java
41039 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:20.04 java
41040 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:20.14 java
41041 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:20.28 java
41042 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:19.73 java
41043 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:20.60 java
41044 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:19.83 java
41045 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:20.44 java
41046 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:07.42 java
41047 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:00.00 java
41048 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:00.00 java
41049 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:00.00 java
41050 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:04.13 java
41051 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:05.00 java
41052 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:10.56 java
41053 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:11.44 java
41054 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:07.49 java
41055 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:08.18 java
41056 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:07.18 java
41057 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:05.80 java
41058 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:00.58 java
41059 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:00.56 java
41060 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:00.61 java
41061 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:00.52 java
41062 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:00.00 java
41063 mfariasv 20 0 36.424g 0.012t 53720 S 0.0 6.5 0:04.86 java
57857 mfariasv 20 0 136068 2100 912 S 0.0 0.0 0:00.00 sshd
57858 mfariasv 20 0 113880 2040 1516 S 0.0 0.0 0:00.00 bash
As I suspected, the process is running on only one CPU, right?
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
41027 mfariasv 20 0 36.424g 0.012t 53720 R 99.9 6.5 692:18.40 java
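One caveat about reading this output: with the -H flag, top lists one row per thread rather than one row per process, so a single row may not show the whole process's usage. A minimal sketch for totalling the %CPU column per command is below. It assumes the default 12-column layout shown above, and the function name total_cpu_by_command is hypothetical. Grouping by the COMMAND column is a simplification: it would merge several separate java processes on the same node.

```python
def total_cpu_by_command(top_lines):
    """Sum the %CPU column per COMMAND over rows copied from `top -H`.

    Assumes the default column order:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    totals = {}
    for line in top_lines:
        fields = line.split()
        # Skip headers, summary lines, and anything that is not a task row.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        cpu = float(fields[8])   # %CPU column
        command = fields[11]     # COMMAND column (first word only)
        totals[command] = totals.get(command, 0.0) + cpu
    return totals


# Rows copied from the top -H output above (only the busy threads).
sample = [
    "PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND",
    "41027 mfariasv 20 0 36.424g 0.012t 53720 R 99.9 6.5 692:18.40 java",
    "41064 mfariasv 20 0 36.424g 0.012t 53720 S 28.7 6.5 224:21.85 java",
    "41065 mfariasv 20 0 36.424g 0.012t 53720 S 28.7 6.5 224:20.59 java",
    "41066 mfariasv 20 0 36.424g 0.012t 53720 S 28.7 6.5 224:21.12 java",
    "58000 mfariasv 20 0 157900 3972 1484 R 1.0 0.0 0:00.58 top",
]

for command, cpu in total_cpu_by_command(sample).items():
    print(f"{command}: {cpu:.1f}% CPU summed over its threads")
```

The summed figure is only as good as the snapshot it was copied from; a few refreshes of top would give a better picture than a single sample.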
When I press 1 while top is running, I get:
top - 11:29:17 up 55 days, 21:26, 1 user, load average: 17.37, 17.46, 17.43
Threads: 1064 total, 18 running, 1045 sleeping, 0 stopped, 1 zombie
%Cpu0 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 22.9 us, 0.0 sy, 0.0 ni, 77.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu19 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu20 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu21 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu22 : 22.9 us, 0.0 sy, 0.0 ni, 77.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu23 : 23.2 us, 0.0 sy, 0.0 ni, 76.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu24 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu25 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu26 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu27 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu28 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu29 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu30 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu31 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 19663889+total, 16113785+free, 29435768 used, 6065264 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 16099705+avail Mem