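So we have set up a server (11.0-RELEASE-p2) hosting around 150-200 jails. The server has 24 cores and 192GB of RAM. In top it shows no signs of stress, apart from the high load. All jails reside on an NFS mount, and each jail mounts its own directory when it is created. The server does not feel slow in any way; it is quite snappy. The only thing bothering us is the high load.
Output from top: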
last pid: 71841; load averages: 320.13, 131.33, 79.28 up 27+17:45:03 10:37:48
5325 processes:1 running, 5324 sleeping
CPU: 4.4% user, 0.0% nice, 1.6% system, 0.4% interrupt, 93.6% idle
Mem: 3116M Active, 23G Inact, 23G Wired, 900M Buf, 138G Free
ARC: 10G Total, 2612M MFU, 4553M MRU, 37M Anon, 89M Header, 2742M Other
Swap: 4096M Total, 4096M Free
As you can see, the load is high, 138G of memory is free, and the CPU is 94% idle.
Output from systat -vmstat:
3 users Load 92.59 105 73.97 Feb 1 10:39
Mem usage: 26%Phy 6%Kmem
Mem: KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 21491k 223884 120800k 555864 144668k count
All 22230k 836948 142997k 4351592 pages
Proc: Interrupts
r p d s w Csw Trp Sys Int Sof Flt ioflt 3595 total
104 5k 13k 5848 20k 1362 127 1646 147 cow atkbd0 1
730 zfod 1 ata1 15
1.8%Sys 0.3%Intr 3.0%User 0.0%Nice 94.9%Idle ozfod ohci0 ohci
| | | | | | | | | | %ozfod ehci0 ohci
=>> daefr 107 cpu0:timer
dtbuf 622 prcfr 722 bce0 259
Namei Name-cache Dir-cache 3237762 desvn 2014 totfr 619 bce1 260
Calls hits % hits % 3237760 numvn react pcib7 263
41265 41201 100 2713450 frevn pdwak 21 mps0 264
1290 pdpgs ciss0 265
Disks da0 da1 cd0 pass0 pass1 pass2 intrn 74 cpu13:time
KB/t 13.33 14.76 0.00 0.00 0.00 0.00 24315624 wire 112 cpu4:timer
tps 10 17 0 0 0 0 3192008 act 147 cpu2:timer
MB/s 0.14 0.24 0.00 0.00 0.00 0.00 23921440 inact 54 cpu3:timer
%busy 0 0 0 0 0 0 cache 132 cpu5:timer
144669k free 52 cpu1:timer
921954 68 cpu19:time
99 cpu21:time
54 cpu20:time
59 cpu18:time
59 cpu22:time
82 cpu23:time
67 cpu12:time
68 cpu6:timer
79 cpu14:time
88 cpu15:time
111 cpu16:time
93 cpu17:time
49 cpu8:timer
251 cpu7:timer
102 cpu9:timer
176 cpu10:time
49 cpu11:time
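As far as I can tell, there is nothing particularly odd there either. Sure, there are some interrupts, but googling suggests that the number we see is negligible compared with what other people report when they have interrupt problems, which is around 350,000.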
iostat -w 1
tty da0 da1 cd0 cpu
tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy in id
1 571 14.51 11 0.15 14.56 11 0.15 0.00 0 0.00 1 0 1 0 99
0 231 10.29 90 0.90 11.26 102 1.12 0.00 0 0.00 3 0 1 0 95
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 3 0 1 0 96
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 7 0 1 0 92
0 79 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 3 0 2 0 95
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 6 0 2 0 93
0 77 13.63 128 1.71 11.97 123 1.44 0.00 0 0.00 2 0 2 0 96
0 79 36.00 1 0.04 14.86 7 0.10 0.00 0 0.00 2 0 1 0 97
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 4 0 2 0 94
0 76 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 4 0 2 0 94
0 80 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 2 0 1 0 97
0 75 9.98 117 1.15 18.43 129 2.32 0.00 0 0.00 3 0 1 0 96
0 81 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 4 0 2 0 94
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 2 0 1 0 96
vmstat -w 1
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr da0 da1 in sy cs us sy id
3 0 0 115G 138G 297 0 2 0 653 373 0 0 224 59 1405 1 1 99
2 0 0 115G 138G 75 0 0 0 2017 1368 118 109 2299 23370 18920 6 2 92
2 0 0 115G 138G 1397 0 2 0 2839 1434 0 0 2665 30985 23294 5 4 91
2 0 0 115G 138G 1113 0 0 0 666 1373 0 0 2222 23078 17157 5 2 93
1 0 0 115G 138G 7 0 0 0 597 1368 0 0 590 18529 10477 2 1 96
1 0 0 115G 138G 0 0 2 0 194 2773 83 81 1269 26734 19190 3 3 94
1 0 0 115G 138G 9 0 0 0 90 1404 0 0 833 18907 11455 2 2 96
2 0 0 115G 138G 13 0 0 0 1309 1374 0 0 3185 25773 20054 3 3 94
1 0 0 115G 138G 1419 0 0 0 2750 1369 0 0 3899 25403 23252 7 4 90
0 0 0 115G 138G 776 0 1 0 164 1368 75 58 837 26261 16368 3 3 94
1 0 0 115G 138G 2336 0 5 0 2562 1367 0 0 1337 23287 13288 3 3 94
0 0 0 115G 138G 560 0 0 0 1193 2785 0 0 608 27176 14512 5 5 90
1 0 0 115G 138G 0 0 2 0 249 1369 0 0 702 18533 10700 1 2 97
1 0 0 115G 138G 3290 0 0 0 2313 1369 91 96 1461 22049 14726 6 3 91
As for NFS, I really don't know how to look for problems there. But here is the output of
nfsstat -c
Client Info:
Rpc Counts:
Getattr Setattr Lookup Readlink Read Write Create Remove
44956931 1020943 93567574 167 23609403 879028 514647 665228
Rename Link Symlink Mkdir Rmdir Readdir RdirPlus Access
36867 1387 1 24655 21955 6118822 0 26166205
Mknod Fsstat Fsinfo PathConf Commit
0 5489407 1 2270 830867
Rpc Info:
TimedOut Invalid X Replies Retries Requests
0 0 0 0 203906224
Cache Info:
Attr Hits Misses Lkup Hits Misses BioR Hits Misses BioW Hits Misses
-719986429 44956925 -1243965171 93531884 66678251 22460288 981123 879028
BioRLHits Misses BioD Hits Misses DirE Hits Misses Accs Hits Misses
144 167 14572148 5721030 5124486 1455 -1123294109 26165764
and of
nfsstat -w 1 -c
GtAttr Lookup Rdlink Read Write Rename Access Rddir
5 0 0 5 0 0 0 2
9 342 0 9 0 0 42 9
12 91 0 21 0 0 21 4
0 2 0 0 0 0 2 0
0 1 0 0 0 0 0 0
0 5 0 0 0 0 2 0
5 124 0 5 0 0 0 2
6 12 0 5 0 0 12 2
4 0 0 5 0 0 0 2
9 0 0 10 0 0 0 4
4 0 0 5 0 0 0 2
50 1 0 14 0 0 0 7
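A note on reading the nfsstat -c output above: the negative Attr, Lkup and Accs hit counts look like signed 32-bit counters that have wrapped around. That is an assumption, not something confirmed in this thread, and a counter may have wrapped more than once. Under that assumption, a minimal sketch for recovering an unsigned value and a hit ratio (unwrap32 and hit_ratio are names made up for illustration):

```sh
# Assumption: the value is a signed 32-bit counter that wrapped exactly once.
unwrap32() {
    awk -v n="$1" 'BEGIN { if (n < 0) n += 4294967296; printf "%.0f\n", n }'
}

# Hit ratio in percent from a (possibly wrapped) hit count and a miss count.
hit_ratio() {
    hits=$(unwrap32 "$1")
    awk -v h="$hits" -v m="$2" 'BEGIN { printf "%.1f\n", 100 * h / (h + m) }'
}

# Attr cache figures copied from the nfsstat -c output above.
unwrap32 -719986429
hit_ratio -719986429 44956925
```

The Rpc Info line in the same output already shows 0 timeouts and 0 retries for about 204 million requests, so this is only about making the cache figures readable.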
And finally, the output of
systat -ifstat
/0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10
Load Average <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 29.6
Interface Traffic Peak Total
lo0 in 34.285 KB/s 291.936 KB/s 69.263 GB
out 34.285 KB/s 291.936 KB/s 69.263 GB
bce1 in 792.808 KB/s 5.382 MB/s 707.266 GB
out 56.828 KB/s 238.912 KB/s 91.154 GB
bce0 in 21.711 KB/s 21.711 KB/s 17.338 GB
out 13.799 KB/s 287.402 KB/s 64.000 GB
dmesg, as requested:
[larsemil@prison01 ~]$ dmesg
Limiting open port RST response from 213 to 200 packets/sec
Limiting open port RST response from 2636 to 200 packets/sec
pid 22548 (php-fpm), uid 10000: exited on signal 11
pid 26938 (wkhtmltopdf), uid 10000: exited on signal 6 (core dumped)
[zone: pf states] PF states limit reached
Limiting icmp ping response from 9592 to 200 packets/sec
Limiting icmp ping response from 611 to 200 packets/sec
Limiting icmp ping response from 1792 to 200 packets/sec
Limiting icmp ping response from 2650 to 200 packets/sec
Limiting icmp ping response from 316 to 200 packets/sec
Limiting icmp ping response from 1758 to 200 packets/sec
Limiting icmp ping response from 2478 to 200 packets/sec
Limiting icmp ping response from 578 to 200 packets/sec
Limiting icmp ping response from 2028 to 200 packets/sec
Limiting icmp ping response from 3175 to 200 packets/sec
Limiting icmp ping response from 245 to 200 packets/sec
Limiting icmp ping response from 536 to 200 packets/sec
Limiting icmp ping response from 229 to 200 packets/sec
Limiting icmp ping response from 546 to 200 packets/sec
Limiting icmp ping response from 2239 to 200 packets/sec
Limiting icmp ping response from 3414 to 200 packets/sec
Limiting icmp ping response from 3033 to 200 packets/sec
Limiting icmp ping response from 1018 to 200 packets/sec
Limiting icmp ping response from 270 to 200 packets/sec
pid 34239 (php-fpm), uid 10000: exited on signal 11
pid 68427 (php-fpm), uid 10000: exited on signal 11
Any ideas are welcome!
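Answer 1
Can you post the dmesg output and any log messages from /var/log/messages?
What I see is that you have a machine with 192GB of RAM that is trying to do all its work in 3GB... it is probably swapping like crazy.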
Mem: 3116M Active, 23G Inact, 23G Wired, 900M Buf, 138G Free
ARC: 10G Total, 2612M MFU, 4553M MRU, 37M Anon, 89M Header, 2742M Other
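Too little of the memory is actually in use. You need to put the memory in the machine to work. Please post the output of sysctl vfs.zfs.arc_max, and look into ZFS tuning for the ARC.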
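Whether the box is really swapping can be checked against the Swap line of the top header in the question. A rough sketch, assuming the line keeps the shape shown there and that both figures carry an M suffix (swap_used_mb is a hypothetical helper name):

```sh
# Print swap in use (MB) from a top(1) "Swap:" header line.
# Assumes the layout "Swap: <n>M Total, <n>M Free" as posted in the question.
swap_used_mb() {
    echo "$1" | awk '{
        total = $2; free = $4
        sub(/M/, "", total); sub(/M/, "", free)
        print total - free
    }'
}

swap_used_mb "Swap: 4096M Total, 4096M Free"
```

For the line posted in the question this prints 0. On the live system, swapinfo reports the same figure directly.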
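Jails by themselves do basically nothing. If anything is running, the processes inside the jails will show up in top, and it does not look like much is going on.
top on FreeBSD is different: the load average (LA) should be read relative to the number of cores (24). Your LA is high, but only because something cannot get the memory it needs.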
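To read the load average relative to the core count, as suggested above, divide it by the number of cores. A small sketch using the 1-minute figure from the question; load_per_core is an illustrative name, and on the host itself the inputs would come from sysctl -n vm.loadavg and sysctl -n hw.ncpu:

```sh
# Load average divided by core count, to two decimals.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# 1-minute load from the top output in the question, 24 cores.
load_per_core 320.13 24
```

A result well above 1 means more threads are runnable or waiting than there are cores, even while the CPU line reports the machine as mostly idle.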
Answer 2
Try:
sysctl kern.eventtimer.timer=HPET
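Before switching, it may be worth checking which event timers the kernel offers and which one is currently active. A sketch of that check; the sample list at the bottom is made up for illustration and will differ per machine, and has_timer is a hypothetical helper:

```sh
# Succeed if the named timer appears in a kern.eventtimer.choice-style
# list such as "HPET(450) LAPIC(400)". $1 = timer name, $2 = list.
has_timer() {
    for entry in $2; do
        [ "${entry%%"("*}" = "$1" ] && return 0
    done
    return 1
}

# On the FreeBSD host (not run here):
#   sysctl kern.eventtimer.timer                  # timer currently in use
#   choices=$(sysctl -n kern.eventtimer.choice)   # timers on offer
#   has_timer HPET "$choices" && sysctl kern.eventtimer.timer=HPET

# Illustrative sample list, not taken from the server in question.
if has_timer HPET "HPET(450) LAPIC(400) i8254(100) RTC(0)"; then
    echo "HPET available"
fi
```

A value set with sysctl on the command line does not survive a reboot; adding kern.eventtimer.timer=HPET to /etc/sysctl.conf would make it persistent.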