I have a server with NVMe SSDs in RAID1, running Ubuntu 20.04.1 LTS, and the drives are far too slow! Extracting a 500MB gzip file to 3.7GB takes a while... much longer than it should. This is a development server used only by me, yet even with MariaDB, loading SQL dumps takes about 30 minutes, while the same dumps load in a few minutes on my home computer. Everything is slow! Even upgrading Ubuntu packages takes forever!
So I gathered some specs:
Linux Kernel: 5.4.0-42-generic
CPU: Intel(R) Xeon(R) D-2141I CPU @ 2.20GHz
Memory: 32GB
two WDC CL SN720 SDAQNTW-512G-2000 drives (NVMe SSDs) in software RAID1
And the output of a few commands:
# cat /proc/mdstat
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] [linear] [multipath] [raid10]
md2 : active raid1 nvme1n1p2[1] nvme0n1p2[0]
523200 blocks [2/2] [UU]
md3 : active raid1 nvme1n1p3[1] nvme0n1p3[0]
498530240 blocks [2/2] [UU]
bitmap: 4/4 pages [16KB], 65536KB chunk
unused devices: <none>
md3 is used as the root partition, and that is the array I tested.
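In case it helps to double-check, the array backing the root filesystem can be confirmed directly (a trivial sketch; findmnt ships with util-linux):

```shell
# Print the block device that backs the root filesystem — on this server
# it should report /dev/md3.
findmnt -no SOURCE /
```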
# lsblk -io KNAME,TYPE,SIZE,MODEL,MOUNTPOINT
KNAME TYPE SIZE MODEL MOUNTPOINT
loop0 loop 55M /snap/core18/1880
loop1 loop 70.6M /snap/lxd/16894
loop2 loop 29.9M /snap/snapd/8542
loop3 loop 70.6M /snap/lxd/16922
loop4 loop 55.3M /snap/core18/1885
loop5 loop 29.9M /snap/snapd/8790
md2 raid1 511M /boot
md2 raid1 511M /boot
md3 raid1 475.4G /
md3 raid1 475.4G /
nvme0n1 disk 477G WDC CL SN720 SDAQNTW-512G-2000
nvme0n1p1 part 511M /boot/efi
nvme0n1p2 part 511M
nvme0n1p3 part 475.4G
nvme0n1p4 part 511M [SWAP]
nvme1n1 disk 477G WDC CL SN720 SDAQNTW-512G-2000
nvme1n1p1 part 511M
nvme1n1p2 part 511M
nvme1n1p3 part 475.4G
nvme1n1p4 part 511M [SWAP]
And:
# mdadm --detail /dev/md3
/dev/md3:
Version : 0.90
Creation Time : Thu Jul 30 13:49:54 2020
Raid Level : raid1
Array Size : 498530240 (475.44 GiB 510.49 GB)
Used Dev Size : 498530240 (475.44 GiB 510.49 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 3
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Tue Sep 8 13:37:54 2020
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Consistency Policy : bitmap
UUID : 9dd3cf94:cfc5c935:a4d2adc2:26fd5302
Events : 0.13
Number Major Minor RaidDevice State
0 259 3 0 active sync /dev/nvme0n1p3
1 259 8 1 active sync /dev/nvme1n1p3
fio
I tried to test the drive speed with the following command:
fio --name=randwrite --ioengine=libaio --iodepth=64 --rw=randwrite --bs=64k --direct=1 --size=32G --numjobs=8 --runtime=240 --group_reporting
The result was:
Jobs: 8 (f=8): [w(8)][100.0%][w=776MiB/s][w=12.4k IOPS][eta 00m:00s]
randwrite: (groupid=0, jobs=8): err= 0: pid=1028157: Tue Sep 8 13:09:05 2020
write: IOPS=11.1k, BW=692MiB/s (726MB/s)(162GiB/240041msec); 0 zone resets
slat (usec): min=127, max=567387, avg=385.31, stdev=5093.87
clat (usec): min=2, max=1044.7k, avg=45818.20, stdev=55462.71
lat (usec): min=268, max=1045.0k, avg=46206.45, stdev=55680.36
clat percentiles (msec):
| 1.00th=[ 10], 5.00th=[ 22], 10.00th=[ 23], 20.00th=[ 26],
| 30.00th=[ 29], 40.00th=[ 33], 50.00th=[ 36], 60.00th=[ 41],
| 70.00th=[ 46], 80.00th=[ 53], 90.00th=[ 64], 95.00th=[ 75],
| 99.00th=[ 443], 99.50th=[ 493], 99.90th=[ 550], 99.95th=[ 567],
| 99.99th=[ 600]
bw ( KiB/s): min=48768, max=1394246, per=99.97%, avg=708325.21, stdev=25832.90, samples=3840
iops : min= 762, max=21784, avg=11066.98, stdev=403.63, samples=3840
lat (usec) : 4=0.01%, 10=0.01%, 50=0.01%, 250=0.01%, 500=0.01%
lat (usec) : 750=0.02%, 1000=0.02%
lat (msec) : 2=0.08%, 4=0.18%, 10=0.80%, 20=1.91%, 50=74.09%
lat (msec) : 100=20.72%, 250=0.61%, 500=1.14%, 750=0.41%, 1000=0.01%
lat (msec) : 2000=0.01%
cpu : usr=9.10%, sys=41.41%, ctx=1203665, majf=0, minf=95
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=0,2657370,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=692MiB/s (726MB/s), 692MiB/s-692MiB/s (726MB/s-726MB/s), io=162GiB (174GB), run=240041-240041msec
Disk stats (read/write):
md3: ios=0/3319927, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/2692572, aggrmerge=0/634101, aggrticks=0/79485190, aggrin_queue=74460696, aggrutil=94.36%
nvme0n1: ios=0/2692573, merge=0/634101, ticks=0/83651179, in_queue=78562212, util=94.36%
nvme1n1: ios=0/2692572, merge=0/634102, ticks=0/75319202, in_queue=70359180, util=94.04%
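For what it's worth, RAID1 mirrors every write to both members, so the ~726 MB/s aggregate above is also the per-device write rate. A rough sanity check against a rated figure (the 1300 MB/s value below is an assumption for a 512GB SN720, not something I measured; check the datasheet):

```shell
# Each RAID1 member receives the full write stream, so per-device throughput
# equals the aggregate fio result.
AGGREGATE_MBS=726   # from the fio summary line (MB/s)
RATED_MBS=1300      # assumed sequential-write rating; verify against the datasheet
PCT=$(( AGGREGATE_MBS * 100 / RATED_MBS ))
echo "each member at roughly ${PCT}% of the assumed rated write speed"
```

(64k random writes won't reach a sequential rating anyway, so this is only a ballpark.)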
I tried googling and found people saying it would be faster if I changed the Intent Bitmap from Internal to none, but after making the change and running fio again it actually got a bit slower... maybe I need to wait a while? I don't know.
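For reference, the bitmap change that advice refers to is done with mdadm --grow; a hedged sketch (run against /dev/md3 as above, and note that dropping the bitmap means a full resync after any unclean shutdown):

```shell
# Remove the internal write-intent bitmap (the change the advice refers to):
mdadm --grow /dev/md3 --bitmap=none
# Put it back afterwards; a larger chunk reduces bitmap-update overhead
# while keeping fast resyncs (512M is an arbitrary example value):
mdadm --grow /dev/md3 --bitmap=internal --bitmap-chunk=512M
```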
So I'm pretty much lost... I really don't know how to continue investigating from here, so any information about this problem would be greatly appreciated. I did of course also monitor the CPU to make sure it wasn't the bottleneck, but it doesn't look like the CPU is being used much at all.
Thank you!
Update
Someone on IRC asked whether writethrough had been set by accident. I tried googling that and found this: https://www.kernel.org/doc/html/latest/driver-api/md/raid5-cache.html
It talks about raid4/5/6 while I'm using raid1, so it's probably not relevant, and the /sys/block/md3/md/journal_mode file described in that document doesn't exist anyway.
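To confirm this on RAID1, one can simply list what md actually exposes in sysfs for this array; journal_mode only appears when a journal device is configured, which RAID1 doesn't support. A sketch (paths assume the md3 name from above):

```shell
# Tunables md exposes for this array; no journal_mode on RAID1.
ls /sys/block/md3/md/
# Write-intent bitmap settings live in a subdirectory:
cat /sys/block/md3/md/bitmap/chunksize 2>/dev/null
```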
Update 2
I found a way to test cached and buffered reads:
# hdparm -tT /dev/md3
/dev/md3:
Timing cached reads: 1006 MB in 1.99 seconds = 504.40 MB/sec
HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
Timing buffered disk reads: 664 MB in 3.01 seconds = 220.88 MB/sec
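As an aside, hdparm predates NVMe and its "cached reads" figure mostly measures RAM; a direct read test through fio seems more meaningful for this hardware. A sketch (read-only, so it is safe against the mounted array, though the options are my guess at a fair sequential test):

```shell
# Sequential direct reads from the raw md device; --readonly prevents
# fio from ever issuing writes.
fio --name=seqread --filename=/dev/md3 --readonly --rw=read --bs=1M \
    --ioengine=libaio --iodepth=32 --direct=1 --runtime=60 --group_reporting
```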
I hope this information is useful as well.