I have a server (Ubuntu 20.04.1 LTS) running NVMe SSDs in RAID1, and the disks are far too slow! Extracting a 500MB gzip file to 3.7GB takes a while... much longer than it should. This is a development server used only by me, so even though I use MariaDB, loading SQL dumps takes about 30 minutes, while the same dumps load in just a few minutes locally on my home computer. Everything is slow! Even upgrading Ubuntu packages takes ages!
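Before reaching for fio, simple timed commands help separate application overhead from raw disk speed. A minimal sketch, where `dump.sql.gz` stands in for whatever file feels slow and the dd target path is just an example:

```shell
# Time the workload that actually feels slow (decompression is mostly sequential write)
time gunzip -k dump.sql.gz

# Raw sequential write, bypassing the page cache with O_DIRECT and syncing at the end
dd if=/dev/zero of=/tmp/ddtest bs=1M count=4096 oflag=direct conv=fsync
rm /tmp/ddtest
```

If dd reports healthy throughput while gunzip stays slow, the bottleneck is more likely filesystem or application behavior than the drives themselves.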

So I gathered some specs:

Linux Kernel: 5.4.0-42-generic
CPU:  Intel(R) Xeon(R) D-2141I CPU @ 2.20GHz
Memory: 32GB
Disks: two WDC CL SN720 SDAQNTW-512G-2000 NVMe SSDs in software raid1

And output from a few commands:

# cat /proc/mdstat
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] [linear] [multipath] [raid10]
md2 : active raid1 nvme1n1p2[1] nvme0n1p2[0]
      523200 blocks [2/2] [UU]

md3 : active raid1 nvme1n1p3[1] nvme0n1p3[0]
      498530240 blocks [2/2] [UU]
      bitmap: 4/4 pages [16KB], 65536KB chunk

unused devices: <none> 

md3 is used as the root partition, and that is what I tested.
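For completeness, these are sysfs locations worth checking on an array like this. A sketch only; the paths assume the md3/nvme layout shown above, and some files may not exist on every kernel:

```shell
# Write-intent bitmap settings of the array (only present with an internal bitmap)
cat /sys/block/md3/md/bitmap/chunksize 2>/dev/null
cat /sys/block/md3/md/bitmap/backlog 2>/dev/null

# I/O scheduler and rotational flag of the member devices
cat /sys/block/nvme0n1/queue/scheduler
cat /sys/block/nvme0n1/queue/rotational   # should print 0 for an SSD
```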

# lsblk -io KNAME,TYPE,SIZE,MODEL,MOUNTPOINT
KNAME     TYPE    SIZE MODEL                          MOUNTPOINT
loop0     loop     55M                                /snap/core18/1880
loop1     loop   70.6M                                /snap/lxd/16894
loop2     loop   29.9M                                /snap/snapd/8542
loop3     loop   70.6M                                /snap/lxd/16922
loop4     loop   55.3M                                /snap/core18/1885
loop5     loop   29.9M                                /snap/snapd/8790
md2       raid1   511M                                /boot
md2       raid1   511M                                /boot
md3       raid1 475.4G                                /
md3       raid1 475.4G                                /
nvme0n1   disk    477G WDC CL SN720 SDAQNTW-512G-2000
nvme0n1p1 part    511M                                /boot/efi
nvme0n1p2 part    511M
nvme0n1p3 part  475.4G
nvme0n1p4 part    511M                                [SWAP]
nvme1n1   disk    477G WDC CL SN720 SDAQNTW-512G-2000
nvme1n1p1 part    511M
nvme1n1p2 part    511M
nvme1n1p3 part  475.4G
nvme1n1p4 part    511M                                [SWAP]

# mdadm --detail /dev/md3
/dev/md3:
           Version : 0.90
     Creation Time : Thu Jul 30 13:49:54 2020
        Raid Level : raid1
        Array Size : 498530240 (475.44 GiB 510.49 GB)
     Used Dev Size : 498530240 (475.44 GiB 510.49 GB)
      Raid Devices : 2
     Total Devices : 2
   Preferred Minor : 3
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Tue Sep  8 13:37:54 2020
             State : clean
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              UUID : 9dd3cf94:cfc5c935:a4d2adc2:26fd5302
            Events : 0.13

    Number   Major   Minor   RaidDevice State
       0     259        3        0      active sync   /dev/nvme0n1p3
       1     259        8        1      active sync   /dev/nvme1n1p3

I tried to test the speed of the drives with fio, using the following command:

fio --name=randwrite --ioengine=libaio --iodepth=64 --rw=randwrite \
    --bs=64k --direct=1 --size=32G --numjobs=8 --runtime=240 \
    --group_reporting

The result was:

Jobs: 8 (f=8): [w(8)][100.0%][w=776MiB/s][w=12.4k IOPS][eta 00m:00s]
randwrite: (groupid=0, jobs=8): err= 0: pid=1028157: Tue Sep  8 13:09:05 2020
  write: IOPS=11.1k, BW=692MiB/s (726MB/s)(162GiB/240041msec); 0 zone resets
    slat (usec): min=127, max=567387, avg=385.31, stdev=5093.87
    clat (usec): min=2, max=1044.7k, avg=45818.20, stdev=55462.71
     lat (usec): min=268, max=1045.0k, avg=46206.45, stdev=55680.36
    clat percentiles (msec):
     |  1.00th=[   10],  5.00th=[   22], 10.00th=[   23], 20.00th=[   26],
     | 30.00th=[   29], 40.00th=[   33], 50.00th=[   36], 60.00th=[   41],
     | 70.00th=[   46], 80.00th=[   53], 90.00th=[   64], 95.00th=[   75],
     | 99.00th=[  443], 99.50th=[  493], 99.90th=[  550], 99.95th=[  567],
     | 99.99th=[  600]
   bw (  KiB/s): min=48768, max=1394246, per=99.97%, avg=708325.21, stdev=25832.90, samples=3840
   iops        : min=  762, max=21784, avg=11066.98, stdev=403.63, samples=3840
  lat (usec)   : 4=0.01%, 10=0.01%, 50=0.01%, 250=0.01%, 500=0.01%
  lat (usec)   : 750=0.02%, 1000=0.02%
  lat (msec)   : 2=0.08%, 4=0.18%, 10=0.80%, 20=1.91%, 50=74.09%
  lat (msec)   : 100=20.72%, 250=0.61%, 500=1.14%, 750=0.41%, 1000=0.01%
  lat (msec)   : 2000=0.01%
  cpu          : usr=9.10%, sys=41.41%, ctx=1203665, majf=0, minf=95
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,2657370,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=692MiB/s (726MB/s), 692MiB/s-692MiB/s (726MB/s-726MB/s), io=162GiB (174GB), run=240041-240041msec

Disk stats (read/write):
    md3: ios=0/3319927, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/2692572, aggrmerge=0/634101, aggrticks=0/79485190, aggrin_queue=74460696, aggrutil=94.36%
  nvme0n1: ios=0/2692573, merge=0/634101, ticks=0/83651179, in_queue=78562212, util=94.36%
  nvme1n1: ios=0/2692572, merge=0/634102, ticks=0/75319202, in_queue=70359180, util=94.04%

I tried googling and found people saying that changing the Intent Bitmap from none to Internal would speed things up, but after making the change and running fio again, it actually got slightly slower... maybe I need to wait a while? I don't know.
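For reference, the write-intent bitmap can be switched on a live array with `mdadm --grow`. A sketch; note that removing the bitmap trades faster writes for a full resync after an unclean shutdown, and 128M is just an example chunk size:

```shell
# Remove the internal write-intent bitmap entirely
mdadm --grow --bitmap=none /dev/md3

# Or keep it, but with a larger chunk so it is updated less often
mdadm --grow --bitmap=internal --bitmap-chunk=128M /dev/md3
```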

So I'm pretty much lost... I really don't know how to continue investigating from here, so honestly, any information about this problem would be greatly appreciated. Of course, I also monitored the CPU to rule it out, but it doesn't look like the CPU is heavily used at all.

Thank you!

Update

Someone on IRC asked whether writethrough had been set by accident. I tried googling it and found this: https://www.kernel.org/doc/html/latest/driver-api/md/raid5-cache.html

It talks about raid4/5/6, while I'm using raid1, so it's probably not relevant; also, the /sys/block/md3/md/journal_mode file described in that document doesn't exist on my system.
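To see which md tunables the raid1 array actually exposes, listing the sysfs directory is enough (assuming the md3 device from above):

```shell
ls /sys/block/md3/md/
cat /sys/block/md3/md/level   # should print raid1
```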

Update 2

I found a way to test cached and buffered reads:

# hdparm -tT /dev/md3

/dev/md3:
 Timing cached reads:   1006 MB in  1.99 seconds = 504.40 MB/sec
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 Timing buffered disk reads: 664 MB in  3.01 seconds = 220.88 MB/sec

I hope this information is useful as well.
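A matching fio read test would complement the write numbers above. A sketch mirroring the flags of the write test, with size and runtime reduced to keep it quick:

```shell
fio --name=randread --ioengine=libaio --iodepth=64 --rw=randread \
    --bs=64k --direct=1 --size=4G --numjobs=8 --runtime=60 \
    --group_reporting
```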
