Overall Ceph performance problems with QEMU

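I'm having performance problems with QEMU KVM on my Ceph cluster. The cluster has 4 nodes, each with 4x1TB drives, 48/64GB RAM, and Intel Xeon or AMD Opteron CPUs. The nodes are interconnected through 3x1 GBit interfaces configured as one bonded interface. Overall network traffic is very high at the moment. Sometimes IO gets blocked, but I don't know exactly why. The OSD and KVM hosts run Ubuntu 14.04 LTS with kernel 3.13.0. Is there a switch I forgot to flip?! Maybe you can help me with this, because I'm out of ideas.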

Log snippet showing the blocked IO:

2015-11-10 08:03:52.597054 mon.0 10.14.0.6:6789/0 546966 : cluster [INF] HEALTH_WARN; 1 requests are blocked > 32 sec
2015-11-10 08:04:41.993675 osd.13 10.14.0.76:6814/5175 106 : cluster [WRN] 30 slow requests, 30 included below; oldest blocked for > 30.207798 secs
2015-11-10 08:04:42.993975 osd.13 10.14.0.76:6814/5175 112 : cluster [WRN] 32 slow requests, 27 included below; oldest blocked for > 31.208280 secs
2015-11-10 08:04:43.994367 osd.13 10.14.0.76:6814/5175 118 : cluster [WRN] 35 slow requests, 25 included below; oldest blocked for > 32.208673 secs
2015-11-10 08:04:44.994712 osd.13 10.14.0.76:6814/5175 124 : cluster [WRN] 25 slow requests, 16 included below; oldest blocked for > 33.205598 secs
2015-11-10 08:04:45.995052 osd.13 10.14.0.76:6814/5175 130 : cluster [WRN] 26 slow requests, 15 included below; oldest blocked for > 34.124413 secs
2015-11-10 08:04:46.995360 osd.13 10.14.0.76:6814/5175 136 : cluster [WRN] 24 slow requests, 11 included below; oldest blocked for > 35.124517 secs
2015-11-10 08:04:47.995689 osd.13 10.14.0.76:6814/5175 142 : cluster [WRN] 22 slow requests, 6 included below; oldest blocked for > 36.124712 secs
2015-11-10 08:04:48.996059 osd.13 10.14.0.76:6814/5175 148 : cluster [WRN] 9 slow requests, 1 included below; oldest blocked for > 37.122843 secs
2015-11-10 08:05:05.238556 osd.13 10.14.0.76:6814/5175 150 : cluster [WRN] 12 slow requests, 3 included below; oldest blocked for > 53.365283 secs
2015-11-10 08:05:09.683333 osd.13 10.14.0.76:6814/5175 154 : cluster [WRN] 16 slow requests, 4 included below; oldest blocked for > 57.809976 secs
2015-11-10 08:05:11.895482 osd.13 10.14.0.76:6814/5175 159 : cluster [WRN] 18 slow requests, 11 included below; oldest blocked for > 60.022206 secs
2015-11-10 08:05:13.730638 osd.13 10.14.0.76:6814/5175 165 : cluster [WRN] 21 slow requests, 8 included below; oldest blocked for > 61.857323 secs
2015-11-10 08:05:14.731015 osd.13 10.14.0.76:6814/5175 171 : cluster [WRN] 24 slow requests, 6 included below; oldest blocked for > 62.857742 secs
2015-11-10 08:05:15.731261 osd.13 10.14.0.76:6814/5175 177 : cluster [WRN] 35 slow requests, 12 included below; oldest blocked for > 63.857998 secs
2015-11-10 08:05:17.028076 osd.13 10.14.0.76:6814/5175 183 : cluster [WRN] 43 slow requests, 15 included below; oldest blocked for > 65.154773 secs
2015-11-10 08:05:18.127205 osd.13 10.14.0.76:6814/5175 189 : cluster [WRN] 45 slow requests, 12 included below; oldest blocked for > 66.253932 secs
2015-11-10 08:05:19.127468 osd.13 10.14.0.76:6814/5175 195 : cluster [WRN] 48 slow requests, 14 included below; oldest blocked for > 67.254104 secs
2015-11-10 08:05:20.127937 osd.13 10.14.0.76:6814/5175 201 : cluster [WRN] 52 slow requests, 14 included below; oldest blocked for > 68.254581 secs
2015-11-10 08:05:22.065629 osd.13 10.14.0.76:6814/5175 207 : cluster [WRN] 53 slow requests, 14 included below; oldest blocked for > 70.192250 secs
2015-11-10 08:05:23.065965 osd.13 10.14.0.76:6814/5175 213 : cluster [WRN] 57 slow requests, 13 included below; oldest blocked for > 71.192553 secs
2015-11-10 08:05:24.066355 osd.13 10.14.0.76:6814/5175 219 : cluster [WRN] 58 slow requests, 9 included below; oldest blocked for > 72.192932 secs
2015-11-10 08:05:25.066731 osd.13 10.14.0.76:6814/5175 225 : cluster [WRN] 61 slow requests, 7 included below; oldest blocked for > 73.193356 secs
2015-11-10 08:05:26.067590 osd.13 10.14.0.76:6814/5175 231 : cluster [WRN] 62 slow requests, 3 included below; oldest blocked for > 74.193947 secs
2015-11-10 08:05:27.067844 osd.13 10.14.0.76:6814/5175 235 : cluster [WRN] 63 slow requests, 1 included below; oldest blocked for > 75.194501 secs
2015-11-10 08:05:32.306675 osd.13 10.14.0.76:6814/5175 237 : cluster [WRN] 59 slow requests, 1 included below; oldest blocked for > 80.433195 secs
2015-11-10 09:13:46.210699 osd.2 10.14.0.75:6804/29163 46 : cluster [WRN] 34 slow requests, 34 included below; oldest blocked for > 30.810297 secs
2015-11-10 09:13:47.211462 osd.2 10.14.0.75:6804/29163 52 : cluster [WRN] 38 slow requests, 33 included below; oldest blocked for > 31.811420 secs
2015-11-10 09:13:48.211718 osd.2 10.14.0.75:6804/29163 58 : cluster [WRN] 40 slow requests, 30 included below; oldest blocked for > 32.811678 secs
2015-11-10 09:13:49.212002 osd.2 10.14.0.75:6804/29163 64 : cluster [WRN] 43 slow requests, 28 included below; oldest blocked for > 33.811957 secs
2015-11-10 09:13:50.213554 osd.2 10.14.0.75:6804/29163 70 : cluster [WRN] 45 slow requests, 25 included below; oldest blocked for > 34.812999 secs
2015-11-10 09:13:51.214046 osd.2 10.14.0.75:6804/29163 76 : cluster [WRN] 50 slow requests, 25 included below; oldest blocked for > 35.813991 secs
2015-11-10 09:13:52.215101 osd.2 10.14.0.75:6804/29163 82 : cluster [WRN] 49 slow requests, 21 included below; oldest blocked for > 36.813431 secs
2015-11-10 09:13:53.215519 osd.2 10.14.0.75:6804/29163 88 : cluster [WRN] 43 slow requests, 19 included below; oldest blocked for > 37.810298 secs
2015-11-10 09:13:54.215797 osd.2 10.14.0.75:6804/29163 94 : cluster [WRN] 19 slow requests, 7 included below; oldest blocked for > 37.922869 secs
2015-11-10 09:13:55.216838 osd.2 10.14.0.75:6804/29163 100 : cluster [WRN] 6 slow requests, 1 included below; oldest blocked for > 37.592385 secs
2015-11-10 09:13:56.217302 osd.2 10.14.0.75:6804/29163 102 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.036856 secs
2015-11-10 10:18:00.293677 osd.0 10.14.0.75:6800/28850 109 : cluster [WRN] 5 slow requests, 5 included below; oldest blocked for > 30.137196 secs
2015-11-10 10:18:02.295197 osd.0 10.14.0.75:6800/28850 115 : cluster [WRN] 3 slow requests, 3 included below; oldest blocked for > 30.225206 secs
2015-11-10 10:18:03.296209 osd.0 10.14.0.75:6800/28850 119 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.640530 secs
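
A quick way to see which OSDs are affected, and how badly, is to group these warnings per OSD. The following is only a minimal sketch: the regular expression is written against the line format shown in the snippet above, and the function name `summarize_slow_requests` and the `SAMPLE_LINES` list are made up for illustration (the sample lines themselves are copied from the log above).

```python
import re
from collections import defaultdict

# Matches the "[WRN] N slow requests" lines from the cluster log above.
# The pattern is an assumption based on that excerpt, not an official format.
SLOW_RE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<osd>osd\.\d+) \S+ \d+ : cluster \[WRN\] "
    r"(?P<count>\d+) slow requests, \d+ included below; "
    r"oldest blocked for > (?P<secs>[\d.]+) secs"
)


def summarize_slow_requests(lines):
    """Return per-OSD warning count, peak request count and longest block time."""
    summary = defaultdict(
        lambda: {"warnings": 0, "peak_requests": 0, "max_blocked_secs": 0.0}
    )
    for line in lines:
        match = SLOW_RE.match(line.strip())
        if match is None:
            continue  # e.g. the mon.0 HEALTH_WARN line
        entry = summary[match.group("osd")]
        entry["warnings"] += 1
        entry["peak_requests"] = max(entry["peak_requests"], int(match.group("count")))
        entry["max_blocked_secs"] = max(
            entry["max_blocked_secs"], float(match.group("secs"))
        )
    return dict(summary)


# A few lines copied verbatim from the log excerpt above.
SAMPLE_LINES = [
    "2015-11-10 08:03:52.597054 mon.0 10.14.0.6:6789/0 546966 : cluster [INF] HEALTH_WARN; 1 requests are blocked > 32 sec",
    "2015-11-10 08:04:41.993675 osd.13 10.14.0.76:6814/5175 106 : cluster [WRN] 30 slow requests, 30 included below; oldest blocked for > 30.207798 secs",
    "2015-11-10 08:05:32.306675 osd.13 10.14.0.76:6814/5175 237 : cluster [WRN] 59 slow requests, 1 included below; oldest blocked for > 80.433195 secs",
    "2015-11-10 09:13:46.210699 osd.2 10.14.0.75:6804/29163 46 : cluster [WRN] 34 slow requests, 34 included below; oldest blocked for > 30.810297 secs",
]

if __name__ == "__main__":
    for osd, stats in sorted(summarize_slow_requests(SAMPLE_LINES).items()):
        print(osd, stats)
```

Run against the full log, this makes it easier to tell whether the slow requests are concentrated on a few OSDs or hosts, or spread across the whole cluster.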

This is our ceph.conf for the time being:

[global]
fsid = xxx
mon_initial_members = mon1 mon2 mon3
mon_host = 10.14.0.6
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd pool default size = 3
public network = 10.14.0.0/24
cluster network = 10.14.0.0/24
rbd default format = 2

[osd]
osd journal size = 10240
osd recovery max active = 1
osd max backfills = 1
filestore max sync interval = 30 # just for testing
filestore min sync interval = 29 # no impact detectable

This is the OSD tree:

ID WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 14.23999 root default                                     
-6  3.56000     host host1                                     
 8  0.89000         osd.8       up  1.00000          1.00000 
 9  0.89000         osd.9       up  1.00000          1.00000 
10  0.89000         osd.10      up  1.00000          1.00000 
11  0.89000         osd.11      up  1.00000          1.00000 
-2  3.56000     host host2                                     
 2  0.89000         osd.2       up  1.00000          1.00000 
 5  0.89000         osd.5       up  1.00000          1.00000 
 7  0.89000         osd.7       up  1.00000          1.00000 
 0  0.89000         osd.0       up  0.79143          1.00000 
-4  3.56000     host host3                                     
12  0.89000         osd.12      up  1.00000          1.00000 
13  0.89000         osd.13      up  1.00000          1.00000 
14  0.89000         osd.14      up  1.00000          1.00000 
15  0.89000         osd.15      up  1.00000          1.00000 
-3  3.56000     host host4                                     
 1  0.89000         osd.1       up  1.00000          1.00000 
 3  0.89000         osd.3       up  1.00000          1.00000 
 4  0.89000         osd.4       up  1.00000          1.00000 
 6  0.89000         osd.6       up  0.86749          1.00000

This is the osd df output:

ID WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR  
 8 0.89000  1.00000   916G  556G  359G 60.75 1.03 
 9 0.89000  1.00000   916G  564G  351G 61.61 1.05 
10 0.89000  1.00000   916G  514G  402G 56.12 0.95 
11 0.89000  1.00000   916G  510G  406G 55.68 0.95 
 2 0.89000  1.00000   916G  586G  329G 64.06 1.09 
 5 0.89000  1.00000   916G  456G  459G 49.85 0.85 
 7 0.89000  1.00000   915G  546G  368G 59.71 1.02 
 0 0.89000  0.79143   916G  615G  300G 67.16 1.14 
12 0.89000  1.00000   916G  472G  443G 51.61 0.88 
13 0.89000  1.00000   916G  628G  287G 68.60 1.17 
14 0.89000  1.00000   916G  540G  375G 59.01 1.00 
15 0.89000  1.00000   916G  596G  319G 65.15 1.11 
 1 0.89000  1.00000   916G  553G  362G 60.39 1.03 
 3 0.89000  1.00000   916G  462G  453G 50.53 0.86 
 4 0.89000  1.00000   916G  472G  443G 51.58 0.88 
 6 0.89000  0.86749   916G  540G  375G 58.99 1.00 
              TOTAL 14657G 8618G 6039G 58.80      
MIN/MAX VAR: 0.85/1.17  STDDEV: 5.67
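
One thing that may be worth checking is whether the OSDs that logged slow requests are also the fullest ones. The sketch below only ranks the %USE values copied from the table above and compares them with the OSD ids seen in the log excerpt (osd.13, osd.2, osd.0); the helper name `fullest` is hypothetical. An overlap in such a small sample is a hint at most, not proof of a cause.

```python
# %USE per OSD id, copied from the `osd df` output above.
PCT_USED = {
    8: 60.75, 9: 61.61, 10: 56.12, 11: 55.68,
    2: 64.06, 5: 49.85, 7: 59.71, 0: 67.16,
    12: 51.61, 13: 68.60, 14: 59.01, 15: 65.15,
    1: 60.39, 3: 50.53, 4: 51.58, 6: 58.99,
}

# OSDs that reported slow requests in the log excerpt.
SLOW_OSDS = {13, 2, 0}


def fullest(pct_used, n):
    """Return the ids of the n OSDs with the highest %USE, fullest first."""
    return sorted(pct_used, key=pct_used.get, reverse=True)[:n]


if __name__ == "__main__":
    top = fullest(PCT_USED, 4)
    print("fullest OSDs:", top)
    print("slow OSDs among them:", sorted(SLOW_OSDS & set(top)))
```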

Here is an example QEMU KVM domain definition:

<domain type='kvm'>
  <name>testvm</name>
  <uuid>xxx</uuid>
  <memory unit='KiB'>12582912</memory>
  <currentMemory unit='KiB'>12582912</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-trusty'>hvm</type>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>SandyBridge</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='pbe'/>
    <feature policy='require' name='tm2'/>
    <feature policy='require' name='est'/>
    <feature policy='require' name='vmx'/>
    <feature policy='require' name='osxsave'/>
    <feature policy='require' name='smx'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='ds'/>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='dtes64'/>
    <feature policy='require' name='ht'/>
    <feature policy='require' name='dca'/>
    <feature policy='require' name='pcid'/>
    <feature policy='require' name='tm'/>
    <feature policy='require' name='pdcm'/>
    <feature policy='require' name='pdpe1gb'/>
    <feature policy='require' name='ds_cpl'/>
    <feature policy='require' name='xtpr'/>
    <feature policy='require' name='acpi'/>
    <feature policy='require' name='monitor'/>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
      <auth username='admin'>
        <secret type='ceph' uuid='xxx'/>
      </auth>
      <source protocol='rbd' name='vms/testvm'>
        <host name='mon1' port='6789'/>
        <host name='mon2' port='6789'/>
        <host name='mon3' port='6789'/>
      </source>
      <target dev='sda' bus='scsi'/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='xxx'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <boot order='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
</domain>

Answer 1

This question is quite old, but in case anyone else is wondering about performance problems, here are some key points to keep in mind:

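  • A 1 GBit network is not recommended. We started with one and got a lot of slow requests. Upgrading to a 10 GBit network resolved some of the performance problems.
  • Use SSDs for your OSDs (journals).
  • Use BlueStore.
  • When running virtual machines, try a cache tier for the RBD pool; we benefited a lot from it.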
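
To give a rough feel for why the network matters here, the following back-of-the-envelope sketch estimates the write throughput one OSD host can sustain if only its outbound link is the limit. It rests on several assumptions: the pool size of 3 and the shared public/cluster network are taken from the ceph.conf in the question; the primary OSD forwards each write to the other (size - 1) replicas; protocol overhead, disk speed and journal writes are ignored; and whether a single stream can use more than one link of the bond depends on the bonding mode, which the question does not state. The function names are made up for illustration.

```python
def gbit_to_mbyte_per_s(gbit):
    """Convert a link speed in Gbit/s to MB/s (decimal units, no overhead)."""
    return gbit * 1000 / 8


def primary_egress_limited_write_rate(link_gbit, pool_size):
    """Upper bound (MB/s) on client writes through one primary OSD host,
    considering only that host's outbound link.

    The primary sends each write to (pool_size - 1) replicas, so its
    egress has to carry that many copies of every client write.
    """
    copies_to_send = pool_size - 1
    if copies_to_send <= 0:
        return float("inf")  # nothing to replicate, egress is not the limit
    return gbit_to_mbyte_per_s(link_gbit) / copies_to_send


if __name__ == "__main__":
    # 1 = one bond member, 3 = ideal use of the 3x1 GBit bond, 10 = 10 GBit
    for link in (1, 3, 10):
        rate = primary_egress_limited_write_rate(link, pool_size=3)
        print(f"{link} Gbit/s link, size=3: about {rate} MB/s of client writes")
```

Because the public and cluster networks share the same links in the posted configuration, client traffic, replication and any recovery or backfill traffic all compete for this budget, so real numbers will be lower than this bound.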
