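I have three servers, all running KVM with 4-5 Ubuntu Server guests:
- Generic PC, single SATA drive, Ubuntu 14.04
- PowerEdge 2950, 8GB RAM, 2x SAS 10k (RAID1), Ubuntu 12.04
- PowerEdge 2950, 8GB RAM, 2x SAS 10k (RAID1), Ubuntu 14.04
Server #1 has no problems. Servers #2 and #3, however, occasionally have guests drop into a "read-only file system" state. In all but one case the guests could be recovered, but I can't work out the cause.
Both PEs have a PERC 6/i with up-to-date firmware and brand-new drives bought this year. #2 has run fine since January; #3 was added a few months later but has only recently come into light use. The problems started about a month ago.
They are not running Dell drives. #2 has 2x Western Digital WD3001BKHG-02D22 and #3 has 2x Hitachi HUC106060CSS600.
SMART status is clean on both systems. Dell diagnostics also come back clean.
The guests run on virtio with RAW disk images. They are not heavily loaded: a couple of DNS servers, a couple of lightweight web servers, Cacti, and so on.
I have moved all important guests to #1, so #3 is empty apart from two test guests. I left the test guests running "stress -d 2" overnight to try to trigger the read-only problem, but nothing happened.
I also ran disk performance tests: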
Host:
Timing cached reads: 12190 MB in 2.00 seconds = 6100.67 MB/sec
Timing buffered disk reads: 480 MB in 3.01 seconds = 159.43 MB/sec
time dd if=/dev/zero of=./file.out bs=1M count=10k
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 72.2636 s, 149 MB/s
real 1m12.280s
user 0m0.019s
sys 0m13.268s
Guest:
Timing cached reads: 12434 MB in 2.00 seconds = 6222.50 MB/sec
Timing buffered disk reads: 358 MB in 3.01 seconds = 118.90 MB/sec
time dd if=/dev/zero of=./file.out bs=1M count=10k
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 70.4251 s, 152 MB/s
real 1m10.804s
user 0m0.008s
sys 0m14.792s
I can't find anything in the logs of the affected guests or the hosts that points to a cause, and I'm at a loss.
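Since the logs show nothing after the fact, one option is to poll each guest for block devices that have flipped to read-only, so the moment of the transition gets a timestamp. This is only a minimal sketch: it assumes the guests expose the usual /proc/mounts layout, and the function name ro_mounts is made up for illustration.

```shell
# List block-device filesystems that are currently mounted read-only.
# Reads a file in /proc/mounts format (default: /proc/mounts).
# Fields per line: device mountpoint fstype options dump pass
ro_mounts() {
    # Keep only /dev/* devices whose option list contains "ro" as a
    # whole option (so "errors=remount-ro" alone does not match).
    awk '$1 ~ /^\/dev\// && $4 ~ /(^|,)ro(,|$)/ { print $1, $2 }' "${1:-/proc/mounts}"
}
```

Run from cron inside a guest, something like `ro_mounts | logger -t ro-check` would record the device and mount point in syslog, which could then be lined up against the host and controller logs for the same minute.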
Edit:
Here are some sample OMSA log entries:
2243 Wed Sep 24 00:23:45 2014 Storage Service The Patrol Read has stopped.: Controller 0 (PERC 6/i Integrated)
2242 Tue Sep 23 20:08:56 2014 Storage Service The Patrol Read has started.: Controller 0 (PERC 6/i Integrated)
2334 Tue Sep 23 15:54:55 2014 Storage Service Controller event log: Unexpected sense: PD 01(e0x20/s1) Path 5000cca03c5880d1, CDB: 12 01 dc 01 1d 00, Sense: 5/24/00: Controller 0 (PERC 6/i Integrated)
2334 Tue Sep 23 15:54:55 2014 Storage Service Controller event log: Unexpected sense: Encl PD 20 Path 50022090b0c9d900, CDB: 12 00 00 00 04 00, Sense: 5/24/00: Controller 0 (PERC 6/i Integrated)
2334 Tue Sep 23 15:54:55 2014 Storage Service Controller event log: Unexpected sense: Encl PD 20 Path 50022090b0c9d900, CDB: 12 00 00 00 04 00, Sense: 5/24/00: Controller 0 (PERC 6/i Integrated)
2334 Tue Sep 23 15:54:54 2014 Storage Service Controller event log: Battery temperature is normal: Controller 0 (PERC 6/i Integrated)
2334 Tue Sep 23 15:54:54 2014 Storage Service Controller event log: Current capacity of the battery is above threshold: Controller 0 (PERC 6/i Integrated)
2334 Tue Sep 23 15:54:54 2014 Storage Service Controller event log: Battery charge complete: Controller 0 (PERC 6/i Integrated)
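For reading the "Unexpected sense" entries: in SCSI terms, opcode 12h is INQUIRY, and sense 5/24/00 is sense key ILLEGAL REQUEST with "invalid field in CDB". In other words, the device rejected a query; it is not a medium error (key 3) or hardware error (key 4). The sketch below summarizes such lines from a saved log. The function names are hypothetical and only the codes needed here are mapped, so treat it as a starting point rather than a complete decoder.

```shell
# Name a SCSI sense triple "key/asc/ascq" (hex, as printed in the controller log).
sense_name() {
    case "$1" in
        5/24/00) echo "ILLEGAL REQUEST: invalid field in CDB" ;;
        # Not seen in this log; included only for contrast with a real media fault.
        3/11/00) echo "MEDIUM ERROR: unrecovered read error" ;;
        *)       echo "unmapped sense $1" ;;
    esac
}

# Name a SCSI opcode (first CDB byte, hex).
opcode_name() {
    case "$1" in
        12) echo "INQUIRY" ;;
        *)  echo "opcode $1" ;;
    esac
}

# Read OMSA log lines on stdin and print one summary line per
# "Unexpected sense" event; all other lines are ignored.
summarize_sense() {
    sed -n 's#.*Unexpected sense: \(.*\) Path .*CDB: \([0-9a-fA-F][0-9a-fA-F]\) .*Sense: \([0-9a-fA-F/]*\):.*#\1|\2|\3#p' |
    while IFS='|' read -r dev op sense; do
        echo "$dev: $(opcode_name "$op") -> $(sense_name "$sense")"
    done
}
```

With the log saved to a file (the name omsa.log is an assumption), `summarize_sense < omsa.log` prints one line per affected device, which makes it easier to see whether anything other than rejected INQUIRY commands ever shows up.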