Ubuntu 20.04: formatting a disk fails with an I/O error. How do I fix it?
# mkfs -t ext4 /dev/nvme6n2
mke2fs 1.45.5 (07-Jan-2020)
Discarding device blocks: done
Creating filesystem with 3750232064 4k blocks and 468779008 inodes
Filesystem UUID: 9dc6c2f9-2297-4e44-a61d-481617e40158
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
2560000000
Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: mkfs.ext4: Input/output error while writing out and closing file system
From dmesg:
[31366539.313763] print_req_error: 9 callbacks suppressed
[31366539.313766] blk_update_request: I/O error, dev nvme6c10n2, sector 1842432 op 0x1:(WRITE) flags 0x4004800 phys_seg 32 prio class 0
[31366539.314934] buffer_io_error: 596 callbacks suppressed
[31366539.314938] Buffer I/O error on dev nvme6n2, logical block 230304, lost async page write
[31366539.315492] Buffer I/O error on dev nvme6n2, logical block 230305, lost async page write
[31366539.316038] Buffer I/O error on dev nvme6n2, logical block 230306, lost async page write
[31366539.316566] Buffer I/O error on dev nvme6n2, logical block 230307, lost async page write
[31366539.317080] Buffer I/O error on dev nvme6n2, logical block 230308, lost async page write
[31366539.317582] Buffer I/O error on dev nvme6n2, logical block 230309, lost async page write
[31366539.318073] Buffer I/O error on dev nvme6n2, logical block 230310, lost async page write
[31366539.318554] Buffer I/O error on dev nvme6n2, logical block 230311, lost async page write
[31366539.319021] Buffer I/O error on dev nvme6n2, logical block 230312, lost async page write
[31366539.319482] Buffer I/O error on dev nvme6n2, logical block 230313, lost async page write
[31366539.319993] blk_update_request: I/O error, dev nvme6c10n2, sector 12854784 op 0x1:(WRITE) flags 0x4004800 phys_seg 32 prio class 0
[31366539.320929] blk_update_request: I/O error, dev nvme6c10n2, sector 163847936 op 0x1:(WRITE) flags 0x4004800 phys_seg 32 prio class 0
[31366539.321838] blk_update_request: I/O error, dev nvme6c10n2, sector 89918976 op 0x1:(WRITE) flags 0x4004800 phys_seg 32 prio class 0
[31366539.322782] blk_update_request: I/O error, dev nvme6c10n2, sector 63701504 op 0x1:(WRITE) flags 0x4004800 phys_seg 32 prio class 0
[31366539.323677] blk_update_request: I/O error, dev nvme6c10n2, sector 63704320 op 0x1:(WRITE) flags 0x4004800 phys_seg 32 prio class 0
[31366539.324621] blk_update_request: I/O error, dev nvme6c10n2, sector 163847680 op 0x1:(WRITE) flags 0x4004800 phys_seg 32 prio class 0
[31366539.329814] blk_update_request: I/O error, dev nvme6c10n2, sector 191112448 op 0x1:(WRITE) flags 0x4004800 phys_seg 32 prio class 0
[31366539.330868] blk_update_request: I/O error, dev nvme6c10n2, sector 191107584 op 0x1:(WRITE) flags 0x4004800 phys_seg 32 prio class 0
[31366539.348081] blk_update_request: I/O error, dev nvme6c10n2, sector 1719938048 op 0x1:(WRITE) flags 0x4004800 phys_seg 32 prio class 0
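The two kinds of dmesg lines use different units. Block-layer messages such as blk_update_request report sectors in 512-byte units, while the "Buffer I/O error ... logical block" lines here appear to count 4096-byte blocks, the block size mkfs.ext4 chose above. The minimal sketch below converts between the two, assuming those two sizes; the function names are hypothetical. It shows that logical block 230304 and sector 1842432 refer to the same location, so the first failing write and the first buffer error describe the same region of the disk.

```shell
#!/bin/sh
# Assumed units: kernel block-layer sectors are 512 bytes; the ext4
# block size reported by mkfs.ext4 in this post is 4096 bytes.
SECTOR_SIZE=512
FS_BLOCK_SIZE=4096

# Hypothetical helper: 512-byte sector number -> 4096-byte block number
sector_to_block() {
    echo $(( $1 * SECTOR_SIZE / FS_BLOCK_SIZE ))
}

# Hypothetical helper: 4096-byte block number -> 512-byte sector number
block_to_sector() {
    echo $(( $1 * FS_BLOCK_SIZE / SECTOR_SIZE ))
}

# First failing write in dmesg and first "Buffer I/O error" block
sector_to_block 1842432
block_to_sector 230304
```

The two calls print 230304 and 1842432 respectively.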
But the disk appears to be healthy:
# smartctl -a /dev/nvme6n2
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-91-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: SAMSUNG MZWLR15THALA-00007
Serial Number: S6EXNE0R800894
Firmware Version: MPK90B5Q
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 15,360,950,534,144 [15.3 TB]
Unallocated NVM Capacity: 0
Controller ID: 65
Number of Namespaces: 32
Namespace 1 Size/Capacity: 15,360,950,534,144 [15.3 TB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Mon Dec 19 18:06:52 2022 GMT
Firmware Updates (0x17): 3 Slots, Slot 1 R/O, no Reset required
Optional Admin Commands (0x00df): Security Format Frmw_DL NS_Mngmt Self_Test MI_Snd/Rec Vrt_Mngmt
Optional NVM Commands (0x007f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Resv Timestmp
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 71 Celsius
Critical Comp. Temp. Threshold: 84 Celsius
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    25.00W   22.00W       -    0  0  0  0      180     180
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 -     512       0         1
 1 -     512       8         3
 2 -    4096       0         0
 3 -    4096       8         2
 4 -    4096      64         3
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 32 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 388,363,814 [198 TB]
Data Units Written: 1,230,470,407 [630 TB]
Host Read Commands: 2,891,609,287
Host Write Commands: 6,972,724,748
Controller Busy Time: 2,098
Power Cycles: 1
Power On Hours: 7,181
Unsafe Shutdowns: 0
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 32 Celsius
Temperature Sensor 2: 32 Celsius
Temperature Sensor 3: 31 Celsius
Error Information (NVMe Log 0x01, max 256 entries)
No Errors Logged
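As a quick consistency check between the two tools, the sketch below multiplies the block count mkfs.ext4 reported by its 4096-byte block size and compares the result with the Namespace 1 size smartctl reports. It also converts that size to 512-byte LBAs (the "Formatted LBA Size" above) and checks that the highest sector in the dmesg excerpt falls inside it. All figures are copied from this post; it assumes the Namespace 1 figure describes the same namespace that mkfs formatted. If the checks hold, mkfs saw the full namespace and the failing writes are not past the end of the device.

```shell
#!/bin/sh
# Figures copied from the mkfs.ext4, smartctl and dmesg output in this post.
BLOCKS=3750232064              # "Creating filesystem with 3750232064 4k blocks"
BLOCK_SIZE=4096                # ext4 block size used by mkfs
NS_BYTES=15360950534144        # "Namespace 1 Size/Capacity: 15,360,950,534,144"
LBA_SIZE=512                   # "Namespace 1 Formatted LBA Size: 512"
LAST_FAILED_SECTOR=1719938048  # highest sector in the dmesg excerpt

FS_BYTES=$(( BLOCKS * BLOCK_SIZE ))
LBAS=$(( NS_BYTES / LBA_SIZE ))

echo "filesystem bytes: $FS_BYTES"
echo "namespace bytes:  $NS_BYTES"
echo "namespace LBAs:   $LBAS"

# Does the filesystem span exactly the reported namespace size?
if [ "$FS_BYTES" -eq "$NS_BYTES" ]; then
    echo "mkfs used the whole namespace"
fi

# Sectors in dmesg are 512-byte units, the same as the formatted LBA size here
if [ "$LAST_FAILED_SECTOR" -lt "$LBAS" ]; then
    echo "failing sectors lie inside the namespace"
fi
```

Both checks hold: 3,750,232,064 × 4096 = 15,360,950,534,144 bytes, or 30,001,856,512 LBAs of 512 bytes.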
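One thing I do find strange is that the device enumeration for these drives is missing from /dev. nvme6n2 is a partition (partition 2) on disk nvme6, yet nvme6 itself is not listed as a device in /dev: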
# ls /dev/nvme[6,7,8]*
/dev/nvme6n2 /dev/nvme7n2 /dev/nvme8n2
For comparison, here is a working disk, where the device nvme5 is present:
# ls /dev/nvme[5]*
/dev/nvme5 /dev/nvme5n1 /dev/nvme5n1p1 /dev/nvme5n1p2
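The sketch below splits NVMe device names into their numeric parts. It assumes the Linux NVMe driver's usual naming scheme. /dev/nvmeN is a controller's character device, nvmeNnM is a namespace block device (M is a per-subsystem namespace index, not necessarily the NSID), and nvmeNnMpK is partition K on that namespace. With native NVMe multipath enabled, N in nvmeNnM is the subsystem instance rather than the controller instance. The hidden per-path device is then named nvmeNcCnM, the form that appears in the dmesg lines above (nvme6c10n2). Under this scheme a partition always carries a pK suffix, so nvme6n2 would be a namespace rather than a partition. The function describe_nvme_name is hypothetical and only illustrates the scheme.

```shell
#!/bin/sh
# Hypothetical helper: classify a Linux NVMe device name.
# Assumed scheme (kernel NVMe driver):
#   nvmeN          controller character device
#   nvmeNnM        namespace block device (N = controller instance, or
#                  subsystem instance under native multipath)
#   nvmeNnMpK      partition K on namespace M
#   nvmeNcCnM      hidden per-path device under native multipath,
#                  C = controller instance
describe_nvme_name() {
    # Only one substitution can match; each one rewrites the line so
    # that later patterns (all anchored on ^nvme) no longer match.
    echo "$1" | sed -E -n \
        -e 's/^nvme([0-9]+)c([0-9]+)n([0-9]+)$/path: subsystem=\1 controller=\2 namespace=\3/p' \
        -e 's/^nvme([0-9]+)n([0-9]+)p([0-9]+)$/partition: instance=\1 namespace=\2 partition=\3/p' \
        -e 's/^nvme([0-9]+)n([0-9]+)$/namespace: instance=\1 namespace=\2/p' \
        -e 's/^nvme([0-9]+)$/controller: instance=\1/p'
}

describe_nvme_name nvme6c10n2
describe_nvme_name nvme6n2
describe_nvme_name nvme5n1p1
describe_nvme_name nvme5
```

If this scheme applies here, nvme6n2 would be namespace 2 of subsystem 6, reached through controller 10. The controller's character device would then be expected at /dev/nvme10 rather than /dev/nvme6, which `ls /dev/nvme10` would confirm or rule out.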