Docker overlay2 文件系统损坏

Docker overlay2 文件系统损坏

我有一个小型家庭服务器,用于托管一些我喜欢的服务。最近,该服务器在主服务器上进入只读状态逻辑卷管理器体积,这是体积Ubuntu已安装。我几乎没有调查此类问题的经验。

问题

每隔几周,我就会尝试访问服务器上托管的一项服务,但发现它们没有响应。根据经验,我知道这意味着卷/dev/mapper/ubuntu--vg-ubuntu--lv已变为只读。

我的解决办法

我可以运行fsck /dev/mapper/ubuntu--vg-ubuntu--lv,最终报告并修复很多错误/var/lib/docker/overlay2/

该命令的缩短输出:

fsck from util-linux 2.31.1
e2fsck 1.44.1 (24-Mar-2018)
/dev/mapper/ubuntu--vg-ubuntu--lv: recovering journal
JBD2: Invalid checksum recovering block 6294682 in log
JBD2: Invalid checksum recovering block 5767200 in log
JBD2: Invalid checksum recovering block 5769164 in log
...
Journal checksum error found in /dev/mapper/ubuntu--vg-ubuntu--lv
/dev/mapper/ubuntu--vg-ubuntu--lv contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found.  Fix<y>? yes
Inode 395236 was part of the orphaned inode list.  FIXED.
...
Pass 2: Checking directory structure
Symlink /var/lib/docker/overlay2/e2983dd0abaadcf1745ffa84433e9c84f8b1a4ab79ee4b9b059f039758b885d1/diff/opt/gitlab/embedded/service/gitlab-shell/.gitlab_shell_secret (inode #1613356) is invalid.
Clear<y>? yes
...
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -(7406080--7406211) -(7407552--7407614) -(7408736--7408765) -(7411072--7411125) -(7413760--7413824) -(7414784--7415295)
Fix? yes
/dev/mapper/ubuntu--vg-ubuntu--lv: ***** FILE SYSTEM WAS MODIFIED *****
/dev/mapper/ubuntu--vg-ubuntu--lv: ***** REBOOT SYSTEM *****
/dev/mapper/ubuntu--vg-ubuntu--lv: 453053/2097152 files (0.1% non-contiguous), 3134566/8388608 blocks

sudo docker image rm gitlab/gitlab-ee -f我会对所有受影响的图像进行跟踪。

我显然不能继续这样做,因为我需要删除所有损坏的文件并重新部署所有 docker 镜像。

问题

我该如何彻底解决这个问题?我怎么知道该去哪里找?

更新 1

dmesg损坏发生后一分钟左右的输出。请注意,这只是一小部分,dmesg返回很多这些,但它们都是相同的格式只读文件系统格式。

[250377.826952] systemd-journald[445]: Failed to write entry (21 items, 858 bytes), ignoring: Read-only file system
[250377.827922] systemd-journald[445]: Failed to write entry (21 items, 858 bytes), ignoring: Read-only file system
[250378.236228] systemd-journald[445]: Failed to write entry (21 items, 858 bytes), ignoring: Read-only file system
[250378.237456] systemd-journald[445]: Failed to write entry (21 items, 858 bytes), ignoring: Read-only file system
[250381.765351] systemd-journald[445]: Failed to write entry (21 items, 858 bytes), ignoring: Read-only file system
[250381.766273] systemd-journald[445]: Failed to write entry (21 items, 858 bytes), ignoring: Read-only file system
[250381.982447] systemd-journald[445]: Failed to write entry (21 items, 858 bytes), ignoring: Read-only file system
[250381.983362] systemd-journald[445]: Failed to write entry (21 items, 858 bytes), ignoring: Read-only file system
[250382.639171] systemd-journald[445]: Failed to write entry (21 items, 858 bytes), ignoring: Read-only file system
[250382.640142] systemd-journald[445]: Failed to write entry (21 items, 858 bytes), ignoring: Read-only file system
[250382.766604] systemd-journald[445]: Failed to write entry (21 items, 858 bytes), ignoring: Read-only file system
[250382.767619] systemd-journald[445]: Failed to write entry (21 items, 858 bytes), ignoring: Read-only file system

我认为我此时开始跑步已经太晚了dmesg。我怀疑我是否能更快地注意到这个问题。

更新 2

我终于及时发现了错误。

[94813.533642] IPv6: ADDRCONF(NETDEV_UP): veth13a279e: link is not ready
[94813.533654] br-069ee8939c9c: port 1(veth13a279e) entered blocking state
[94813.533658] br-069ee8939c9c: port 1(veth13a279e) entered forwarding state
[94813.534526] br-069ee8939c9c: port 1(veth13a279e) entered disabled state
[94813.603964] br-54483b957e18: port 3(vetha0f7425) entered blocking state
[94813.603967] br-54483b957e18: port 3(vetha0f7425) entered disabled state
[94813.604076] device vetha0f7425 entered promiscuous mode
[94813.604292] IPv6: ADDRCONF(NETDEV_UP): vetha0f7425: link is not ready
[94813.604298] br-54483b957e18: port 3(vetha0f7425) entered blocking state
[94813.604299] br-54483b957e18: port 3(vetha0f7425) entered forwarding state
[94814.483126] br-54483b957e18: port 3(vetha0f7425) entered disabled state
[94814.816681] eth0: renamed from vethfda95fe
[94814.831541] IPv6: ADDRCONF(NETDEV_CHANGE): veth13a279e: link becomes ready
[94814.831591] br-069ee8939c9c: port 1(veth13a279e) entered blocking state
[94814.831593] br-069ee8939c9c: port 1(veth13a279e) entered forwarding state
[94815.842053] eth0: renamed from vethba9b3b5
[94815.859813] IPv6: ADDRCONF(NETDEV_CHANGE): vetha0f7425: link becomes ready
[94815.859882] br-54483b957e18: port 3(vetha0f7425) entered blocking state
[94815.859886] br-54483b957e18: port 3(vetha0f7425) entered forwarding state
[94816.829692] eth0: renamed from veth8fea459
[94816.847459] IPv6: ADDRCONF(NETDEV_CHANGE): vetha6ab738: link becomes ready
[94816.847491] br-54483b957e18: port 1(vetha6ab738) entered blocking state
[94816.847495] br-54483b957e18: port 1(vetha6ab738) entered forwarding state
[95248.835077] r8169 0000:13:00.0 enp19s0: link down
[95257.640648] r8169 0000:13:00.0 enp19s0: link up
[96145.500660] sd 0:0:0:0: [sda] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[96145.500670] sd 0:0:0:0: [sda] tag#6 CDB: Write(10) 2a 00 00 a4 7d 40 00 00 30 00
[96145.500674] print_req_error: I/O error, dev sda, sector 10779968
[96145.500826] Aborting journal on device dm-0-8.
[96145.500828] sd 0:0:0:0: [sda] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[96145.500834] sd 0:0:0:0: [sda] tag#7 CDB: Write(10) 2a 00 07 64 41 a8 00 00 08 00
[96145.500895] print_req_error: I/O error, dev sda, sector 124010920
[96145.500932] EXT4-fs error (device dm-0) in __ext4_new_inode:1082: Journal has aborted
[96145.500947] EXT4-fs warning (device dm-1): ext4_end_bio:323: I/O error 10 writing to inode 1709169 (offset 0 size 4096 starting block 6849845)
[96145.500982] Buffer I/O error on device dm-1, logical block 6849845
[96145.501023] sd 0:0:0:0: [sda] tag#10 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[96145.501025] sd 0:0:0:0: [sda] tag#10 CDB: Write(10) 2a 00 05 d0 58 88 00 00 08 00
[96145.501027] print_req_error: I/O error, dev sda, sector 97540232
[96145.501058] EXT4-fs warning (device dm-0): ext4_end_bio:323: I/O error 10 writing to inode 450308 (offset 0 size 0 starting block 7735313)
[96145.501060] Buffer I/O error on device dm-0, logical block 7735313
[96145.501105] sd 0:0:0:0: [sda] tag#9 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[96145.501110] sd 0:0:0:0: [sda] tag#9 CDB: Write(10) 2a 00 0a 24 94 80 00 00 08 00
[96145.501114] print_req_error: I/O error, dev sda, sector 170169472
[96145.501147] EXT4-fs warning (device dm-1): ext4_end_bio:323: I/O error 10 writing to inode 3148652 (offset 0 size 4096 starting block 12619664)
96145.501148] Buffer I/O error on device dm-1, logical block 12619664
[96145.501187] sd 0:0:0:0: [sda] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[96145.501189] sd 0:0:0:0: [sda] tag#8 CDB: Write(10) 2a 00 03 27 16 40 00 00 80 00
[96145.501190] print_req_error: I/O error, dev sda, sector 52893248
[96145.501266] sd 0:0:0:0: [sda] tag#11 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[96145.501268] sd 0:0:0:0: [sda] tag#11 CDB: Write(10) 2a 00 07 64 58 e0 00 00 08 00
[96145.501269] print_req_error: I/O error, dev sda, sector 124016864
[96145.501301] EXT4-fs warning (device dm-1): ext4_end_bio:323: I/O error 10 writing to inode 1706133 (offset 0 size 4096 starting bl
ock 6850588)
[96145.501302] Buffer I/O error on device dm-1, logical block 6850588
[96145.501382] EXT4-fs (dm-0): Delayed block allocation failed for inode 450308 at logical offset 18450 with max blocks 7 with error 
30
[96145.501438] EXT4-fs (dm-0): This should not happen!! Data will be lost
[96145.501477] EXT4-fs error (device dm-0) in ext4_writepages:2895: Journal has aborted
[96145.504583] Aborting journal on device dm-1-8.
[96145.507210] EXT4-fs error (device dm-0): ext4_journal_check_start:61: Detected aborted journal
[96145.507306] EXT4-fs (dm-0): Remounting filesystem read-only
[96145.513169] EXT4-fs error (device dm-0) in ext4_reserve_inode_write:5769: Journal has aborted
[96145.514680] EXT4-fs error (device dm-0) in ext4_mkdir:2660: IO failure
[96145.516071] EXT4-fs error (device dm-1): ext4_journal_check_start:61: Detected aborted journal
[96145.516095] EXT4-fs error (device dm-1): ext4_journal_check_start:61: Detected aborted journal
[96145.516121] EXT4-fs (dm-1): Remounting filesystem read-only
[96145.518645] EXT4-fs error (device dm-1): ext4_journal_check_start:61: Detected aborted journal
[96145.520899] EXT4-fs error (device dm-1): ext4_journal_check_start:61: Detected aborted journal
[96145.531523] EXT4-fs error (device dm-1): ext4_journal_check_start:61: Detected aborted journal
[96145.542876] EXT4-fs error (device dm-1): ext4_journal_check_start:61: Detected aborted journal
[96145.542955] EXT4-fs error (device dm-1): ext4_journal_check_start:61: Detected aborted journal
[96145.944229] traps: postgres[2215] general protection ip:7fba363d97a1 sp:7ffc943cc880 error:0 in ld-musl-x86_64.so.1[7fba363d2000+4
6000]
[96146.339588] br-ae1db04d3386: port 4(vethb6cf243) entered disabled state
[96146.345493] veth3fdc38e: renamed from eth0
[96146.441684] overlayfs: upper fs is r/o, try multi-lower layers mount
[96149.670401] br-ae1db04d3386: port 7(veth2bcd3f7) entered disabled state
[96149.670617] vethf7402f2: renamed from eth0
[96149.776068] overlayfs: upper fs is r/o, try multi-lower layers mount
[96180.516686] systemd-journald[392]: Failed to write entry (21 items, 603 bytes), ignoring: Read-only file system
[96180.516842] systemd-journald[392]: Failed to write entry (21 items, 694 bytes), ignoring: Read-only file system
[96180.516985] systemd-journald[392]: Failed to write entry (21 items, 603 bytes), ignoring: Read-only file system
[96180.517093] systemd-journald[392]: Failed to write entry (21 items, 694 bytes), ignoring: Read-only file system
[96180.517192] systemd-journald[392]: Failed to write entry (21 items, 603 bytes), ignoring: Read-only file system
[96180.517558] systemd-journald[392]: Failed to write entry (21 items, 694 bytes), ignoring: Read-only file system
[96180.517753] systemd-journald[392]: Failed to write entry (21 items, 603 bytes), ignoring: Read-only file system
[96180.518982] systemd-journald[392]: Failed to write entry (21 items, 694 bytes), ignoring: Read-only file system
[96180.520186] systemd-journald[392]: Failed to write entry (21 items, 603 bytes), ignoring: Read-only file system
[96180.521475] systemd-journald[392]: Failed to write entry (21 items, 694 bytes), ignoring: Read-only file system

更新 3

我已将驱动器连接到 Windows 主机并检查固件更新。驱动器具有最新固件。

SMART 还报告了一些在我看来非常合理的数字。

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   000   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       5638
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       75
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       3
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
167 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       0
169 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       26
170 Unknown_Attribute       0x0000   100   100   010    Old_age   Offline      -       18
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       2424903
181 Program_Fail_Cnt_Total  0x0032   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0000   100   100   000    Old_age   Offline      -       1
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       52
194 Temperature_Celsius     0x0022   077   068   000    Old_age   Always       -       23 (Min/Max 22/32)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
218 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
231 Temperature_Celsius     0x0000   004   004   000    Old_age   Offline      -       96
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       3775
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       2489
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       976
244 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       37
245 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       71
246 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       141104

我将安装一个新的操作系统并从备份中恢复数据,看看是否可以解决问题。

Docker 信息

由于只有 docker 覆盖目录似乎损坏,因此这里是 docker 存储配置:

Server Version: 19.03.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true

相关内容