My production server has bad blocks, what now?


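My server has crashed N times over the past few months, so I decided to run a bad-blocks test. I used fsck to detect and mark bad blocks, and it did find some. If I remember correctly, this means the filesystem will no longer use those blocks to store data.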

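But what happens to the data that was already there? Was it moved? It was probably corrupted in the first place, so files using those blocks may be damaged. That leaves me with a few open questions: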

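  1. Can I detect which files were affected?
  2. How can I check whether those files are corrupted?
  3. Is there a way to tell my distribution (Ubuntu 14.04) to "reinstall all packages as they are cached on the system"? (That is, no upgrade, just reinstall the current versions, without overwriting any configuration files.)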

Note: for completeness, I am pasting the fsck output here:

root@rescue:~# fsck -vcck /dev/sda2
fsck from util-linux 2.20.1
e2fsck 1.42.5 (29-Jul-2012)
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: done                                                 
/dev/sda2: Updating bad block inode.
Pass 1: Checking inodes, blocks, and sizes

Running additional passes to resolve blocks claimed by more than one inode...
Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 8: 119060233 119060234 119060592 119060615 119060616 119060617 119060618 119060619 119060620 119060621 119060623 119060624 119060625 119060626 119060632 119060633 119060635 119060636 119060637 119060638 119060639 119061755
Pass 1C: Scanning directories for inodes with multiply-claimed blocks
Pass 1D: Reconciling multiply-claimed blocks
(There are 0 inodes containing multiply-claimed blocks.)

File <The journal inode> (inode #8, mod time Mon May  5 14:17:18 2014) 
  has 22 multiply-claimed block(s), shared with 1 file(s):
        <The bad blocks inode> (inode #1, mod time Thu Aug  7 19:11:37 2014)
Clone multiply-claimed blocks<y>? yes
Error reading block 119060233 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060234 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060592 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060615 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060616 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060617 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060618 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060619 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060620 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060621 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060623 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060624 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060625 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060626 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060632 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060633 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060635 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060636 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060637 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060638 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119060639 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Error reading block 119061755 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (23499, counted=23477).
Fix<y>? yes
Free blocks count wrong for group #2016 (23956, counted=23961).
Fix<y>? yes
Free blocks count wrong for group #3633 (65514, counted=0).
Fix<y>? yes
Free blocks count wrong (231534163, counted=231534168).
Fix<y>? yes

/dev/sda2: ***** FILE SYSTEM WAS MODIFIED *****

      154609 inodes used (0.26%, out of 59736064)
          47 non-contiguous files (0.0%)
           9 non-contiguous directories (0.0%)
             # of inodes with ind/dind/tind blocks: 0/0/0
             Extent depth histogram: 154209/10
     7404456 blocks used (3.10%, out of 238938624)
          99 bad blocks
           2 large files

      126167 regular files
       27996 directories
           0 character device files
           0 block device files
           0 fifos
           0 links
         437 symbolic links (382 fast symbolic links)
           0 sockets
------------
      154600 files

Answer 1

First, take a look at the smartmontools Bad Block HOWTO:

https://www.smartmontools.org/wiki/BadBlockHowto
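
The idea of mapping an unreadable block back to a file can be sketched roughly as follows for an ext2/3/4 filesystem. This is only a minimal sketch, not a tested recovery procedure: the function names `bad_blocks_from_log` and `files_for_blocks` and the log file name `fsck.log` are hypothetical, and it relies on the `icheck` (block to inode) and `ncheck` (inode to path) requests of `debugfs`. Note that in the fsck output posted in the question, the only inode reported as sharing blocks with the bad blocks inode is inode #8, the journal.

```shell
#!/bin/sh
# Sketch: map blocks that e2fsck could not read back to file names.

# Print the unique block numbers found in "Error reading block N" lines
# of a saved e2fsck log read on stdin.
bad_blocks_from_log() {
    sed -n 's/^Error reading block \([0-9][0-9]*\) .*/\1/p' | sort -n -u
}

# Given a device and a list of block numbers, ask debugfs which inodes
# own them (icheck), then which paths name those inodes (ncheck).
# debugfs opens the filesystem read-only unless -w is given.
files_for_blocks() {
    dev=$1; shift
    inodes=$(debugfs -R "icheck $*" "$dev" 2>/dev/null |
        awk 'NR > 1 && $2 ~ /^[0-9]+$/ { print $2 }' |
        sort -n -u | tr '\n' ' ')
    # Nothing to look up if no block belongs to an inode.
    [ -n "$inodes" ] || return 0
    debugfs -R "ncheck $inodes" "$dev" 2>/dev/null
}

# Example (run as root, ideally from a rescue system):
#   blocks=$(bad_blocks_from_log < fsck.log)
#   files_for_blocks /dev/sda2 $blocks
```

Blocks that belong to no inode are simply skipped, and inodes without a path (such as the journal) will not show up in the `ncheck` listing.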

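Second, if you do not have one already, now is the time to implement a working backup strategy.

If you need a certain level of availability from the server, you may also want to consider implementing RAID-1 (mirroring).

Either way, it is time to throw out the old disk and replace it with a new one. It has already let you down several times, so it is a safe bet that things will not improve, in the near future or later.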
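
Before retiring the disk, its SMART data can help confirm the diagnosis. Below is a minimal sketch, assuming smartmontools is installed and the disk is `/dev/sda` (inferred from the `/dev/sda2` partition in the question). The helper name `smart_raw_value` is hypothetical; it reads the raw value of one attribute from the table that `smartctl -A` prints for ATA disks, where the attribute name is the second column and the raw value the tenth.

```shell
#!/bin/sh
# Sketch: pull the raw value of one SMART attribute out of `smartctl -A` output.

# Usage: smartctl -A /dev/sda | smart_raw_value Reallocated_Sector_Ct
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Example (run as root):
#   smartctl -H /dev/sda    # overall health self-assessment
#   smartctl -A /dev/sda | smart_raw_value Reallocated_Sector_Ct
#   smartctl -A /dev/sda | smart_raw_value Current_Pending_Sector
```

Non-zero and growing counts of reallocated or pending sectors are commonly read as a sign of failing media, though attribute names and their meaning vary between vendors.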
