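We've just rolled out a new build of our Ubuntu 16.04 systems on AWS. Besides the usual apt-get updates and the like, we added an explicit step to our Puppet code that uses tune2fs to set a UUID on the ext4 data disk. (This is in preparation for moving to Amazon's C5 instance types, which use NVMe device names; we want to be able to tell which disk is which before and after.)
But then we needed to reboot a large number of systems to resize the instances in AWS (within the same instance family), and about 10% of them failed to come back because of filesystem corruption on their data drive (not the root drive).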
grep -i ext4 /var/log/kern.log |grep xvdh
2019-03-14T14:07:39.954930+00:00 ip-10-2-219-30 kernel: [ 26.059585] EXT4-fs (xvdh): ext4_check_descriptors: Checksum for group 0 failed (25645!=13919)
2019-03-14T14:07:39.961718+00:00 ip-10-2-219-30 kernel: [ 26.064303] EXT4-fs (xvdh): group descriptors corrupted!
2019-03-14T14:07:54.984741+00:00 ip-10-2-219-30 kernel: [ 41.090302] EXT4-fs (xvdh): ext4_check_descriptors: Checksum for group 0 failed (25645!=13919)
2019-03-14T14:07:54.984757+00:00 ip-10-2-219-30 kernel: [ 41.094897] EXT4-fs (xvdh): group descriptors corrupted!
2019-03-14T14:08:17.138117+00:00 ip-10-2-219-30 kernel: [ 63.239655] EXT4-fs (xvdh): ext4_check_descriptors: Checksum for group 0 failed (25645!=13919)
2019-03-14T14:08:17.138141+00:00 ip-10-2-219-30 kernel: [ 63.246723] EXT4-fs (xvdh): group descriptors corrupted!
2019-03-14T14:21:30.636962+00:00 redacted1 kernel: [ 3.798075] EXT4-fs (xvdh): mounted filesystem with ordered data mode. Opts: (null)
2019-03-14T14:46:07.812220+00:00 redacted2 kernel: [ 3.614731] EXT4-fs (xvdh): mounted filesystem with ordered data mode. Opts: (null)
We then had to fsck the drive to recover the system.
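To size the problem across a fleet, the grep above can be turned into a per-device count. This is only a minimal sketch: the function name `count_ext4_checksum_failures` is made up for illustration, and it assumes the kernel log lines have the same format as the excerpt above.

```shell
# Count ext4 group-descriptor checksum failures per device.
# Reads kernel log lines on stdin and prints "<device> <count>" for each device.
# Hypothetical helper; the pattern is taken from the log excerpt above.
count_ext4_checksum_failures() {
  sed -n -E 's/.*EXT4-fs \(([^)]+)\): ext4_check_descriptors: Checksum for group [0-9]+ failed.*/\1/p' \
    | sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage (same log file as the grep above):
#   count_ext4_checksum_failures < /var/log/kern.log
```

Only the "Checksum for group N failed" lines are counted; the follow-up "group descriptors corrupted!" lines and successful mounts are ignored.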
The Puppet code that makes this change is below. So far we have only used M4/C4 instance types, so the device should always be /dev/xvdh.
class our_storage::platforms::aws {
  # This shouldn't run during image generation.
  if $::packer_build != 'yes' {
    # If nvme0n1 is present this means we are using a M5 or C5 instance and then the data volume will be nvme1n1
    # We need to check the disks that are mounted in / because it might take time for the data volume to appear as totally mounted to the instance.
    # xvda --> xvdh
    # nvme0n1 --> nvme1n1
    if $facts['disks']['nvme0n1'] {
      $st_volume = '/dev/nvme1n1'
    }
    elsif $facts['disks']['xvda'] {
      $st_volume = '/dev/xvdh'
    }
    else {
      fail("Invalid disk configuration ${facts['disks']}")
    }
    $fstype = 'ext4'
    $mount_opts = 'auto,noatime'
    # If /data is not mounted, go ahead and do it.
    if !$facts['mountpoints']['/data'] {
      # Get a unique, constant UUID for this volume.
      $ec2_userdata = parsejson($facts['ec2_userdata'])
      $domain = $ec2_userdata['domain']
      $subdomain = $ec2_userdata['subDomain']
      $st_volume_uuid = fqdn_uuid("${subdomain}.${domain}")
      # we may have to wait for the device to "appear"
      exec { 'Storage: waiting for data volume to be attached':
        path => '/bin',
        command => "lsblk -fn ${st_volume}",
        tries => 60,
        try_sleep => 10,
        unless => 'mountpoint -q -- "/data"',
        logoutput => true,
      } -> exec { 'Storage: formatting data volume': # WARNING: if we ever change from ext4, this will reformat volumes!
        path => ['/sbin', '/bin'],
        command => "mkfs.${fstype} -F ${st_volume}",
        unless => "blkid ${st_volume} | grep -q 'TYPE=\"${fstype}\"'",
        logoutput => true,
      } -> exec { 'Storage: assign UUID to data volume':
        path => ['/sbin', '/bin'],
        command => "tune2fs ${st_volume} -U ${st_volume_uuid}",
        logoutput => true,
      } ~> mount { '/data':
        ensure => mounted,
        device => "UUID=${st_volume_uuid}",
        fstype => $fstype,
        options => $mount_opts,
        require => File['/data'],
        before => File[$our_storage::data_dirs],
      }
    } else {
      # Need to fetch the current UUID.
      # Cannot be changed if the volume is already mounted!
      $st_volume_uuid = $st_volume ? {
        '/dev/nvme1n1' => get_disk_uuid('/dev/nvme1n1'),
        '/dev/xvdh' => get_disk_uuid('/dev/xvdh')
      }
      # If data is already mounted, just make sure that everything in fstab is in place.
      # e.g. it is using the UUID as disk identifier.
      mount { '/data':
        ensure => mounted,
        device => "UUID=${st_volume_uuid}",
        fstype => $fstype,
        options => $mount_opts,
        require => File['/data'],
        before => File[$our_storage::data_dirs],
      }
    }
  }
}
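One thing worth noting about the class above: the tune2fs exec has no `unless` guard, so it runs on every Puppet run where /data is not yet mounted, even when the volume already carries the target UUID. A minimal sketch of a guard, written as shell that such an exec could call, might look like the following. The helper names `uuid_needs_update` and `set_uuid_if_needed` are hypothetical, and this is not what we currently run.

```shell
# Succeed (return 0) if the current UUID differs from the target, ignoring case.
# Arguments: current UUID (may be empty), target UUID.
uuid_needs_update() {
  cur_lc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  tgt_lc=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  [ "$cur_lc" != "$tgt_lc" ]
}

# Hypothetical wrapper: only call tune2fs when the UUID would actually change.
# Arguments: block device, target UUID.
set_uuid_if_needed() {
  dev_uuid=$(blkid -s UUID -o value "$1")
  if uuid_needs_update "$dev_uuid" "$2"; then
    tune2fs -U "$2" "$1"
  fi
}

# Usage (needs root and an unmounted data volume):
#   set_uuid_if_needed /dev/xvdh "$target_uuid"
```

This would not change the first run after mkfs, where the UUID really does need to be set; it would only avoid rewriting the superblock on later runs.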
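We haven't been able to establish whether this change is the culprit. It seems to be the only significant relevant change, but we can't see how it would break things. One thing we can correlate is that the failures seem somewhat skewed towards busy systems, where the EBS drive we mount has plausibly exhausted its burst balance and so may be slow.
We tried to reproduce the problem on a range of dev systems but were unable to trigger the same failure.
I know we could automate the fsck, but that feels like papering over whatever caused the damage in the first place; what happens if it does more damage than an unattended fsck can repair? We run a large fleet.
Is there any known way that running tune2fs on a slow or still-mounting system can corrupt an ext4 filesystem, or is there something else obvious in what we're doing that could cause this corruption? What can we do to determine whether it is the cause? Because the failure is intermittent and not reproducible, and other changes went out at the same time (package updates and so on), we can't be sure that adding the UUID is the cause, but the timing certainly makes it suspect.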
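For anyone trying to narrow this down, here is a rough sketch of a check that could be run on an affected volume. It rests on an assumption, not something we have confirmed for our case: ext4 group descriptor checksums are only in play when the `uninit_bg` or `metadata_csum` feature is enabled, and those checksums are computed with the filesystem UUID as an input. The function name `has_group_desc_checksums` is made up for illustration.

```shell
# Read `dumpe2fs -h` output on stdin and succeed if the feature list
# contains uninit_bg or metadata_csum.
# Assumption: these are the features that enable group descriptor checksums.
has_group_desc_checksums() {
  grep -E '^Filesystem features:' \
    | grep -Eq '(^|[[:space:]])(uninit_bg|metadata_csum)([[:space:]]|$)'
}

# Usage (needs root; device name as in the logs above):
#   dumpe2fs -h /dev/xvdh 2>/dev/null | has_group_desc_checksums \
#     && echo "group descriptor checksums enabled"
#   e2fsck -fn /dev/xvdh   # read-only check; run on an unmounted volume or a snapshot
```

Running the read-only `e2fsck -fn` immediately after the tune2fs step, before the first mount, would at least show whether the descriptors are already inconsistent at that point or only after the reboot.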