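I created an AWS EC2 instance of type g5.2xlarge with the Deep Learning Base GPU AMI (Ubuntu 20.04). This instance type has an instance store volume. On first launch the ephemeral instance store was mounted, but after every 2 or 3 reboots (or stop and start cycles) the volume is not mounted.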
Good case
If the volume is mounted, I can see it with lsblk:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 33.3M 1 loop /snap/amazon-ssm-agent/3552
loop1 7:1 0 24.9M 1 loop /snap/amazon-ssm-agent/7628
loop2 7:2 0 55.4M 1 loop /snap/core18/2066
loop3 7:3 0 91.9M 1 loop /snap/lxd/24061
loop4 7:4 0 55.7M 1 loop /snap/core18/2812
loop5 7:5 0 40.4M 1 loop /snap/snapd/20671
loop6 7:6 0 63.9M 1 loop /snap/core20/2105
loop7 7:7 0 67.6M 1 loop /snap/lxd/20326
nvme0n1 259:0 0 80G 0 disk
└─nvme0n1p1 259:3 0 80G 0 part /
nvme2n1 259:1 0 419.1G 0 disk
└─vg.01-lv_ephemeral 253:0 0 419.1G 0 lvm /opt/dlami/nvme
nvme1n1 259:2 0 75G 0 disk /mnt/data
In syslog I can see that the script /opt/aws/dlami/bin/nvme_ephemeral_drives.sh was executed:
nvme_ephemeral_drives.sh[558]: This instance type has (1) device(s) for instance store: (/dev/nvme2n1)
nvme_ephemeral_drives.sh[558]: LVM (/dev/vg.01/lv_ephemeral) does not exist
nvme_ephemeral_drives.sh[558]: Creating LVM (/dev/vg.01/lv_ephemeral)
nvme_ephemeral_drives.sh[878]: Physical volume "/dev/nvme2n1" successfully created.
nvme_ephemeral_drives.sh[889]: Volume group "vg.01" successfully created
nvme_ephemeral_drives.sh[925]: Ignoring stripesize argument with single stripe.
nvme_ephemeral_drives.sh[925]: Logical volume "lv_ephemeral" created.
nvme_ephemeral_drives.sh[558]: LVM (/dev/vg.01/lv_ephemeral) created successfully
nvme_ephemeral_drives.sh[558]: Found LVM (/dev/vg.01/lv_ephemeral) in state (a)
nvme_ephemeral_drives.sh[558]: Found LVM (/dev/vg.01/lv_ephemeral) FS type ()
nvme_ephemeral_drives.sh[558]: Formatting LVM (/dev/vg.01/lv_ephemeral) with FS type (ext4)
nvme_ephemeral_drives.sh[1333]: mke2fs 1.45.5 (07-Jan-2020)
nvme_ephemeral_drives.sh[1333]: Discarding device blocks: 4096/109862912#010#010#010#010#010#010#010#010#010#010#010#010#010#010#010#010#010#010#010 #010#010#010#010#010#010#010#010#010#010#010#010#010#010#010#010#010#010#010done
nvme_ephemeral_drives.sh[1333]: Creating filesystem with 109862912 4k blocks and 27467776 inodes
nvme_ephemeral_drives.sh[1333]: Filesystem UUID: 44fddd50-9a7b-4eec-8e68-2fd5def18526
nvme_ephemeral_drives.sh[1333]: Superblock backups stored on blocks:
nvme_ephemeral_drives.sh[1333]: #01132768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
nvme_ephemeral_drives.sh[1333]: #0114096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
nvme_ephemeral_drives.sh[1333]: #011102400000
nvme_ephemeral_drives.sh[1333]: Allocating group tables: 0/3353#010#010#010#010#010#010#010#010#010 #010#010#010#010#010#010#010#010#010done
nvme_ephemeral_drives.sh[1333]: Writing inode tables: 0/3353#010#010#010#010#010#010#010#010#010 #010#010#010#010#010#010#010#010#010done
nvme_ephemeral_drives.sh[1333]: Creating journal (262144 blocks): done
nvme_ephemeral_drives.sh[1333]: Writing superblocks and filesystem accounting information: 0/3353#010#010#010#010#010#010#010#010#010 #010#010#010#010#010#010#010#010#010done
nvme_ephemeral_drives.sh[558]: LVM (/dev/vg.01/lv_ephemeral) formatted successfully
nvme_ephemeral_drives.sh[558]: LVM (/dev/vg.01/lv_ephemeral) not mounted, mounting on (/opt/dlami/nvme)
nvme_ephemeral_drives.sh[1529]: mount: /dev/mapper/vg.01-lv_ephemeral mounted on /opt/dlami/nvme.
nvme_ephemeral_drives.sh[558]: LVM (/dev/vg.01/lv_ephemeral) mounted successfully
Bad case
If the volume is not mounted, lsblk shows the instance store disk (nvme2n1) with no LVM volume and no mount point:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 33.3M 1 loop /snap/amazon-ssm-agent/3552
loop1 7:1 0 24.9M 1 loop /snap/amazon-ssm-agent/7628
loop2 7:2 0 55.4M 1 loop /snap/core18/2066
loop3 7:3 0 91.9M 1 loop /snap/lxd/24061
loop4 7:4 0 55.7M 1 loop /snap/core18/2812
loop5 7:5 0 40.4M 1 loop /snap/snapd/20671
loop6 7:6 0 63.9M 1 loop /snap/core20/2105
loop7 7:7 0 67.6M 1 loop /snap/lxd/20326
nvme0n1 259:0 0 80G 0 disk
└─nvme0n1p1 259:3 0 80G 0 part /
nvme2n1 259:1 0 419.1G 0 disk
nvme1n1 259:2 0 75G 0 disk /mnt/data
Investigation
If I manually run the script on a bad instance, the volume is created and mounted:
/opt/aws/dlami/bin/nvme_ephemeral_drives.sh
If I manually run the script on a good instance, nothing changes:
This instance type has (1) device(s) for instance store: (/dev/nvme2n1)
LVM (/dev/vg.01/lv_ephemeral) already exists
Found LVM (/dev/vg.01/lv_ephemeral) in state (a)
Found LVM (/dev/vg.01/lv_ephemeral) FS type (ext4)
LVM (/dev/vg.01/lv_ephemeral) already formatted with FS type (ext4)
LVM (/dev/vg.01/lv_ephemeral) already mounted on (/opt/dlami/nvme)
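To tell the two states apart without reading the lsblk output by eye, the check can be scripted. This is only a minimal sketch: the helper name has_mountpoint is hypothetical and not part of the AMI, and it assumes the mount point is the last column of the lsblk output, as in the listings in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the DLAMI): reads lsblk-style output
# on stdin and succeeds only if some line's last column equals the given
# mount point.
has_mountpoint() {
  local target="$1"
  awk -v t="$target" '$NF == t { found = 1 } END { exit found ? 0 : 1 }'
}

# Example use on the instance (paths taken from the post above):
#   if ! lsblk | has_mountpoint /opt/dlami/nvme; then
#     echo "instance store is not mounted"
#   fi
```

Running this right after boot only records which boots end up in the bad state; it does not explain why the script's work is missing on those boots.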
Question
Why is my ephemeral instance store not mounted on every boot?