Debian server on GCP suddenly unreachable over SSH after the disk filled up

I have been struggling with this for over a month and have tried many of the solutions posted on forums.

A backup script failed and filled up the disk of a Compute Engine VM instance. Since then, SSH authentication has not worked.

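I have already tried the following:

A. Increased the disk size from the GCP console.

B. Took a snapshot of the disk and created a new instance from it.

C. Created a new instance with a new disk and attached the snapshot as an additional disk. (This instance works when I detach the snapshot disk, but it fails when the problematic disk is attached.)

I tried the various startup scripts I came across.

I tried this solution: sudo resize2fs /dev/sda1

I tried the solution listed here: https://github.com/mbrukman/stackexchange-answers/blob/master/stackoverflow/24021214/fdisk.sh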
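For context on the resize attempts above: enlarging a disk in the console does not by itself enlarge the partition or the filesystem, so resize2fs alone may find nothing to grow. Below is a minimal sketch of the usual two-step sequence, assuming an ext4 filesystem, sd-style device names (/dev/sda plus partition number 1) and the growpart tool from the cloud-guest-utils package. The run and grow_root_fs helper names are hypothetical, and by default the sketch only prints the commands instead of executing them.

```shell
# Sketch only: grow the partition first, then the filesystem inside it.
# Assumptions: ext4, sd-style device names, growpart is installed.
# DRY_RUN=1 (the default) prints each command instead of running it.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

grow_root_fs() {
  # $1 = whole-disk device (e.g. /dev/sda), $2 = partition number (e.g. 1)
  run sudo growpart "$1" "$2"      # extend the partition to the end of the disk
  run sudo resize2fs "${1}${2}"    # extend ext4 to fill the enlarged partition
}
```

Calling grow_root_fs /dev/sda 1 prints the two commands; setting DRY_RUN=0 would actually run them. If the broken disk is attached to a rescue instance as a secondary disk, the device is more likely /dev/sdb, and resize2fs may ask for e2fsck -f to be run first on an unmounted filesystem. growpart can also fail on a completely full disk because it needs a little temporary space, so freeing some space first may be necessary.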

When I check the serial port log for errors, I see the following failures:

[FAILED] Failed to mount Huge Pages File System.
See 'systemctl status dev-hugepages.mount' for details.
[FAILED] Failed to mount POSIX Message Queue File System.
See 'systemctl status dev-mqueue.mount' for details.
[  OK  ] Started Load Kernel Modules.
[FAILED] Failed to mount Debug File System.
See 'systemctl status sys-kernel-debug.mount' for details.
[FAILED] Failed to start Create list of requ…vice nodes for the current kernel.
See 'systemctl status kmod-static-nodes.service' for details.
[FAILED] Failed to start Remount Root and Kernel File Systems.
See 'systemctl status systemd-remount-fs.service' for details.
[  OK  ] Started Journal Service.
         Starting Flush Journal to Persistent Storage...
         Starting udev Coldplug all Devices...
         Starting Load/Save Random Seed...
         Starting Create Static Device Nodes in /dev...
         Starting Apply Kernel Variables...
[FAILED] Failed to start Flush Journal to Persistent Storage.
See 'systemctl status systemd-journal-flush.service' for details.
[FAILED] Failed to start udev Coldplug all Devices.
See 'systemctl status systemd-udev-trigger.service' for details.
[  OK  ] Started Load/Save Random Seed.
[FAILED] Failed to start Create Static Device Nodes in /dev.
See 'systemctl status systemd-tmpfiles-setup-dev.service' for details.
[  OK  ] Started Apply Kernel Variables.
         Starting udev Kernel Device Manager...
[  OK  ] Reached target Local File Systems (Pre).
[  OK  ] Reached target Local File Systems.
         Starting Raise network interfaces...
         Starting Create Volatile Files and Directories...
[FAILED] Failed to start Create Volatile Files and Directories.
See 'systemctl status systemd-tmpfiles-setup.service' for details.
[FAILED] Failed to start Entropy daemon using the HAVEGE algorithm.
See 'systemctl status haveged.service' for details.
[  OK  ] Reached target System Time Synchronized.
         Starting Update UTMP about System Boot/Shutdown...
[FAILED] Failed to start Raise network interfaces.
See 'systemctl status networking.service' for details.
[  OK  ] Started udev Kernel Device Manager.
[  OK  ] Reached target Network.
[  OK  ] Reached target Network is Online.
[  OK  ] Started Update UTMP about System Boot/Shutdown.
[  OK  ] Reached target System Initialization.
[  OK  ] Listening on UUID daemon activation socket.
[  OK  ] Started Daily apt download activities.
[  OK  ] Started Daily apt upgrade and clean activities.
[  OK  ] Listening on ACPID Listen Socket.
[  OK  ] Reached target Sockets.
[  OK  ] Started ACPI Events Check.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Basic System.
         Starting getty on tty2-tty6 if dbus and logind are not available...
[  OK  ] Started Regular background program processing daemon.
[  OK  ] Started ACPI event daemon.
[  OK  ] Started Unattended Upgrades Shutdown.
         Starting LSB: bitnami init script...
         Starting System Logging Service...
[  OK  ] Started Deferred execution scheduler.
         Starting LSB: Start NTP daemon...
         Starting Expand the root partition and filesystem on boot...
         Starting Permit User Sessions...
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Reached target Timers.
         Starting LSB: start and stop Stackdriver Agent...
[FAILED] Failed to start getty on tty2-tty6 …dbus and logind are not available.
See 'systemctl status getty-static.service' for details.
[FAILED] Failed to start LSB: bitnami init script.
See 'systemctl status bitnami.service' for details.
[FAILED] Failed to start LSB: Start NTP daemon.
See 'systemctl status ntp.service' for details.
[FAILED] Failed to start Expand the root partition and filesystem on boot.
See 'systemctl status expand-root.service' for details.
[  OK  ] Started Permit User Sessions.
[FAILED] Failed to start LSB: start and stop Stackdriver Agent.
See 'systemctl status stackdriver-agent.service' for details.
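
A log like this is easier to scan once it is reduced to the names of the units that failed. The following is a small sketch that reads console output on standard input and prints each unit named in a "See 'systemctl status ...' for details." hint. The failed_units function name is hypothetical, and INSTANCE and ZONE in the comment are placeholders for your own values.

```shell
# Sketch only: list the systemd units that the boot log reports as failed.
# Reads console output on stdin, prints one unit name per line, de-duplicated.
failed_units() {
  sed -n "s/.*See 'systemctl status \([^']*\)' for details.*/\1/p" | sort -u
}

# Possible use against a live instance (INSTANCE and ZONE are placeholders):
#   gcloud compute instances get-serial-port-output INSTANCE --zone ZONE | failed_units
```

When nearly every early unit fails, including the root remount and the journal flush, the common cause is more likely the root filesystem itself (full, read-only or damaged) than any individual service.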

Answer 1

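Since I could not find a solution, I created another Compute Engine instance from my last backup and re-entered the few days of data that had been lost. Fortunately this is a web server with very little activity; otherwise the only other option would have been Google's expensive support service!

From the logs I concluded that there had been a serious failure on Google's side that badly corrupted the disk, so that none of the services could start.

I noticed that the trigger point was when they introduced some new monitoring service, at the same time that my disk ran out of space because of the failed backup script. All of this happened at once. This is of course speculation, since I could not identify the problem precisely without any access to the system.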

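Answer 2

I suggest:

1- Take a snapshot of the disk of the instance you cannot reach over SSH.

2- Create a new disk from that snapshot, and make sure the new disk is larger than the original.

3- Create a new instance with a new boot disk, and attach the disk created from the snapshot as an additional disk.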
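
The three steps above could be scripted with the gcloud CLI roughly as follows. This is a sketch under assumptions, not a tested recovery procedure: the snapshot, disk and instance names (the -rescue-snap and -rescue suffixes, rescue-vm) are hypothetical, the debian-12 image family is only an example because the question does not state the Debian version, and the run and rescue_disk helpers are made up for illustration. By default it prints the commands rather than running them.

```shell
# Sketch only: snapshot the broken disk, restore it to a larger disk,
# and attach that disk to a fresh instance as a secondary (non-boot) disk.
# DRY_RUN=1 (the default) prints each command instead of running it.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

rescue_disk() {
  # $1 = zone, $2 = name of the broken disk, $3 = size of the new disk (e.g. 50GB)
  zone="$1"; src_disk="$2"; size="$3"

  # 1- Snapshot the disk of the unreachable instance.
  run gcloud compute disks snapshot "$src_disk" --zone="$zone" \
    --snapshot-names="${src_disk}-rescue-snap"

  # 2- Create a new, larger disk from that snapshot.
  run gcloud compute disks create "${src_disk}-rescue" --zone="$zone" \
    --source-snapshot="${src_disk}-rescue-snap" --size="$size"

  # 3- Create an instance with its own fresh boot disk (image is an assumption),
  #    then attach the restored disk as an additional disk.
  run gcloud compute instances create rescue-vm --zone="$zone" \
    --image-family=debian-12 --image-project=debian-cloud
  run gcloud compute instances attach-disk rescue-vm --zone="$zone" \
    --disk="${src_disk}-rescue"
}
```

For example, rescue_disk us-central1-a web-disk 50GB prints the four commands for review. Keeping the restored disk as a secondary disk means the rescue instance boots from a known-good image and does not depend on anything on the damaged filesystem.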

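If the problem persists, you can try connecting through the serial port. If that still does not work, you may need to run a startup script that unmounts the attached disk, check whether you can then connect to the instance, and afterwards remount the disk to access the data.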
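
Once the disk is mounted on a working instance, it helps to see what is actually consuming the space before deleting anything. Here is a minimal sketch, assuming the disk shows up as /dev/sdb1 and is mounted at /mnt/rescue (both hypothetical; check lsblk for the real device name). The largest_entries function name is also made up for this example.

```shell
# Sketch only: show the largest directories under a mount point.
# $1 = directory to inspect, $2 = number of entries to show (default 10).
# Sizes are in KiB; -x keeps du on the one filesystem being inspected.
largest_entries() {
  du -x -k "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Possible use on the rescue instance, from a root shell
# (the device name and mount point are assumptions):
#   lsblk
#   mkdir -p /mnt/rescue
#   mount /dev/sdb1 /mnt/rescue
#   largest_entries /mnt/rescue 15
```

The first line of the output is the total for the directory itself, and the lines after it are its largest subdirectories. In the situation described in the question, the output directory of the failed backup script would be the first place to look.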
