SmartOS reboots spontaneously

I am running SmartOS on a Hetzner EX4S (Intel Core i7-2600, 32 GB RAM, 2x3 TB SATA HDD). There are six virtual machines on the host:

[root@10-bf-48-7f-e7-03 ~]# vmadm list
UUID                                  TYPE  RAM      STATE             ALIAS
d2223467-bbe5-4b81-a9d1-439e9a66d43f  KVM   512      running           xxxx1
5f36358f-68fa-4351-b66f-830484b9a6ee  KVM   1024     running           xxxx2
d570e9ac-9eac-4e4f-8fda-2b1d721c8358  OS    1024     running           xxxx3
ef88979e-fb7f-460c-bf56-905755e0a399  KVM   1024     running           xxxx4
d8e06def-c9c9-4d17-b975-47dd4836f962  KVM   4096     running           xxxx5
4b06fe88-db6e-4cf3-aadd-e1006ada7188  KVM   9216     running           xxxx5
[root@10-bf-48-7f-e7-03 ~]#

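The host reboots several times a week. No crash dumps appear in /var/crash, and nothing about the reboot is logged in /var/adm/messages. Judging by /var/adm/messages, it looks as if the machine was hard reset: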

2012-11-23T08:54:43.210625+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T09:14:43.187589+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T09:34:43.165100+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T09:54:43.142065+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:14:43.119365+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:34:43.096351+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:54:43.073821+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:57:55.610954+00:00 10-bf-48-7f-e7-03 genunix: [ID 540533 kern.notice] #015SunOS Release 5.11 Version joyent_20121018T224723Z 64-bit
2012-11-23T10:57:55.610962+00:00 10-bf-48-7f-e7-03 genunix: [ID 299592 kern.notice] Copyright (c) 2010-2012, Joyent Inc. All rights reserved.
2012-11-23T10:57:55.610967+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: lgpg
2012-11-23T10:57:55.610971+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: tsc
2012-11-23T10:57:55.610974+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: msr
2012-11-23T10:57:55.610978+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mtrr
2012-11-23T10:57:55.610981+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: pge
2012-11-23T10:57:55.610984+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: de
2012-11-23T10:57:55.610987+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: cmov
2012-11-23T10:57:55.610995+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mmx
2012-11-23T10:57:55.611000+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mca
2012-11-23T10:57:55.611004+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: pae
2012-11-23T10:57:55.611008+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: cv8
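
To locate every such boundary in the log, a minimal sketch along the following lines may help. It assumes that each boot banner contains the string "SunOS Release", as in the excerpt above, and that the first field of every line is the timestamp; boot_gaps is a hypothetical helper name, not a SmartOS tool.

```shell
#!/bin/sh
# Hypothetical helper: for every boot banner in a syslog file, print the
# timestamp of the last line logged before it and the timestamp of the
# banner itself. Assumes the banner contains "SunOS Release" and that
# field 1 of each line is the timestamp.
boot_gaps() {
    awk '/SunOS Release/ { print prev " -> " $1 }
         { prev = $1 }' "$1"
}

# Intended use on the host:
#   boot_gaps /var/adm/messages
```

If the line before each banner is an ordinary MARK entry rather than a shutdown or panic message, that is consistent with a hard reset rather than an orderly reboot.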

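The problem is that the host sometimes loses its network interface during a reboot, so we have to perform a manual hardware reset to bring it back. We have no physical or virtual access to the server console: no KVM, no iLO, nothing of the kind. So the only way to debug is to analyze crash dumps and log files. I am not a SmartOS/Solaris expert, so I am not sure how to proceed. Does SmartOS have an equivalent of the Linux netconsole? Can I somehow redirect console output to a network port? Maybe I am missing something obvious and the crash information is located somewhere else.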

Answer 1

Run the dumpadm command to check whether crash dumps are enabled and on which device.
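
As a rough sketch of that check: run with no arguments, dumpadm prints the current dump configuration. The helper below picks out the savecore setting. It assumes the output contains a line of the form "Savecore enabled: yes" (or "no"), which is how illumos-based systems usually print it; savecore_status is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: reads `dumpadm` output on stdin and prints the value
# of the "Savecore enabled" line (typically "yes" or "no"), or "unknown"
# if no such line is present. The line format is an assumption.
savecore_status() {
    awk -F': *' '/Savecore enabled/ { print $2; found = 1 }
                 END { if (!found) print "unknown" }'
}

# Intended use in the global zone:
#   dumpadm | savecore_status
```

If this reports "no", dumpadm -y is the documented way to have savecore run automatically on reboot. The "Dump device" and "Savecore directory" lines in the same output show where a dump would be written and where it would be saved afterwards.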

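If they are enabled and you still find no crash dumps, suspect a hardware fault and ask the hosting company to move you to a different physical server. (They can also check the hardware logs and fault indicator lights, call the vendor, and so on.)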
