I run SmartOS on a Hetzner EX4S (Intel Core i7-2600, 32 GB RAM, 2x3 TB SATA HDD). There are six virtual machines on the host:
[root@10-bf-48-7f-e7-03 ~]# vmadm list
UUID TYPE RAM STATE ALIAS
d2223467-bbe5-4b81-a9d1-439e9a66d43f KVM 512 running xxxx1
5f36358f-68fa-4351-b66f-830484b9a6ee KVM 1024 running xxxx2
d570e9ac-9eac-4e4f-8fda-2b1d721c8358 OS 1024 running xxxx3
ef88979e-fb7f-460c-bf56-905755e0a399 KVM 1024 running xxxx4
d8e06def-c9c9-4d17-b975-47dd4836f962 KVM 4096 running xxxx5
4b06fe88-db6e-4cf3-aadd-e1006ada7188 KVM 9216 running xxxx5
[root@10-bf-48-7f-e7-03 ~]#
The host reboots several times a week, leaving no crash dumps in /var/crash and no messages in /var/adm/messages. Basically, /var/adm/messages looks as if a hard reset had occurred:
2012-11-23T08:54:43.210625+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T09:14:43.187589+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T09:34:43.165100+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T09:54:43.142065+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:14:43.119365+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:34:43.096351+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:54:43.073821+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:57:55.610954+00:00 10-bf-48-7f-e7-03 genunix: [ID 540533 kern.notice] #015SunOS Release 5.11 Version joyent_20121018T224723Z 64-bit
2012-11-23T10:57:55.610962+00:00 10-bf-48-7f-e7-03 genunix: [ID 299592 kern.notice] Copyright (c) 2010-2012, Joyent Inc. All rights reserved.
2012-11-23T10:57:55.610967+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: lgpg
2012-11-23T10:57:55.610971+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: tsc
2012-11-23T10:57:55.610974+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: msr
2012-11-23T10:57:55.610978+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mtrr
2012-11-23T10:57:55.610981+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: pge
2012-11-23T10:57:55.610984+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: de
2012-11-23T10:57:55.610987+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: cmov
2012-11-23T10:57:55.610995+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mmx
2012-11-23T10:57:55.611000+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mca
2012-11-23T10:57:55.611004+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: pae
2012-11-23T10:57:55.611008+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: cv8
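The problem is that the host sometimes loses its network interface on reboot, so we have to perform a manual hardware reset to bring it back. We have no physical or virtual access to the server console: no KVM, no iLO or anything similar. So the only way to debug is to analyze crash dumps and log files. I'm not a SmartOS/Solaris expert, so I'm not sure how to proceed. Is there a SmartOS equivalent of Linux netconsole? Can I somehow redirect console output to a network port? Maybe I'm missing something obvious and the crash information is located somewhere else.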
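Answer 1
Run the dumpadm command to check whether crash dumps are enabled and on which device.
If they are enabled but you find no crash dumps, suspect a hardware fault and ask the hosting company to move you to a different physical server. (They can also check hardware logs and fault indicator lights, call the vendor, and so on.)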
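
The check described above could be scripted roughly as follows. This is only a sketch: the function name check_dumps is hypothetical, and the default directory /var/crash is simply the one mentioned in the question; the savecore directory that dumpadm reports on your host may differ. Run with no arguments, dumpadm prints the current dump configuration.

```shell
#!/bin/sh
# Sketch: show the crash dump configuration and list any saved dump files.
# check_dumps is a hypothetical helper name; /var/crash is assumed from the
# question and may not match the savecore directory reported by dumpadm.

check_dumps() {
    crash_dir="${1:-/var/crash}"

    if command -v dumpadm >/dev/null 2>&1; then
        # With no arguments, dumpadm prints the current dump configuration
        # (dump device, savecore directory, whether savecore is enabled).
        dumpadm
    else
        echo "dumpadm not found (is this an illumos/Solaris host?)"
    fi

    if [ -d "$crash_dir" ]; then
        # List every regular file under the crash directory.
        find "$crash_dir" -type f
    else
        echo "no crash directory at $crash_dir"
    fi
}

check_dumps "$@"
```

If dumpadm reports that dumps are enabled and the listing is still empty after an unexpected reboot, that is consistent with the hardware-fault suspicion above.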