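Our EC2 instance (Windows Server 2008) has crashed several times over the past 3 months (the most recent crash was today at 1:05 EST). After examining the MEMORY.DMP file, we noticed that the probable cause of the crashes is rhelnet.sys (the RedHat PV NIC driver).
After the crash, the server's Event Viewer showed the following entries: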
Critical - Kernel Power:
The system has rebooted without cleanly shutting down first.
This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
BugCheck:
The computer has rebooted from a bugcheck. The bugcheck was:
0x000000d1 (0x000000000000002d, 0x0000000000000002, 0x0000000000000000, 0xfffff88001402d14).
A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 100113-35849-01.
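For readers unfamiliar with the bugcheck line above, here is a minimal Python sketch that splits it into its parts. It is not from the original post. It assumes the standard meaning of bug check 0xD1 (DRIVER_IRQL_NOT_LESS_OR_EQUAL): parameter 1 is the memory address referenced, parameter 2 is the IRQL at the time, parameter 3 is the access type, and parameter 4 is the address of the instruction that made the reference. The function name `parse_bugcheck` and the dictionary keys are hypothetical, chosen only for illustration.

```python
import re

# Assumed mapping for this one bug check code only (illustrative, not exhaustive).
BUGCHECK_NAMES = {0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL"}

# Access type values for parameter 3 of bug check 0xD1.
ACCESS_TYPES = {0: "read", 1: "write", 2: "execute", 8: "execute"}


def parse_bugcheck(text):
    """Split an Event Viewer bugcheck line into its code and four parameters."""
    # Collect every hexadecimal literal in the order it appears.
    values = [int(v, 16) for v in re.findall(r"0x[0-9a-fA-F]+", text)]
    if len(values) != 5:
        raise ValueError("expected a bug check code followed by four parameters")
    code, referenced, irql, access, instruction = values
    return {
        "code": code,
        "name": BUGCHECK_NAMES.get(code, "unknown"),
        "referenced_address": referenced,
        "irql": irql,
        "access": ACCESS_TYPES.get(access, "unknown"),
        "instruction_address": instruction,
    }


# The line reported in the question.
LINE = ("0x000000d1 (0x000000000000002d, 0x0000000000000002, "
        "0x0000000000000000, 0xfffff88001402d14).")

info = parse_bugcheck(LINE)
for key, value in info.items():
    # Show integers as hex so they can be compared with the log.
    print(key, hex(value) if isinstance(value, int) else value)
```

Read this way, the log describes a read of a near-null address (0x2d) at IRQL 2 by code at 0xfffff88001402d14. That pattern usually points at a kernel-mode driver rather than at hardware, which would be consistent with the dump naming rhelnet.sys.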
Could this be a hardware problem? Would stopping and starting the instance help? Or is it more likely caused by software running on the system?
[Update: January 10, 2013]
An Amazon representative suggested replacing the RedHat drivers on our instance with the Citrix PV drivers.
[Update: August 10, 2013]
We performed the driver upgrade on a cloned instance. After the upgrade, we noticed the following error in Event Viewer:
Xennet6 errors in Event Viewer (Event ID# 5001)
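After some further digging, I found an article that recommends installing the latest Citrix drivers. Unfortunately, this did not help at all, and our cloned instance became unresponsive.
[Update 2: August 10, 2013]
I recreated the instance and updated the PV drivers again. After searching online, I found a post in which an Amazon representative explains: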
"Event ID 5001 from source Xennet6 cannot be found" message does not
indicate anything wrong, just that the PV driver is looking for a feature
that we have not implemented in our version of Xen.
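I will let the test system run for a while and see whether any problems appear.
Answer 1
Upgrading the drivers as the Amazon representative suggested resolved the problem.
As for the Event ID 5001 error, here is the reply I received from Amazon: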
Please ignore the Xennet 5001 error. This error occurs on every instance
that is launched with Citrix PV drivers and is due to the driver looking
for a feature that is not supported on EC2. It will have no other effect on the instance.
Answer 2
I ran into the same problem.
However, AWS Support gave me the answer below. They were not sure whether the Citrix PV drivers were at fault:
Currently, we are unable to root cause the issue.
In my personal opinion, this might be a one-time only occurrence,
but as you are running Citrix PV Drivers, I highly encourage you to upgrade.
As the Citrix drivers show up in the logs,
they might had been related to the issue.