Intermittent Windows Server 2008 BSOD and reboots

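Our EC2 instance (Windows Server 2008) has crashed several times over the past 3 months (most recently today at 1:05 EST). After examining the MEMORY.DMP file, we found that the probable cause of the crashes is rhelnet.sys (the RedHat PV NIC driver).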

After the crash, the server's Event Viewer showed the following entries:

Critical - Kernel Power:
The system has rebooted without cleanly shutting down first. 
This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.

BugCheck:
The computer has rebooted from a bugcheck.  The bugcheck was:
0x000000d1 (0x000000000000002d, 0x0000000000000002, 0x0000000000000000, 0xfffff88001402d14). 
A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 100113-35849-01.
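For reference, stop code 0xD1 is DRIVER_IRQL_NOT_LESS_OR_EQUAL, which usually means a kernel-mode driver touched an invalid or pageable address at a raised IRQL. The four values in parentheses are its parameters. Below is a minimal Python sketch that pulls them out of the Event Viewer line and labels them; the parameter descriptions follow Microsoft's bug check reference, while the helper name `parse_bugcheck` and the regular expression are only illustrative assumptions.

```python
import re

# Parameter meanings for bug check 0xD1 (DRIVER_IRQL_NOT_LESS_OR_EQUAL),
# paraphrased from Microsoft's bug check reference.
D1_PARAM_NAMES = (
    "memory address referenced",
    "IRQL at time of reference",
    "access type (0 = read, 1 = write, 8 = execute)",
    "address of the instruction that referenced the memory",
)

# Matches "<code> (<p1>, <p2>, <p3>, <p4>)" with hexadecimal values.
BUGCHECK_RE = re.compile(
    r"(0x[0-9a-fA-F]+)\s*\(\s*(0x[0-9a-fA-F]+),\s*(0x[0-9a-fA-F]+),"
    r"\s*(0x[0-9a-fA-F]+),\s*(0x[0-9a-fA-F]+)\s*\)"
)


def parse_bugcheck(text):
    """Hypothetical helper: return (code, [p1, p2, p3, p4]) as ints, or None."""
    m = BUGCHECK_RE.search(text)
    if m is None:
        return None
    values = [int(g, 16) for g in m.groups()]
    return values[0], values[1:]


line = ("0x000000d1 (0x000000000000002d, 0x0000000000000002, "
        "0x0000000000000000, 0xfffff88001402d14).")
code, params = parse_bugcheck(line)
if code == 0xD1:
    for name, value in zip(D1_PARAM_NAMES, params):
        print(f"{name}: {value:#x}")
```

Read this way, the entry describes a read (third parameter 0) of the near-null address 0x2d at IRQL 2, issued from code at 0xfffff88001402d14. Which driver owns that address can only be confirmed from the dump itself, for example with a debugger.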

Could this be a hardware problem? Would stopping and starting the instance help? Or is it more likely caused by software running on the system?

[Update, January 10, 2013]

An Amazon representative suggested replacing the RedHat drivers on our instance with the Citrix PV drivers:

Upgrading PV Drivers

[Update, August 10, 2013]

We performed the driver upgrade on a cloned instance. After the upgrade, we noticed the following error in Event Viewer:

Xennet6 errors in Event Viewer (Event ID# 5001)

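After some more digging, I found an article that recommended installing the latest Citrix drivers. Unfortunately, that did not help at all, and our cloned instance became unresponsive.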

[Update 2, August 10, 2013]

I recreated the instance and updated the PV drivers again. After searching online, I found a post in which an Amazon representative explains:

"Event ID 5001 from source Xennet6 cannot be found" message does not 
indicate anything wrong, just that the PV driver is looking for a feature
that we have not implemented in our version of Xen. 

I will leave the test system running for a while to see whether any problems appear.

Answer 1

Upgrading the drivers as the Amazon representative suggested resolved the problem.

As for the Event ID 5001 issue, here is the reply I received from Amazon:

Please ignore the Xennet 5001 error. This error occurs on every instance
that is launched with Citrix PV drivers and is due to the driver looking
for a feature that is not supported on EC2. It will have no other effect on the instance.

Answer 2

I ran into the same problem.

However, the answer I got from AWS Support was as follows. They are not sure whether the Citrix PV drivers are at fault.

Currently, we are unable to root cause the issue.
In my personal opinion, this might be a one-time only occurrence,
but as you are running Citrix PV Drivers, I highly encourage you to upgrade.

As the Citrix drivers show up in the logs,
they might had been related to the issue.
