Deploying a base node via iSCSI in a Server 2012R2 HPC cluster fails (cannot join the domain)

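We are currently evaluating Server 2012R2 with HPC Pack for an upcoming project.

Unfortunately, we ran into a problem deploying a base node. The node boots via PXE (iPXE), connects to iSCSI, and installs Windows, but it does not seem to be able to join the domain.

Once the deployment has failed, the node is left with Windows installed on the iSCSI drive.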
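
For reference, the value that the deployment log further down writes into DHCP option 17 (Root Path) appears to follow the iSCSI root path layout from RFC 4173: `iscsi:<server>:<protocol>:<port>:<LUN>:<target>`. Below is a minimal Python sketch that splits such a string, handy for checking what the node is actually told to boot from. The names `IscsiRootPath` and `parse_root_path` are made up for this illustration, and the sketch does not handle a bracketed IPv6 server address.

```python
from typing import NamedTuple


class IscsiRootPath(NamedTuple):
    server: str
    protocol: str
    port: int
    lun: str
    target: str


def parse_root_path(value: str) -> IscsiRootPath:
    """Split an RFC 4173 style root path into its five fields."""
    prefix = "iscsi:"
    if not value.startswith(prefix):
        raise ValueError("not an iSCSI root path: %r" % value)
    # Split at most 4 times: the target name (an IQN) contains ':' itself.
    fields = value[len(prefix):].split(":", 4)
    if len(fields) != 5:
        raise ValueError("expected 5 fields, got %d" % len(fields))
    server, protocol, port, lun, target = fields
    # Empty fields fall back to the RFC 4173 defaults (TCP, port 3260, LUN 0).
    return IscsiRootPath(
        server=server,
        protocol=protocol or "6",
        port=int(port) if port else 3260,
        lun=lun or "0",
        target=target,
    )


if __name__ == "__main__":
    # Value copied from the "Setting DHCP option 17" line of the deployment log.
    print(parse_root_path(
        "iscsi:192.168.10.1::::iqn.1991-05.com.microsoft:head-node-encoder1000-base-target"
    ))
```

With the logged value, the server field is the head node's Private address and the three empty fields fall back to their defaults.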

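We can then join the domain manually and log in as a domain user. We can reach the server on both of its IPs. Pinging cluster.local or HEAD-NODE.cluster.local resolves to the IP of the server's NIC on the private network (.10.1).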

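Running the Deployment: Binding Order test only produces:

Warning: The enterprise network is not configured first in the binding order with a default gateway. This may cause problems communicating with Active Directory Domain Services.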


Deployment: Binding Order test

14-7-2015 20:20:32 [Information] Network "Private" description: Intel(R) I210 Gigabit Network Connection #2
14-7-2015 20:20:32 [Information] Network interface type: Ethernet
14-7-2015 20:20:32 [Information] Address: 192.168.10.1
14-7-2015 20:20:32 [Information] Network "Enterprise" description: Intel(R) I210 Gigabit Network Connection
14-7-2015 20:20:32 [Information] Network interface type: Ethernet
14-7-2015 20:20:32 [Information] Address: 192.168.178.5
14-7-2015 20:20:32 [Information] Microsoft HPC Diagnostic Test Host.
14-7-2015 20:20:32 [Information] Creating test instances from: PDCNet.dll:Microsoft.Hpc.Diagnostics.Tests.BindingOrder.
14-7-2015 20:20:32 [Verbose] Failed to load assembly OriginalPath: Could not load file or assembly 'file:///\\HEAD-NODE\Diagnostics\1125\PDCNet.dll' or one of its dependencies. The system cannot find the file specified..
   at System.Reflection.RuntimeAssembly._nLoad(AssemblyName fileName, String codeBase, Evidence assemblySecurity, RuntimeAssembly locationHint, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks)
   at System.Reflection.RuntimeAssembly.InternalLoadAssemblyName(AssemblyName assemblyRef, Evidence assemblySecurity, RuntimeAssembly reqAssembly, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks)
   at System.Reflection.RuntimeAssembly.InternalLoadAssemblyName(AssemblyName assemblyRef, Evidence assemblySecurity, RuntimeAssembly reqAssembly, StackCrawlMark& stackMark, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks)
   at System.Reflection.RuntimeAssembly.InternalLoadFrom(String assemblyFile, Evidence securityEvidence, Byte[] hashValue, AssemblyHashAlgorithm hashAlgorithm, Boolean forIntrospection, Boolean suppressSecurityChecks, StackCrawlMark& stackMark)
   at System.Reflection.Assembly.LoadFrom(String assemblyFile)
   at Microsoft.Hpc.Diagnostics.TestHost.Program.CreateTestInstance(String assemblyName, String className)
14-7-2015 20:20:32 [Verbose] Retrying from location: C:\Program Files\Microsoft HPC Pack 2012\Bin\PDCNet.dll
14-7-2015 20:20:32 [Verbose] Failed to load assembly EXEPath: Could not load file or assembly 'file:///C:\Program Files\Microsoft HPC Pack 2012\Bin\PDCNet.dll' or one of its dependencies. The system cannot find the file specified..
   at System.Reflection.RuntimeAssembly._nLoad(AssemblyName fileName, String codeBase, Evidence assemblySecurity, RuntimeAssembly locationHint, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks)
   at System.Reflection.RuntimeAssembly.InternalLoadAssemblyName(AssemblyName assemblyRef, Evidence assemblySecurity, RuntimeAssembly reqAssembly, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks)
   at System.Reflection.RuntimeAssembly.InternalLoadAssemblyName(AssemblyName assemblyRef, Evidence assemblySecurity, RuntimeAssembly reqAssembly, StackCrawlMark& stackMark, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks)
   at System.Reflection.RuntimeAssembly.InternalLoadFrom(String assemblyFile, Evidence securityEvidence, Byte[] hashValue, AssemblyHashAlgorithm hashAlgorithm, Boolean forIntrospection, Boolean suppressSecurityChecks, StackCrawlMark& stackMark)
   at System.Reflection.Assembly.LoadFrom(String assemblyFile)
   at Microsoft.Hpc.Diagnostics.TestHost.Program.CreateTestInstance(String assemblyName, String className)
14-7-2015 20:20:32 [Verbose] Retrying from location: C:\Program Files\Microsoft HPC Pack 2012\Bin\DiagTests\PDCNet.dll
14-7-2015 20:20:32 [Verbose] Doing test on test type: Microsoft.Hpc.Diagnostics.Tests.BindingOrder.
14-7-2015 20:20:32 [Information] Got domain controller name: HEAD-NODE.cluster.local
14-7-2015 20:20:32 [Information] Resolved IP address for DC. Got 4 IP addessses.
14-7-2015 20:20:32 [Information] Routing to address: 192.168.10.1.
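
The last log line says the test routes to 192.168.10.1 to reach the domain controller. The small Python sketch below does not reproduce the HPC diagnostic's logic (which we do not know); it only checks which of the two configured networks contains a given address, using the addresses and subnet masks from the NIC settings in this post. The helper name `network_for` is made up for the example.

```python
import ipaddress
from typing import Optional

# Addresses and masks copied from the NIC settings in this post.
NETWORKS = {
    "Enterprise": ipaddress.ip_interface("192.168.178.5/255.255.255.0").network,
    "Private": ipaddress.ip_interface("192.168.10.1/255.255.255.0").network,
}


def network_for(address: str) -> Optional[str]:
    """Return the name of the configured network containing `address`, if any."""
    ip = ipaddress.ip_address(address)
    for name, net in NETWORKS.items():
        if ip in net:
            return name
    return None


if __name__ == "__main__":
    # Address from the "Routing to address" log line.
    print(network_for("192.168.10.1"))
    # First DNS server configured on the Enterprise NIC.
    print(network_for("212.54.40.25"))
```

So the address the test routes to sits on the Private network rather than on Enterprise, which may be related to the warning, although that is an assumption on our side.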

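So we looked into this. However, everything appears to be fine. Changing the binding order in Advanced Settings fixes neither the test nor the actual deployment problem.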


NIC settings

Enterprise

Intel(R) I210 Gigabit Network Connection
cluster.local
IPv6 - disabled
IPv4
192.168.178.5
255.255.255.0
192.168.178.1
DNS
212.54.40.25
192.168.178.1
QoS Packet Scheduler - enabled
Link-Layer Topology Discovery Mapper I/O Driver - enabled
Link-Layer Topology Discovery Responder - enabled

Private

Intel(R) I210 Gigabit Network Connection #2
cluster.local
IPv6 - disabled
IPv4
192.168.10.1
255.255.255.0
no default gateway
DNS
127.0.0.1
QoS Packet Scheduler - enabled
Link-Layer Topology Discovery Mapper I/O Driver - enabled
Link-Layer Topology Discovery Responder - enabled

Control Panel\Network and Internet\Network Connections\Advanced\Advanced Settings\Adapters and Bindings\Connections

Enterprise
Private
[Remote Access connections]

Console:

C:\Users\Administrator>wmic nicconfig get Description,SettingID
Description                                  SettingID
WAN Miniport (L2TP)                          {06E102F9-E21B-4CEF-B0CA-64F4829A9A7C}
WAN Miniport (SSTP)                          {577B93D0-F1FD-4C7B-B41E-53B4BA94A579}
WAN Miniport (IKEv2)                         {1AF75D00-449A-4CC1-9ED1-FB440172AED2}
WAN Miniport (PPTP)                          {A235D4B4-600A-4FFA-8E12-9BA09E6DAF65}
WAN Miniport (PPPOE)                         {4E1B3D6C-934D-43DF-9301-DA9CC9E8A407}
WAN Miniport (IP)                            {3F6E7537-F2F8-4AEA-8B72-B7A4D7298D4E}
WAN Miniport (IPv6)                          {041B181E-0469-42FD-B6B4-F32842B6495B}
WAN Miniport (Network Monitor)               {C9875A41-724D-4987-9F2D-A41F8AE84E2F}
Microsoft Kernel Debug Network Adapter       {C7568B63-C424-48B3-AB9B-6D1F004D5AFC}
RAS Async Adapter                            {D0FA3B2F-90BF-4A07-83E3-81165D8B28EF}
Intel(R) I210 Gigabit Network Connection     {5BC77C5E-0E79-4C01-9C9A-A28CFA94F898}
Intel(R) I210 Gigabit Network Connection #2  {0260FC73-5ABF-4814-AB52-D16DEFFA4875}
Microsoft ISATAP Adapter                     {266E5AE1-245A-4D47-9E73-F37AFFE434A4}
Microsoft Teredo Tunneling Adapter           {D9B3F7F2-A448-4513-BD48-A1B215C54DCF}
Microsoft ISATAP Adapter                     {40F0B15B-06DA-4E06-A43C-89AD60918DEC}
Microsoft ISATAP Adapter                     {DFA7713A-B5DC-4144-BB90-3A546C0EE42E}
Microsoft Failover Cluster Virtual Adapter   {341D1797-925C-49CC-9C24-50AFC0F0C105}
Microsoft ISATAP Adapter                     {8BE73B7C-14DE-4E5C-92FA-1F6C8CF0FBF6}
Hyper-V Virtual Ethernet Adapter             {A71BD6F0-072F-4C9A-8963-780EE41C896C}

Registry: [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Linkage]

\Device\{5BC77C5E-0E79-4C01-9C9A-A28CFA94F898}
\Device\{A71BD6F0-072F-4C9A-8963-780EE41C896C}
\Device\{341D1797-925C-49CC-9C24-50AFC0F0C105}
\Device\{0260FC73-5ABF-4814-AB52-D16DEFFA4875}
\Device\{C7568B63-C424-48B3-AB9B-6D1F004D5AFC}
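
To make the registry list easier to read, the GUIDs can be matched against the SettingID column of the wmic output above. This is a minimal Python sketch using only data pasted in this post; the function names are made up for the example, and only the five adapters that appear in the Linkage list are included.

```python
import re

# Rows copied from the wmic output above (only adapters in the Linkage list).
WMIC_OUTPUT = """\
Microsoft Kernel Debug Network Adapter       {C7568B63-C424-48B3-AB9B-6D1F004D5AFC}
Intel(R) I210 Gigabit Network Connection     {5BC77C5E-0E79-4C01-9C9A-A28CFA94F898}
Intel(R) I210 Gigabit Network Connection #2  {0260FC73-5ABF-4814-AB52-D16DEFFA4875}
Microsoft Failover Cluster Virtual Adapter   {341D1797-925C-49CC-9C24-50AFC0F0C105}
Hyper-V Virtual Ethernet Adapter             {A71BD6F0-072F-4C9A-8963-780EE41C896C}
"""

# Entries copied from the Tcpip\Linkage listing, in the order shown.
LINKAGE_ENTRIES = [
    r"\Device\{5BC77C5E-0E79-4C01-9C9A-A28CFA94F898}",
    r"\Device\{A71BD6F0-072F-4C9A-8963-780EE41C896C}",
    r"\Device\{341D1797-925C-49CC-9C24-50AFC0F0C105}",
    r"\Device\{0260FC73-5ABF-4814-AB52-D16DEFFA4875}",
    r"\Device\{C7568B63-C424-48B3-AB9B-6D1F004D5AFC}",
]

# A GUID in braces: 32 hex digits plus 4 hyphens.
GUID = re.compile(r"\{[0-9A-Fa-f-]{36}\}")


def descriptions_by_guid(wmic_text):
    """Map each SettingID GUID to the adapter description on the same row."""
    mapping = {}
    for line in wmic_text.splitlines():
        match = GUID.search(line)
        if match:
            mapping[match.group(0).upper()] = line[:match.start()].strip()
    return mapping


def resolve_order(entries, mapping):
    """Translate the Linkage entries into adapter descriptions, keeping order."""
    names = []
    for entry in entries:
        match = GUID.search(entry)
        guid = match.group(0).upper() if match else entry
        names.append(mapping.get(guid, "<unknown>"))
    return names


if __name__ == "__main__":
    for position, name in enumerate(
        resolve_order(LINKAGE_ENTRIES, descriptions_by_guid(WMIC_OUTPUT)), start=1
    ):
        print(position, name)
```

Read this way, the Enterprise adapter comes first and the Private adapter (#2) fourth, with the Hyper-V and Failover Cluster virtual adapters in between.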

Hardware:

Server

ASRock J1900D2Y, 16 GB, 3TB HDD
Windows Server 2012R2 Evaluation
Roles: AD DS, DHCP (for Private), DNS (for Private), WDS, HPC Cluster HEAD-NODE
LAN1: Enterprise, 192.168.178.5 (static)
LAN2: Private, 192.168.10.1 (static)
IPMI: 192.168.178.6 (static), 192.168.178.28 (DHCP)

Base node

ASRock Q1900TM-ITX, 4 GB, no HDD
Windows 8.1 Embedded
(the important thing is Quick-Sync)
LAN: Private, 192.168.10.6 (DHCP)

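The cluster nodes are used to encode the individual parts of a large video and finally to join those parts together. The hardware requirements for our nodes are therefore: Intel Quick-Sync, low power consumption, small form factor.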


Deployment log:

Time    Message
10-7-2015 23:57:18  Reverted
10-7-2015 23:57:17  Disassociating template from node CLUSTER\ENCODER1000
10-7-2015 23:57:17  The operation failed due to errors during execution.
10-7-2015 23:57:17  The operation failed and will not be retried.
10-7-2015 23:57:17  The operation failed due to errors during execution.
10-7-2015 23:57:17  The operation failed and will not be retried.
10-7-2015 23:57:17  Exit code 1: Incorrect function
10-7-2015 23:57:05  Joining domain: cluster.local
10-7-2015 23:57:03  Exit code 1: Incorrect function
10-7-2015 23:56:51  Joining domain: cluster.local
10-7-2015 23:56:49  Exit code 1: Incorrect function
10-7-2015 23:56:37  Joining domain: cluster.local
10-7-2015 23:56:35  Exit code 1: Incorrect function
10-7-2015 23:56:23  Joining domain: cluster.local
10-7-2015 23:56:21  Exit code 1: Incorrect function
10-7-2015 23:56:09  Joining domain: cluster.local
10-7-2015 23:56:07  Exit code 1: Incorrect function
10-7-2015 23:55:55  Joining domain: cluster.local
10-7-2015 23:55:53  Exit code 1: Incorrect function
10-7-2015 23:55:41  Joining domain: cluster.local
10-7-2015 23:55:39  Exit code 1: Incorrect function
10-7-2015 23:55:27  Joining domain: cluster.local
10-7-2015 23:55:25  Exit code 1: Incorrect function
10-7-2015 23:55:13  Joining domain: cluster.local
10-7-2015 23:55:10  Exit code 1: Incorrect function
10-7-2015 23:54:58  Joining domain: cluster.local
10-7-2015 23:54:56  Exit code 1: Incorrect function
10-7-2015 23:54:44  Joining domain: cluster.local
10-7-2015 23:54:42  Exit code 1: Incorrect function
10-7-2015 23:54:30  Joining domain: cluster.local
10-7-2015 23:54:28  Exit code 1: Incorrect function
10-7-2015 23:54:16  Joining domain: cluster.local
10-7-2015 23:54:14  Exit code 1: Incorrect function
10-7-2015 23:54:02  Joining domain: cluster.local
10-7-2015 23:54:00  Exit code 1: Incorrect function
10-7-2015 23:53:48  Joining domain: cluster.local
10-7-2015 23:53:46  Exit code 1: Incorrect function
10-7-2015 23:53:34  Joining domain: cluster.local
10-7-2015 23:53:32  Exit code 1: Incorrect function
10-7-2015 23:53:20  Joining domain: cluster.local
10-7-2015 23:53:18  Exit code 1: Incorrect function
10-7-2015 23:53:06  Joining domain: cluster.local
10-7-2015 23:53:04  Exit code 1: Incorrect function
10-7-2015 23:52:52  Joining domain: cluster.local
10-7-2015 23:52:50  Exit code 1: Incorrect function
10-7-2015 23:52:38  Joining domain: cluster.local
10-7-2015 23:52:36  Exit code 1: Incorrect function
10-7-2015 23:52:24  Joining domain: cluster.local
10-7-2015 23:52:22  Exit code 1: Incorrect function
10-7-2015 23:52:10  Joining domain: cluster.local
10-7-2015 23:52:08  Exit code 1: Incorrect function
10-7-2015 23:51:55  Joining domain: cluster.local
10-7-2015 23:51:53  Exit code 1: Incorrect function
10-7-2015 23:51:41  Joining domain: cluster.local
10-7-2015 23:51:39  Exit code 1: Incorrect function
10-7-2015 23:51:27  Joining domain: cluster.local
10-7-2015 23:51:25  Exit code 1: Incorrect function
10-7-2015 23:51:13  Joining domain: cluster.local
10-7-2015 23:51:11  Exit code 1: Incorrect function
10-7-2015 23:50:59  Joining domain: cluster.local
10-7-2015 23:50:57  Exit code 1: Incorrect function
10-7-2015 23:50:45  Joining domain: cluster.local
10-7-2015 23:50:43  Exit code 1: Incorrect function
10-7-2015 23:50:31  Joining domain: cluster.local
10-7-2015 23:50:29  Exit code 1: Incorrect function
10-7-2015 23:50:17  Joining domain: cluster.local
10-7-2015 23:50:15  Exit code 1: Incorrect function
10-7-2015 23:50:03  Joining domain: cluster.local
10-7-2015 23:50:01  Exit code 1: Incorrect function
10-7-2015 23:49:49  Joining domain: cluster.local
10-7-2015 23:49:47  Exit code 1: Incorrect function
10-7-2015 23:49:35  Joining domain: cluster.local
10-7-2015 23:49:33  Exit code 1: Incorrect function
10-7-2015 23:49:21  Joining domain: cluster.local
10-7-2015 23:49:19  Exit code 1: Incorrect function
10-7-2015 23:49:06  Joining domain: cluster.local
10-7-2015 23:49:04  Exit code 1: Incorrect function
10-7-2015 23:48:52  Joining domain: cluster.local
10-7-2015 23:48:50  Exit code 1: Incorrect function
10-7-2015 23:48:38  Joining domain: cluster.local
10-7-2015 23:48:36  Exit code 1: Incorrect function
10-7-2015 23:48:24  Joining domain: cluster.local
10-7-2015 23:48:22  Exit code 1: Incorrect function
10-7-2015 23:48:10  Joining domain: cluster.local
10-7-2015 23:48:08  Exit code 1: Incorrect function
10-7-2015 23:47:56  Joining domain: cluster.local
10-7-2015 23:47:54  Exit code 1: Incorrect function
10-7-2015 23:47:42  Joining domain: cluster.local
10-7-2015 23:47:40  Exit code 1: Incorrect function
10-7-2015 23:47:28  Joining domain: cluster.local
10-7-2015 23:47:26  Exit code 1: Incorrect function
10-7-2015 23:47:14  Joining domain: cluster.local
10-7-2015 23:47:12  Exit code 1: Incorrect function
10-7-2015 23:47:00  Joining domain: cluster.local
10-7-2015 23:46:58  Exit code 1: Incorrect function
10-7-2015 23:46:46  Joining domain: cluster.local
10-7-2015 23:46:44  Exit code 1: Incorrect function
10-7-2015 23:46:32  Joining domain: cluster.local
10-7-2015 23:46:30  Exit code 1: Incorrect function
10-7-2015 23:46:18  Joining domain: cluster.local
10-7-2015 23:46:16  Exit code 1: Incorrect function
10-7-2015 23:46:04  Joining domain: cluster.local
10-7-2015 23:46:02  Exit code 1: Incorrect function
10-7-2015 23:45:50  Joining domain: cluster.local
10-7-2015 23:45:48  Exit code 1: Incorrect function
10-7-2015 23:45:45  Joining domain: cluster.local
10-7-2015 23:45:38  Disabling Windows Recovery Mode (optimization for iSCSI-boot scenario)
10-7-2015 23:45:36  Waiting for iSCSI boot nodes to boot and start Windows setup
10-7-2015 23:32:48  Sending PXE command to boot node to the current OS
10-7-2015 23:32:44  Sending PXE command to boot node to the current OS
10-7-2015 23:29:19  Sending PXE command to boot node to the current OS
10-7-2015 23:29:15  Sending PXE command to boot node to the current OS
10-7-2015 23:06:48  Installing Windows (Expected time: 30 minutes)
10-7-2015 23:06:43  Customizing the Windows unattended installation script
10-7-2015 23:06:37  Cleaning up WIM file
10-7-2015 23:02:16  Extracting WIM C:\Win8.1 embedded industry.WIM to C:\Install
10-7-2015 23:02:11  Creating local directory for install media
10-7-2015 22:58:17  Copying: Images\Win8.1 embedded industry.WIM
10-7-2015 22:57:53  Configuring disk partitions
10-7-2015 22:57:48  Copying: config\diskpart.txt
10-7-2015 22:57:42  Mounting the installation shared folder on the head node
10-7-2015 22:54:32  Sending PXE command to boot node to WINPE (Expected boot time: 5-15 minutes)
10-7-2015 22:54:22  Sending PXE command to boot node to WINPE (Expected boot time: 5-15 minutes)
10-7-2015 22:54:17  Waiting for node to boot into WINPE
10-7-2015 22:54:17  Initiating configuration operations for template: Default Base Template
10-7-2015 22:54:17  Computer account ENCODER1000 created
10-7-2015 22:54:17  The computer account ENCODER1000 does not exist; creating a new account in Active Directory.
10-7-2015 22:54:17  Searching for an existing account in Active Directory
10-7-2015 22:54:16  Connecting to domain controller: cluster.local
10-7-2015 22:54:16  Initiating provisioning operations for template: Default Base Template
10-7-2015 22:54:16  Creating DHCP reservation 192.168.10.6 on scope 192.168.10.0
10-7-2015 22:54:16  Setting DHCP option 17 to iscsi:192.168.10.1::::iqn.1991-05.com.microsoft:head-node-encoder1000-base-target
10-7-2015 22:54:16  Setting DHCP option 12 to encoder1000
10-7-2015 22:54:16  Setting DHCP option 203 to iqn.1991-05.com.microsoft:encoder1000.cluster.local
10-7-2015 22:54:16  Creating a reservation for network adapter: D0509947B72C
10-7-2015 22:54:16  Mapping successful
10-7-2015 22:54:16  Mapping client [iqn.1991-05.com.microsoft:encoder1000.cluster.local] to target LUN [ENCODER1000-BASE] on storage array [127.0.0.1]
10-7-2015 22:54:15  Remote disk disconnected
10-7-2015 22:53:44  Disconnecting remote disk
10-7-2015 22:53:34  Configuring bootloader
10-7-2015 22:53:31  Copying WinPE files
10-7-2015 22:53:29  Placing bootloader
10-7-2015 22:53:29  Mount successful at: C:\Windows\TEMP\uhusex01.bsn
10-7-2015 22:53:16  Connection established, mounting disk
10-7-2015 22:53:15  Connecting to iSCSI target: 192.168.10.1 / iqn.1991-05.com.microsoft:head-node-encoder1000-base-target
10-7-2015 22:53:15  Mapping successful
10-7-2015 22:53:14  Mapping client [iqn.1991-05.com.microsoft:HEAD-NODE.cluster.local] to target LUN [ENCODER1000-BASE] on storage array [127.0.0.1]
10-7-2015 22:53:14  Base LUN creation complete
10-7-2015 22:53:14  Creation successful
10-7-2015 22:53:13  Creating base LUN "ENCODER1000-BASE"
10-7-2015 22:53:12  Setting boot-initiator information
10-7-2015 22:53:12  Associating template Default Base Template with node CLUSTER\ENCODER1000
10-7-2015 22:53:12  Moving node CLUSTER\ENCODER1000 from state Unknown to state Provisioning
10-7-2015 22:53:11  Assigning template Default Base Template to node ENCODER1000
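
Because the log is long and listed newest first, a short script helps to see the retry pattern. The Python sketch below works on a small excerpt of the log above; it assumes the timestamps are day-month-year and that each line is a date, a time, and a message. The function names are made up for the example.

```python
from datetime import datetime

# A few lines copied from the deployment log above (newest first).
LOG_EXCERPT = """\
Time    Message
10-7-2015 23:57:17  The operation failed and will not be retried.
10-7-2015 23:57:17  Exit code 1: Incorrect function
10-7-2015 23:57:05  Joining domain: cluster.local
10-7-2015 23:57:03  Exit code 1: Incorrect function
10-7-2015 23:56:51  Joining domain: cluster.local
10-7-2015 23:56:49  Exit code 1: Incorrect function
10-7-2015 23:56:37  Joining domain: cluster.local
"""


def parse(log_text):
    """Return (timestamp, message) pairs sorted oldest first."""
    entries = []
    for line in log_text.splitlines():
        parts = line.split(None, 2)  # date, time, rest of the line
        if len(parts) < 3:
            continue  # header or blank line
        try:
            stamp = datetime.strptime(parts[0] + " " + parts[1], "%d-%m-%Y %H:%M:%S")
        except ValueError:
            continue  # not a log entry
        entries.append((stamp, parts[2]))
    return sorted(entries, key=lambda entry: entry[0])


def summarize(entries):
    """Count join attempts and failures and list the gaps between attempts."""
    joins = [stamp for stamp, msg in entries if msg.startswith("Joining domain:")]
    failures = sum(1 for _, msg in entries if msg.startswith("Exit code 1"))
    gaps = [int((later - earlier).total_seconds())
            for earlier, later in zip(joins, joins[1:])]
    return {"join_attempts": len(joins), "exit_code_1": failures, "gaps_seconds": gaps}


if __name__ == "__main__":
    print(summarize(parse(LOG_EXCERPT)))
```

On this excerpt every join attempt is followed by an exit code 1 and then retried after 14 seconds. Running it over the full log should give the total number of attempts before the operation is reverted.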

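Questions:

  1. Can someone point us in the right direction?
  2. Why does the test report an incorrect binding order when everything looks fine? It looks like this is an issue between the hardware and Windows.
  3. How do I fix the binding order so that the test passes? Swap the NICs.
  4. Is this really a problem? Apparently not.
  5. If not, how can I narrow it down, or can someone tell me directly what the problem is?

We are running out of ideas to try.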


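Update 1:

We swapped the two NIC connectors (cables, settings, binding order). Hard to believe, but that did fix the test. Unfortunately, it did not fix the actual problem. Joining the domain (during deployment) still fails, whereas we can join manually (which cancels the deployment).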
