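We are currently evaluating Server 2012 R2 with HPC Pack for an upcoming project.
Unfortunately, we ran into a problem deploying base nodes. The node boots via PXE (iPXE), connects to iSCSI, and installs Windows, but it does not seem to be able to join the domain.
Once the deployment fails, the node is left with Windows installed on the iSCSI drive.
We can then join the domain manually and log in as a domain user. We can reach the server on both IPs. Pinging cluster.local or HEAD-NODE.cluster.local resolves to the IP of the server's NIC on the private network (192.168.10.1).
Running the Deployment: Binding Order test produces only:
Warning: The enterprise network is not configured first in the binding order for the default gateway. This may cause problems when communicating with Active Directory Domain Services.
Deployment: Binding Order test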
14-7-2015 20:20:32 [Information] Network "Private" description: Intel(R) I210 Gigabit Network Connection #2
14-7-2015 20:20:32 [Information] Network interface type: Ethernet
14-7-2015 20:20:32 [Information] Address: 192.168.10.1
14-7-2015 20:20:32 [Information] Network "Enterprise" description: Intel(R) I210 Gigabit Network Connection
14-7-2015 20:20:32 [Information] Network interface type: Ethernet
14-7-2015 20:20:32 [Information] Address: 192.168.178.5
14-7-2015 20:20:32 [Information] Microsoft HPC Diagnostic Test Host.
14-7-2015 20:20:32 [Information] Creating test instances from: PDCNet.dll:Microsoft.Hpc.Diagnostics.Tests.BindingOrder.
14-7-2015 20:20:32 [Verbose] Failed to load assembly OriginalPath: Could not load file or assembly 'file:///\\HEAD-NODE\Diagnostics\1125\PDCNet.dll' or one of its dependencies. The system cannot find the file specified..
at System.Reflection.RuntimeAssembly._nLoad(AssemblyName fileName, String codeBase, Evidence assemblySecurity, RuntimeAssembly locationHint, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks)
at System.Reflection.RuntimeAssembly.InternalLoadAssemblyName(AssemblyName assemblyRef, Evidence assemblySecurity, RuntimeAssembly reqAssembly, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks)
at System.Reflection.RuntimeAssembly.InternalLoadAssemblyName(AssemblyName assemblyRef, Evidence assemblySecurity, RuntimeAssembly reqAssembly, StackCrawlMark& stackMark, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks)
at System.Reflection.RuntimeAssembly.InternalLoadFrom(String assemblyFile, Evidence securityEvidence, Byte[] hashValue, AssemblyHashAlgorithm hashAlgorithm, Boolean forIntrospection, Boolean suppressSecurityChecks, StackCrawlMark& stackMark)
at System.Reflection.Assembly.LoadFrom(String assemblyFile)
at Microsoft.Hpc.Diagnostics.TestHost.Program.CreateTestInstance(String assemblyName, String className)
14-7-2015 20:20:32 [Verbose] Retrying from location: C:\Program Files\Microsoft HPC Pack 2012\Bin\PDCNet.dll
14-7-2015 20:20:32 [Verbose] Failed to load assembly EXEPath: Could not load file or assembly 'file:///C:\Program Files\Microsoft HPC Pack 2012\Bin\PDCNet.dll' or one of its dependencies. The system cannot find the file specified..
at System.Reflection.RuntimeAssembly._nLoad(AssemblyName fileName, String codeBase, Evidence assemblySecurity, RuntimeAssembly locationHint, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks)
at System.Reflection.RuntimeAssembly.InternalLoadAssemblyName(AssemblyName assemblyRef, Evidence assemblySecurity, RuntimeAssembly reqAssembly, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks)
at System.Reflection.RuntimeAssembly.InternalLoadAssemblyName(AssemblyName assemblyRef, Evidence assemblySecurity, RuntimeAssembly reqAssembly, StackCrawlMark& stackMark, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks)
at System.Reflection.RuntimeAssembly.InternalLoadFrom(String assemblyFile, Evidence securityEvidence, Byte[] hashValue, AssemblyHashAlgorithm hashAlgorithm, Boolean forIntrospection, Boolean suppressSecurityChecks, StackCrawlMark& stackMark)
at System.Reflection.Assembly.LoadFrom(String assemblyFile)
at Microsoft.Hpc.Diagnostics.TestHost.Program.CreateTestInstance(String assemblyName, String className)
14-7-2015 20:20:32 [Verbose] Retrying from location: C:\Program Files\Microsoft HPC Pack 2012\Bin\DiagTests\PDCNet.dll
14-7-2015 20:20:32 [Verbose] Doing test on test type: Microsoft.Hpc.Diagnostics.Tests.BindingOrder.
14-7-2015 20:20:32 [Information] Got domain controller name: HEAD-NODE.cluster.local
14-7-2015 20:20:32 [Information] Resolved IP address for DC. Got 4 IP addessses.
14-7-2015 20:20:32 [Information] Routing to address: 192.168.10.1.
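Read literally, the output above says that the route to the domain controller goes out via 192.168.10.1, which is the address logged for the "Private" network rather than "Enterprise". The following Python sketch makes that comparison explicit on an excerpt of the log. It is a hypothetical helper, not part of HPC Pack, and the assumption that this comparison is what triggers the warning is only an interpretation of the log.

```python
import re

# Excerpt of the diagnostic output above, with timestamps dropped.
LOG = """\
[Information] Network "Private" description: Intel(R) I210 Gigabit Network Connection #2
[Information] Network interface type: Ethernet
[Information] Address: 192.168.10.1
[Information] Network "Enterprise" description: Intel(R) I210 Gigabit Network Connection
[Information] Network interface type: Ethernet
[Information] Address: 192.168.178.5
[Information] Routing to address: 192.168.10.1.
"""

IPV4 = r"(\d+(?:\.\d+){3})"


def parse_networks(log):
    """Map each HPC network name to the address logged after it."""
    networks = {}
    current = None
    for line in log.splitlines():
        name = re.search(r'Network "([^"]+)" description:', line)
        addr = re.search(r"\] Address: " + IPV4, line)
        if name:
            current = name.group(1)
        elif addr and current:
            networks[current] = addr.group(1)
            current = None
    return networks


def routed_network(log):
    """Return the name of the network whose address the log routes to."""
    match = re.search(r"Routing to address: " + IPV4, log)
    if not match:
        return None
    for name, address in parse_networks(log).items():
        if address == match.group(1):
            return name
    return None


print(parse_networks(LOG))
print(routed_network(LOG))
```

If the second line printed is the private network, the warning is at least consistent with the log, even though the adapter order shown in the GUI looks correct.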
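So we investigated this issue. However, everything looks fine. Changing the binding order in Advanced Settings fixes neither the test nor the actual deployment problem.
NIC settings
Enterprise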
Intel(R) I210 Gigabit Network Connection
cluster.local
IPv6 - disabled
IPv4
192.168.178.5
255.255.255.0
192.168.178.1
DNS
212.54.40.25
192.168.178.1
QoS Packet Scheduler - enabled
Link-Layer Topology Discovery Mapper I/O Driver - enabled
Link-Layer Topology Discovery Responder - enabled
Private
Intel(R) I210 Gigabit Network Connection #2
cluster.local
IPv6 - disabled
IPv4
192.168.10.1
255.255.255.0
no default gateway
DNS
127.0.0.1
QoS Packet Scheduler - enabled
Link-Layer Topology Discovery Mapper I/O Driver - enabled
Link-Layer Topology Discovery Responder - enabled
Control Panel\Network and Internet\Network Connections\Advanced\Advanced Settings\Adapters and Bindings\Connections
Enterprise
Private
[Remote Access connections]
Console:
C:\Users\Administrator>wmic nicconfig get Description,SettingID
Description SettingID
WAN Miniport (L2TP) {06E102F9-E21B-4CEF-B0CA-64F4829A9A7C}
WAN Miniport (SSTP) {577B93D0-F1FD-4C7B-B41E-53B4BA94A579}
WAN Miniport (IKEv2) {1AF75D00-449A-4CC1-9ED1-FB440172AED2}
WAN Miniport (PPTP) {A235D4B4-600A-4FFA-8E12-9BA09E6DAF65}
WAN Miniport (PPPOE) {4E1B3D6C-934D-43DF-9301-DA9CC9E8A407}
WAN Miniport (IP) {3F6E7537-F2F8-4AEA-8B72-B7A4D7298D4E}
WAN Miniport (IPv6) {041B181E-0469-42FD-B6B4-F32842B6495B}
WAN Miniport (Network Monitor) {C9875A41-724D-4987-9F2D-A41F8AE84E2F}
Microsoft Kernel Debug Network Adapter {C7568B63-C424-48B3-AB9B-6D1F004D5AFC}
RAS Async Adapter {D0FA3B2F-90BF-4A07-83E3-81165D8B28EF}
Intel(R) I210 Gigabit Network Connection {5BC77C5E-0E79-4C01-9C9A-A28CFA94F898}
Intel(R) I210 Gigabit Network Connection #2 {0260FC73-5ABF-4814-AB52-D16DEFFA4875}
Microsoft ISATAP Adapter {266E5AE1-245A-4D47-9E73-F37AFFE434A4}
Microsoft Teredo Tunneling Adapter {D9B3F7F2-A448-4513-BD48-A1B215C54DCF}
Microsoft ISATAP Adapter {40F0B15B-06DA-4E06-A43C-89AD60918DEC}
Microsoft ISATAP Adapter {DFA7713A-B5DC-4144-BB90-3A546C0EE42E}
Microsoft Failover Cluster Virtual Adapter {341D1797-925C-49CC-9C24-50AFC0F0C105}
Microsoft ISATAP Adapter {8BE73B7C-14DE-4E5C-92FA-1F6C8CF0FBF6}
Hyper-V Virtual Ethernet Adapter {A71BD6F0-072F-4C9A-8963-780EE41C896C}
Registry: [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Linkage]
\Device\{5BC77C5E-0E79-4C01-9C9A-A28CFA94F898}
\Device\{A71BD6F0-072F-4C9A-8963-780EE41C896C}
\Device\{341D1797-925C-49CC-9C24-50AFC0F0C105}
\Device\{0260FC73-5ABF-4814-AB52-D16DEFFA4875}
\Device\{C7568B63-C424-48B3-AB9B-6D1F004D5AFC}
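To read the registry list without comparing GUIDs by eye, the entries can be joined against the SettingID column of the wmic output. The Python sketch below does this with the values copied from above; the variable and function names are made up for illustration.

```python
import re

# SettingID -> Description, copied from the wmic output above
# (only the adapters that appear in the registry list).
ADAPTERS = {
    "{5BC77C5E-0E79-4C01-9C9A-A28CFA94F898}": "Intel(R) I210 Gigabit Network Connection",
    "{0260FC73-5ABF-4814-AB52-D16DEFFA4875}": "Intel(R) I210 Gigabit Network Connection #2",
    "{341D1797-925C-49CC-9C24-50AFC0F0C105}": "Microsoft Failover Cluster Virtual Adapter",
    "{A71BD6F0-072F-4C9A-8963-780EE41C896C}": "Hyper-V Virtual Ethernet Adapter",
    "{C7568B63-C424-48B3-AB9B-6D1F004D5AFC}": "Microsoft Kernel Debug Network Adapter",
}

# Device entries in the order they appear under Tcpip\Linkage above.
LINKAGE_ENTRIES = r"""
\Device\{5BC77C5E-0E79-4C01-9C9A-A28CFA94F898}
\Device\{A71BD6F0-072F-4C9A-8963-780EE41C896C}
\Device\{341D1797-925C-49CC-9C24-50AFC0F0C105}
\Device\{0260FC73-5ABF-4814-AB52-D16DEFFA4875}
\Device\{C7568B63-C424-48B3-AB9B-6D1F004D5AFC}
"""


def bind_order(entries, adapters):
    """Return adapter descriptions in the order the GUIDs are listed."""
    guids = re.findall(r"\{[0-9A-Fa-f-]{36}\}", entries)
    return [adapters.get(guid.upper(), "unknown adapter") for guid in guids]


for position, description in enumerate(bind_order(LINKAGE_ENTRIES, ADAPTERS), start=1):
    print(position, description)
```

With these values the Enterprise NIC is listed first and the Private NIC (#2) fourth, with the Hyper-V and Failover Cluster virtual adapters in between. Whether those virtual adapters influence the test is not something the data here can confirm.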
Hardware:
Server
ASRock J1900D2Y, 16 GB, 3TB HDD
Windows Server 2012R2 Evaluation
Roles: AD DS, DHCP (for Private), DNS (for Private), WDS, HPC Cluster HEAD-NODE
LAN1: Enterprise, 192.168.178.5 (static)
LAN2: Private, 192.168.10.1 (static)
IPMI: 192.168.178.6 (static), 192.168.178.28 (DHCP)
Base node
ASRock Q1900TM-ITX, 4 GB, no HDD
Windows 8.1 Embedded
(the important thing is Quick-Sync)
LAN: Private, 192.168.10.6 (DHCP)
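The cluster nodes are used to encode individual parts of large videos and finally join those parts together. So the hardware requirements for our nodes are: Intel Quick-Sync, low power, small form factor.
Deployment log: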
Time Message
10-7-2015 23:57:18 Reverted
10-7-2015 23:57:17 Disassociating template from node CLUSTER\ENCODER1000
10-7-2015 23:57:17 The operation failed due to errors during execution.
10-7-2015 23:57:17 The operation failed and will not be retried.
10-7-2015 23:57:17 The operation failed due to errors during execution.
10-7-2015 23:57:17 The operation failed and will not be retried.
10-7-2015 23:57:17 Exit code 1: Incorrect function
10-7-2015 23:57:05 Joining domain: cluster.local
10-7-2015 23:57:03 Exit code 1: Incorrect function
10-7-2015 23:56:51 Joining domain: cluster.local
10-7-2015 23:56:49 Exit code 1: Incorrect function
10-7-2015 23:56:37 Joining domain: cluster.local
10-7-2015 23:56:35 Exit code 1: Incorrect function
10-7-2015 23:56:23 Joining domain: cluster.local
10-7-2015 23:56:21 Exit code 1: Incorrect function
10-7-2015 23:56:09 Joining domain: cluster.local
10-7-2015 23:56:07 Exit code 1: Incorrect function
10-7-2015 23:55:55 Joining domain: cluster.local
10-7-2015 23:55:53 Exit code 1: Incorrect function
10-7-2015 23:55:41 Joining domain: cluster.local
10-7-2015 23:55:39 Exit code 1: Incorrect function
10-7-2015 23:55:27 Joining domain: cluster.local
10-7-2015 23:55:25 Exit code 1: Incorrect function
10-7-2015 23:55:13 Joining domain: cluster.local
10-7-2015 23:55:10 Exit code 1: Incorrect function
10-7-2015 23:54:58 Joining domain: cluster.local
10-7-2015 23:54:56 Exit code 1: Incorrect function
10-7-2015 23:54:44 Joining domain: cluster.local
10-7-2015 23:54:42 Exit code 1: Incorrect function
10-7-2015 23:54:30 Joining domain: cluster.local
10-7-2015 23:54:28 Exit code 1: Incorrect function
10-7-2015 23:54:16 Joining domain: cluster.local
10-7-2015 23:54:14 Exit code 1: Incorrect function
10-7-2015 23:54:02 Joining domain: cluster.local
10-7-2015 23:54:00 Exit code 1: Incorrect function
10-7-2015 23:53:48 Joining domain: cluster.local
10-7-2015 23:53:46 Exit code 1: Incorrect function
10-7-2015 23:53:34 Joining domain: cluster.local
10-7-2015 23:53:32 Exit code 1: Incorrect function
10-7-2015 23:53:20 Joining domain: cluster.local
10-7-2015 23:53:18 Exit code 1: Incorrect function
10-7-2015 23:53:06 Joining domain: cluster.local
10-7-2015 23:53:04 Exit code 1: Incorrect function
10-7-2015 23:52:52 Joining domain: cluster.local
10-7-2015 23:52:50 Exit code 1: Incorrect function
10-7-2015 23:52:38 Joining domain: cluster.local
10-7-2015 23:52:36 Exit code 1: Incorrect function
10-7-2015 23:52:24 Joining domain: cluster.local
10-7-2015 23:52:22 Exit code 1: Incorrect function
10-7-2015 23:52:10 Joining domain: cluster.local
10-7-2015 23:52:08 Exit code 1: Incorrect function
10-7-2015 23:51:55 Joining domain: cluster.local
10-7-2015 23:51:53 Exit code 1: Incorrect function
10-7-2015 23:51:41 Joining domain: cluster.local
10-7-2015 23:51:39 Exit code 1: Incorrect function
10-7-2015 23:51:27 Joining domain: cluster.local
10-7-2015 23:51:25 Exit code 1: Incorrect function
10-7-2015 23:51:13 Joining domain: cluster.local
10-7-2015 23:51:11 Exit code 1: Incorrect function
10-7-2015 23:50:59 Joining domain: cluster.local
10-7-2015 23:50:57 Exit code 1: Incorrect function
10-7-2015 23:50:45 Joining domain: cluster.local
10-7-2015 23:50:43 Exit code 1: Incorrect function
10-7-2015 23:50:31 Joining domain: cluster.local
10-7-2015 23:50:29 Exit code 1: Incorrect function
10-7-2015 23:50:17 Joining domain: cluster.local
10-7-2015 23:50:15 Exit code 1: Incorrect function
10-7-2015 23:50:03 Joining domain: cluster.local
10-7-2015 23:50:01 Exit code 1: Incorrect function
10-7-2015 23:49:49 Joining domain: cluster.local
10-7-2015 23:49:47 Exit code 1: Incorrect function
10-7-2015 23:49:35 Joining domain: cluster.local
10-7-2015 23:49:33 Exit code 1: Incorrect function
10-7-2015 23:49:21 Joining domain: cluster.local
10-7-2015 23:49:19 Exit code 1: Incorrect function
10-7-2015 23:49:06 Joining domain: cluster.local
10-7-2015 23:49:04 Exit code 1: Incorrect function
10-7-2015 23:48:52 Joining domain: cluster.local
10-7-2015 23:48:50 Exit code 1: Incorrect function
10-7-2015 23:48:38 Joining domain: cluster.local
10-7-2015 23:48:36 Exit code 1: Incorrect function
10-7-2015 23:48:24 Joining domain: cluster.local
10-7-2015 23:48:22 Exit code 1: Incorrect function
10-7-2015 23:48:10 Joining domain: cluster.local
10-7-2015 23:48:08 Exit code 1: Incorrect function
10-7-2015 23:47:56 Joining domain: cluster.local
10-7-2015 23:47:54 Exit code 1: Incorrect function
10-7-2015 23:47:42 Joining domain: cluster.local
10-7-2015 23:47:40 Exit code 1: Incorrect function
10-7-2015 23:47:28 Joining domain: cluster.local
10-7-2015 23:47:26 Exit code 1: Incorrect function
10-7-2015 23:47:14 Joining domain: cluster.local
10-7-2015 23:47:12 Exit code 1: Incorrect function
10-7-2015 23:47:00 Joining domain: cluster.local
10-7-2015 23:46:58 Exit code 1: Incorrect function
10-7-2015 23:46:46 Joining domain: cluster.local
10-7-2015 23:46:44 Exit code 1: Incorrect function
10-7-2015 23:46:32 Joining domain: cluster.local
10-7-2015 23:46:30 Exit code 1: Incorrect function
10-7-2015 23:46:18 Joining domain: cluster.local
10-7-2015 23:46:16 Exit code 1: Incorrect function
10-7-2015 23:46:04 Joining domain: cluster.local
10-7-2015 23:46:02 Exit code 1: Incorrect function
10-7-2015 23:45:50 Joining domain: cluster.local
10-7-2015 23:45:48 Exit code 1: Incorrect function
10-7-2015 23:45:45 Joining domain: cluster.local
10-7-2015 23:45:38 Disabling Windows Recovery Mode (optimization for iSCSI-boot scenario)
10-7-2015 23:45:36 Waiting for iSCSI boot nodes to boot and start Windows setup
10-7-2015 23:32:48 Sending PXE command to boot node to the current OS
10-7-2015 23:32:44 Sending PXE command to boot node to the current OS
10-7-2015 23:29:19 Sending PXE command to boot node to the current OS
10-7-2015 23:29:15 Sending PXE command to boot node to the current OS
10-7-2015 23:06:48 Installing Windows (Expected time: 30 minutes)
10-7-2015 23:06:43 Customizing the Windows unattended installation script
10-7-2015 23:06:37 Cleaning up WIM file
10-7-2015 23:02:16 Extracting WIM C:\Win8.1 embedded industry.WIM to C:\Install
10-7-2015 23:02:11 Creating local directory for install media
10-7-2015 22:58:17 Copying: Images\Win8.1 embedded industry.WIM
10-7-2015 22:57:53 Configuring disk partitions
10-7-2015 22:57:48 Copying: config\diskpart.txt
10-7-2015 22:57:42 Mounting the installation shared folder on the head node
10-7-2015 22:54:32 Sending PXE command to boot node to WINPE (Expected boot time: 5-15 minutes)
10-7-2015 22:54:22 Sending PXE command to boot node to WINPE (Expected boot time: 5-15 minutes)
10-7-2015 22:54:17 Waiting for node to boot into WINPE
10-7-2015 22:54:17 Initiating configuration operations for template: Default Base Template
10-7-2015 22:54:17 Computer account ENCODER1000 created
10-7-2015 22:54:17 The computer account ENCODER1000 does not exist; creating a new account in Active Directory.
10-7-2015 22:54:17 Searching for an existing account in Active Directory
10-7-2015 22:54:16 Connecting to domain controller: cluster.local
10-7-2015 22:54:16 Initiating provisioning operations for template: Default Base Template
10-7-2015 22:54:16 Creating DHCP reservation 192.168.10.6 on scope 192.168.10.0
10-7-2015 22:54:16 Setting DHCP option 17 to iscsi:192.168.10.1::::iqn.1991-05.com.microsoft:head-node-encoder1000-base-target
10-7-2015 22:54:16 Setting DHCP option 12 to encoder1000
10-7-2015 22:54:16 Setting DHCP option 203 to iqn.1991-05.com.microsoft:encoder1000.cluster.local
10-7-2015 22:54:16 Creating a reservation for network adapter: D0509947B72C
10-7-2015 22:54:16 Mapping successful
10-7-2015 22:54:16 Mapping client [iqn.1991-05.com.microsoft:encoder1000.cluster.local] to target LUN [ENCODER1000-BASE] on storage array [127.0.0.1]
10-7-2015 22:54:15 Remote disk disconnected
10-7-2015 22:53:44 Disconnecting remote disk
10-7-2015 22:53:34 Configuring bootloader
10-7-2015 22:53:31 Copying WinPE files
10-7-2015 22:53:29 Placing bootloader
10-7-2015 22:53:29 Mount successful at: C:\Windows\TEMP\uhusex01.bsn
10-7-2015 22:53:16 Connection established, mounting disk
10-7-2015 22:53:15 Connecting to iSCSI target: 192.168.10.1 / iqn.1991-05.com.microsoft:head-node-encoder1000-base-target
10-7-2015 22:53:15 Mapping successful
10-7-2015 22:53:14 Mapping client [iqn.1991-05.com.microsoft:HEAD-NODE.cluster.local] to target LUN [ENCODER1000-BASE] on storage array [127.0.0.1]
10-7-2015 22:53:14 Base LUN creation complete
10-7-2015 22:53:14 Creation successful
10-7-2015 22:53:13 Creating base LUN "ENCODER1000-BASE"
10-7-2015 22:53:12 Setting boot-initiator information
10-7-2015 22:53:12 Associating template Default Base Template with node CLUSTER\ENCODER1000
10-7-2015 22:53:12 Moving node CLUSTER\ENCODER1000 from state Unknown to state Provisioning
10-7-2015 22:53:11 Assigning template Default Base Template to node ENCODER1000
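The log is listed newest entry first, and most of it is the same join attempt repeated. By my count there are 50 "Joining domain" entries, each followed by "Exit code 1: Incorrect function", between 23:45:45 and 23:57:17. The Python sketch below shows one way to summarise such a log; it runs on a short excerpt only, and the helper names are hypothetical.

```python
import re
from datetime import datetime

# Short excerpt of the deployment log above (newest entry first, as posted).
LOG = """\
10-7-2015 23:57:17 Exit code 1: Incorrect function
10-7-2015 23:57:05 Joining domain: cluster.local
10-7-2015 23:57:03 Exit code 1: Incorrect function
10-7-2015 23:56:51 Joining domain: cluster.local
10-7-2015 23:45:48 Exit code 1: Incorrect function
10-7-2015 23:45:45 Joining domain: cluster.local
"""

ENTRY = re.compile(r"^(\d{1,2}-\d{1,2}-\d{4} \d{2}:\d{2}:\d{2}) (.*)$")


def parse(log):
    """Return (timestamp, message) pairs sorted oldest first."""
    entries = []
    for line in log.splitlines():
        match = ENTRY.match(line)
        if match:
            # Timestamps are day-month-year, e.g. 14-7-2015 elsewhere in the post.
            stamp = datetime.strptime(match.group(1), "%d-%m-%Y %H:%M:%S")
            entries.append((stamp, match.group(2)))
    return sorted(entries, key=lambda entry: entry[0])


def summarize(log):
    """Count join attempts and non-zero exit codes, with their time span."""
    entries = parse(log)
    joins = [t for t, msg in entries if msg.startswith("Joining domain")]
    failures = [t for t, msg in entries if re.match(r"Exit code [1-9]", msg)]
    return {
        "attempts": len(joins),
        "failures": len(failures),
        "first_attempt": joins[0] if joins else None,
        "last_failure": failures[-1] if failures else None,
    }


summary = summarize(LOG)
print(summary["attempts"], "attempts,", summary["failures"], "failures")
print("span:", summary["last_failure"] - summary["first_attempt"])
```

Every attempt failing within two to three seconds with the same exit code suggests a consistent failure rather than a timing problem, though that is only an inference from the log.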
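Questions:
- Can someone point us in the right direction?
- Why does the test report an incorrect binding order when everything looks fine? It looks like an issue between the hardware and Windows.
- How can I fix the binding order and get the test to pass? Swapping the NICs.
- Is this really the problem? Apparently not.
- If not, how can I narrow it down, or can someone tell me directly what the problem is?
We are running out of ideas to try.
Update 1:
We swapped the two NIC connections (cables, settings, binding order). Hard to believe, but this did fix the test. Unfortunately, it did not fix the actual problem. Joining the domain during deployment still fails, whereas we can join manually (which cancels the deployment).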