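I have two computers, each with an Infiniband card, and I want to connect them without a switch. A single cable connects the two machines through their QSFP ports.
I have read the documentation and found that opensm allows this. So far I have done the setup on node2, where I want to run the software switch. I can ping the ib0 address. Now I need to start the software switch, but I don't know how to modify these two files:
1. /etc/sysconfig/opensm
2. /etc/rdma/opensm.conf
After that, how do I tell node1 where the opensm switch is?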
[idf@node2 ~]$ ibv_devinfo
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.7.200
node_guid: 0025:90ff:ff1a:0070
sys_image_guid: 0025:90ff:ff1a:0073
vendor_id: 0x02c9
vendor_part_id: 26428
hw_ver: 0xB0
board_id: SM_2092000001000
phys_port_cnt: 1
port: 1
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: InfiniBand
[idf@node2 ~]$ sudo service opensm status
Redirecting to /bin/systemctl status opensm.service
opensm.service - Starts the OpenSM InfiniBand fabric Subnet Manager
Loaded: loaded (/usr/lib/systemd/system/opensm.service; enabled)
Active: active (running) since Mon 2015-04-20 20:51:10 EDT; 1h 21min ago
Docs: man:opensm
Process: 842 ExecStart=/usr/libexec/opensm-launch (code=exited, status=0/SUCCESS)
Main PID: 846 (opensm-launch)
CGroup: /system.slice/opensm.service
├─846 /bin/bash /usr/libexec/opensm-launch
└─847 /usr/sbin/opensm
Apr 20 20:51:11 node2.synctrading opensm-launch[842]: Log File: /var/log/opensm.log
Apr 20 20:51:11 node2.synctrading opensm-launch[842]: -------------------------------------------------
Apr 20 20:51:11 node2.synctrading opensm-launch[842]: OpenSM 3.3.18
Apr 20 20:51:11 node2.synctrading OpenSM[847]: /var/log/opensm.log log file opened
Apr 20 20:51:11 node2.synctrading OpenSM[847]: OpenSM 3.3.18
Apr 20 20:51:12 node2.synctrading opensm-launch[842]: Using default GUID 0x2590ffff1a0071
Apr 20 20:51:12 node2.synctrading opensm-launch[842]: Entering DISCOVERING state
Apr 20 20:51:12 node2.synctrading OpenSM[847]: Entering DISCOVERING state
Apr 20 20:51:12 node2.synctrading OpenSM[847]: SM port is down
Apr 20 20:51:12 node2.synctrading opensm-launch[842]: SM port is down
[idf@node2 ~]$
[idf@node2 ~]$ sudo /etc/sysconfig/network-scripts/ifup-ib ib0
[idf@node2 ~]$ ifconfig -a
ib0: flags=4099<UP,BROADCAST,MULTICAST> mtu 2044
inet 192.168.0.1 netmask 255.255.255.0 broadcast 192.168.0.255
Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
infiniband 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 txqueuelen 256 (InfiniBand)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 0 (Local Loopback)
RX packets 33 bytes 3222 (3.1 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 33 bytes 3222 (3.1 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[idf@node2 ~]$ ibstat
CA 'mlx4_0'
CA type: MT26428
Number of ports: 1
Firmware version: 2.7.200
Hardware version: b0
Node GUID: 0x002590ffff1a0070
System image GUID: 0x002590ffff1a0073
Port 1:
State: Down
Physical state: Polling
Rate: 10
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x0259086a
Port GUID: 0x002590ffff1a0071
Link layer: InfiniBand
[idf@node2 ~]$ sudo ibhosts
Ca : 0x002590ffff1a0070 ports 1 "node2 mlx4_0"
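
To make the port state easier to re-check after each change, here is a minimal sketch that pulls the two state fields out of `ibstat` output. The function name `ib_port_state` is hypothetical, and it assumes the output has the same `State:` and `Physical state:` lines as the listing above, on a single-port HCA.

```shell
# Hypothetical helper (not part of infiniband-diags): reads `ibstat`
# output on stdin and prints "<State>|<Physical state>".
# Assumes a single-port HCA; with several ports the last one wins.
ib_port_state() {
    awk -F': *' '
        /^[[:space:]]*State:/          { state = $2 }   # logical port state
        /^[[:space:]]*Physical state:/ { phys  = $2 }   # link training state
        END { print state "|" phys }
    '
}
```

Usage would be `ibstat | ib_port_state`. For the `ibstat` output shown above it prints `Down|Polling`, which matches the "SM port is down" lines in the opensm log.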