I have two Open MPI versions installed locally on my cluster:
openmpi-1.8.1: when I run mpirun with this version, it fails with the following error:
librdmacm: Fatal: unable to open RDMA device
(the same line is repeated many times)
openmpi-2.0.0: when I run mpirun with this version, it reports:
mca_base_component_repository_open: shmem "/opt/openmpi-1.8.1/lib/openmpi/mca_shmem_posix" uses an MCA interface that is not recognized (component MCA v2.0.0 != supported MCA v2.1.0) -- ignored
It looks like opal_init failed for some reason; your parallel process is likely to abort.
There are many reasons that a parallel process can fail during opal_init; some of which are due to configuration or environment problems.
This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):
opal_shmem_base_select failed
--> Returned value -1 instead of OPAL_SUCCESS
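Both versions are installed locally under /opt and loaded as modules. It looks like openmpi-2.0.0 is still picking up files from openmpi-1.8.1, and I don't understand why.
I would be very grateful for any hints on diagnosing and/or fixing the problem.
Thanks in advance.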
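Answer 1
You should not install them in the same location, because their libraries conflict with each other. Open MPI 2.0.0 uses MCA interface 2.1.0, while Open MPI 1.8.1 uses MCA interface 2.0.0, which is why the 2.0.0 runtime rejects the mca_shmem_posix component it finds under /opt/openmpi-1.8.1. Install each version under its own separate location.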
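To check whether the environment really mixes the two installations, as the /opt/openmpi-1.8.1 path in the 2.0.0 error suggests, you could scan the environment variables for references to more than one version. This is only a minimal sketch in Python. It assumes the install prefixes are named like /opt/openmpi-<version>, as in the error message, and the function names are made up for illustration.

```python
import os
import re

# Assumption: prefixes are named like /opt/openmpi-<version>, as in the error above.
VERSION_RE = re.compile(r"openmpi-(\d+(?:\.\d+)+)")


def openmpi_versions_in_env(env):
    """Map each environment variable to the Open MPI versions its value mentions."""
    found = {}
    for name, value in env.items():
        versions = sorted(set(VERSION_RE.findall(value)))
        if versions:
            found[name] = versions
    return found


def mixed_versions(env):
    """Return every distinct Open MPI version referenced anywhere in env."""
    all_versions = set()
    for versions in openmpi_versions_in_env(env).values():
        all_versions.update(versions)
    return sorted(all_versions)


if __name__ == "__main__":
    # Print each variable (PATH, LD_LIBRARY_PATH, ...) that mentions an Open MPI prefix.
    for name, versions in sorted(openmpi_versions_in_env(os.environ).items()):
        print(f"{name}: {', '.join(versions)}")
    if len(mixed_versions(os.environ)) > 1:
        print("More than one Open MPI version is referenced in this environment.")
```

Run it after loading the openmpi-2.0.0 module. If any variable still mentions 1.8.1, the two installations are probably being mixed at run time.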