Problem with mpirun (openmpi)

I have two openmpi versions installed locally on my cluster:

  • openmpi-1.8.1: when I run mpirun under this version, it gives an error:

    librdmacm: Fatal: unable to open RDMA device
    librdmacm: Fatal: unable to open RDMA device
    librdmacm: Fatal: unable to open RDMA device
    librdmacm: Fatal: unable to open RDMA device
    librdmacm: Fatal: unable to open RDMA device
    librdmacm: Fatal: unable to open RDMA device
    librdmacm: Fatal: unable to open RDMA device
    librdmacm: Fatal: unable to open RDMA device
    librdmacm: Fatal: unable to open RDMA device
    librdmacm: Fatal: unable to open RDMA device
    librdmacm: Fatal: unable to open RDMA device
    librdmacm: Fatal: unable to open RDMA device
    librdmacm: Fatal: unable to open RDMA device
    librdmacm: Fatal: unable to open RDMA device
    librdmacm: Fatal: unable to open RDMA device
    librdmacm: Fatal: unable to open RDMA device
    
  • openmpi-2.0.0: when I run mpirun under this version, it tells me:

    mca_base_component_repository_open: shmem "/opt/openmpi-1.8.1/lib/openmpi/mca_shmem_posix" uses an MCA interface that is not recognized (component MCA v2.0.0 != supported MCA v2.1.0) -- ignored
    
    It looks like opal_init failed for some reason; your parallel process is
    likely to abort.  There are many reasons that a parallel process can
    fail during opal_init; some of which are due to configuration or
    environment problems.  This failure appears to be an internal failure;
    here's some additional information (which may only be relevant to an
    Open MPI developer):
      opal_shmem_base_select failed
      --> Returned value -1 instead of OPAL_SUCCESS
    

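Both versions are installed locally under /opt and loaded as modules. It looks like openmpi-2.0.0 is still picking up dependencies from openmpi-1.8.1, and I don't understand why.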

I would greatly appreciate any hints for diagnosing and/or solving the problem.

Thanks in advance.

Answer 1

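You should not install them in the same location, because their libraries will conflict with each other. openmpi 2.0.0 uses MCA interface 2.1.0, while openmpi 1.8.1 uses MCA interface 2.0.0, which is why the 2.0.0 run rejects the mca_shmem_posix component it finds under /opt/openmpi-1.8.1. You should therefore install the libraries in separate locations.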
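One way to check whether the two installs are being mixed at run time is to look at which install directories the current environment points to after loading a module. The following is only a rough sketch: it assumes the install directories are named like openmpi-1.8.1 and openmpi-2.0.0 (as in the question), and the helper names openmpi_refs and mixed_versions are made up for illustration. It only inspects PATH, LD_LIBRARY_PATH, OPAL_PREFIX and any variable starting with OMPI_, so it will not catch every way a stale component path can leak in.

```python
import os
import re

# Assumption: install directories are named like "openmpi-1.8.1".
PREFIX_RE = re.compile(r"openmpi-\d+(?:\.\d+)*")


def openmpi_refs(env):
    """Map each relevant variable to the install names it mentions, in order."""
    refs = {}
    for name, value in env.items():
        relevant = (
            name in ("PATH", "LD_LIBRARY_PATH", "OPAL_PREFIX")
            or name.startswith("OMPI_")
        )
        if not relevant:
            continue
        found = []
        for match in PREFIX_RE.findall(value):
            if match not in found:  # keep first occurrence only
                found.append(match)
        if found:
            refs[name] = found
    return refs


def mixed_versions(env):
    """Return the sorted distinct install names seen across all variables."""
    seen = set()
    for names in openmpi_refs(env).values():
        seen.update(names)
    return sorted(seen)


if __name__ == "__main__":
    for name, names in sorted(openmpi_refs(os.environ).items()):
        print(f"{name}: {', '.join(names)}")
    versions = mixed_versions(os.environ)
    if len(versions) > 1:
        print("More than one Open MPI install is referenced:", ", ".join(versions))
```

If the script reports both versions after loading only the openmpi-2.0.0 module, the module files (or a leftover setting in a shell startup file) are probably still exporting paths from the 1.8.1 tree, which would be consistent with the error shown in the question.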
