我刚刚安装了 intel oneApi,并且 intel MPI 已包含在其中。我现在可以使用 编译我的 fortran 代码mpiifort
,但是当我尝试使用 intel 运行它时mpirun
,仅使用 `` 它会抛出此错误。
libi40iw-i40iw_vmapped_qp: failed to pin memory for SQ
libi40iw-i40iw_ucreate_qp: failed to map QP
libi40iw-i40iw_vmapped_qp: failed to pin memory for SQ
libi40iw-i40iw_ucreate_qp: failed to map QP
libi40iw-i40iw_vmapped_qp: failed to pin memory for SQ
libi40iw-i40iw_ucreate_qp: failed to map QP
libi40iw-i40iw_vmapped_qp: failed to pin memory for SQ
libi40iw-i40iw_ucreate_qp: failed to map QP
[1628187174.213489] [localhost:38060:0] select.c:406 UCX ERROR no active messages transport to <no debug data>: self/self - Destination is unreachable, rdmacm/sockaddr - no am bcopy
[1628187174.213511] [localhost:38061:0] select.c:406 UCX ERROR no active messages transport to <no debug data>: self/self - Destination is unreachable, rdmacm/sockaddr - no am bcopy
Abort(1091215) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(138)........:
MPID_Init(1169)..............:
MPIDI_OFI_mpi_init_hook(1909): OFI get address vector map failed
Abort(1091215) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(138)........:
MPID_Init(1169)..............:
MPIDI_OFI_mpi_init_hook(1909): OFI get address vector map failed
现在,按照这里的建议https://github.com/openucx/ucx/issues/4742#issuecomment-584059909,我设置了这些环境变量export UCX_TLS=ud,sm,self
,现在可执行文件运行但也抛出此错误,
libi40iw-i40iw_vmapped_qp: failed to pin memory for SQ
libi40iw-i40iw_ucreate_qp: failed to map QP
libi40iw-i40iw_vmapped_qp: failed to pin memory for SQ
libi40iw-i40iw_ucreate_qp: failed to map QP
libi40iw-i40iw_vmapped_qp: failed to pin memory for SQ
libi40iw-i40iw_ucreate_qp: failed to map QP
libi40iw-i40iw_vmapped_qp: failed to pin memory for SQ
libi40iw-i40iw_ucreate_qp: failed to map QP
[1628186868.761953] [localhost:35174:0] sys.c:618 UCX ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
[1628186868.793554] [localhost:35173:0] sys.c:618 UCX ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
输出为ipcs -l
------ Messages Limits --------
max queues system wide = 32000
max size of message (bytes) = 8192
default max size of queue (bytes) = 16384
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18014398442373116
min seg size (bytes) = 1
------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767
我不明白问题是什么以及如何解决这个问题。有人可以帮我吗?