Kernel modules fail to load for NFS over RoCE on Ubuntu 16.04 with the latest drivers/software

I am having trouble with NFS over RoCE on Ubuntu 16.04 when using the latest OFED package from Mellanox (MLNX_OFED_LINUX-3.3-1.0.4.0-ubuntu16.04-x86_64.tgz). My cards are Mellanox 10GbE with RoCE v1 enabled.

Works with the inbox drivers/software, but not quite with the latest OFED

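Following the documentation on this site, I managed to get NFS working over RoCE with the inbox drivers/software (the ones shipped with Ubuntu 16.04). I ran into a few minor issues, and I know the Ubuntu packages are outdated, so I wanted to install the latest OFED/mlx4 drivers, as recommended on mellanox.com. So I did. Everything went as planned: IP functionality is all there, and the RDMA tools/tests work. Everything seems to run fine, except for one thing.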

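The svcrdma and xprtrdma modules will not load, so NFS over RDMA does not work for me. I get the errors listed further down. I get the same errors if I install only the latest mlx4 driver from the Mellanox website, without the other packages.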

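My feeling is that this can be fixed somehow, for example by recompiling the kernel modules, but I have not figured it out yet. Or have I simply messed something up (fingers crossed)? Can anyone help?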

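A commenter on this Mellanox community post (https://community.mellanox.com/docs/DOC-2132) reports the same problem on Ubuntu 14.04. According to the same document, it should work fine with CentOS 7. What is the difference?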

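The end result I want is NFS over RoCE on Ubuntu 16.04 with, preferably, the latest drivers and software. If not the latest OFED package, then at least the latest mlx4 driver. I have read that newer kernel versions will come with updated drivers and RDMA code (I have forgotten most of the details). If nothing else helps, my answer may be to wait for a newer Ubuntu release.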

Thanks

Error messages when loading the modules

NFS server:

# modprobe svcrdma 
modprobe: ERROR: could not insert 'rpcrdma': Invalid argument 

dmesg errors:

[105699.696980] rpcrdma: Unknown symbol rdma_event_msg (err 0) 
[105699.697056] rpcrdma: disagrees about version of symbol ib_create_cq 
[105699.697059] rpcrdma: Unknown symbol ib_create_cq (err -22) 
[105699.697069] rpcrdma: disagrees about version of symbol rdma_resolve_addr 
[105699.697071] rpcrdma: Unknown symbol rdma_resolve_addr (err -22) 
[105699.697183] rpcrdma: Unknown symbol ib_event_msg (err 0) 
[105699.697213] rpcrdma: disagrees about version of symbol ib_dereg_mr 
[105699.697215] rpcrdma: Unknown symbol ib_dereg_mr (err -22) 
[105699.697224] rpcrdma: disagrees about version of symbol ib_query_qp 
[105699.697226] rpcrdma: Unknown symbol ib_query_qp (err -22) 
[105699.697236] rpcrdma: disagrees about version of symbol rdma_disconnect 
[105699.697238] rpcrdma: Unknown symbol rdma_disconnect (err -22) 
[105699.697245] rpcrdma: disagrees about version of symbol ib_alloc_fmr 
[105699.697247] rpcrdma: Unknown symbol ib_alloc_fmr (err -22) 
[105699.697294] rpcrdma: disagrees about version of symbol ib_dealloc_fmr 
[105699.697295] rpcrdma: Unknown symbol ib_dealloc_fmr (err -22) 
[105699.697301] rpcrdma: disagrees about version of symbol rdma_resolve_route 
[105699.697303] rpcrdma: Unknown symbol rdma_resolve_route (err -22) 
[105699.697398] rpcrdma: disagrees about version of symbol rdma_bind_addr 
[105699.697400] rpcrdma: Unknown symbol rdma_bind_addr (err -22) 
[105699.697441] rpcrdma: disagrees about version of symbol rdma_create_qp 
[105699.697443] rpcrdma: Unknown symbol rdma_create_qp (err -22) 
[105699.697479] rpcrdma: Unknown symbol ib_map_mr_sg (err 0) 
[105699.697487] rpcrdma: disagrees about version of symbol ib_destroy_cq 
[105699.697489] rpcrdma: Unknown symbol ib_destroy_cq (err -22) 
[105699.697494] rpcrdma: disagrees about version of symbol rdma_create_id 
[105699.697496] rpcrdma: Unknown symbol rdma_create_id (err -22) 
[105699.697582] rpcrdma: disagrees about version of symbol rdma_listen 
[105699.697584] rpcrdma: Unknown symbol rdma_listen (err -22) 
[105699.697587] rpcrdma: disagrees about version of symbol rdma_destroy_qp 
[105699.697589] rpcrdma: Unknown symbol rdma_destroy_qp (err -22) 
[105699.697597] rpcrdma: disagrees about version of symbol ib_query_device 
[105699.697599] rpcrdma: Unknown symbol ib_query_device (err -22) 
[105699.697606] rpcrdma: disagrees about version of symbol ib_get_dma_mr 
[105699.697607] rpcrdma: Unknown symbol ib_get_dma_mr (err -22) 
[105699.697617] rpcrdma: disagrees about version of symbol ib_alloc_pd 
[105699.697618] rpcrdma: Unknown symbol ib_alloc_pd (err -22) 
[105699.697673] rpcrdma: Unknown symbol ib_alloc_mr (err 0) 
[105699.697734] rpcrdma: disagrees about version of symbol rdma_connect 
[105699.697736] rpcrdma: Unknown symbol rdma_connect (err -22) 
[105699.697769] rpcrdma: Unknown symbol ib_wc_status_msg (err 0) 
[105699.697842] rpcrdma: disagrees about version of symbol rdma_destroy_id 
[105699.697844] rpcrdma: Unknown symbol rdma_destroy_id (err -22) 
[105699.697872] rpcrdma: disagrees about version of symbol rdma_accept 
[105699.697874] rpcrdma: Unknown symbol rdma_accept (err -22) 
[105699.697882] rpcrdma: disagrees about version of symbol ib_destroy_qp 
[105699.697883] rpcrdma: Unknown symbol ib_destroy_qp (err -22) 
[105699.697964] rpcrdma: disagrees about version of symbol ib_dealloc_pd 
[105699.697965] rpcrdma: Unknown symbol ib_dealloc_pd (err -22)
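
The messages above appear to fall into two groups, which may help narrow down the cause. As far as I understand the module loader, `Unknown symbol X (err 0)` means that no loaded module exports `X` at all, while a `disagrees about version of symbol X` line followed by `(err -22)` means `X` is exported, but with a different symbol version (CRC) than the one rpcrdma was built against. A minimal sketch that sorts a kernel log into those two groups; the function name `classify_rpcrdma_errors` is made up for illustration:

```shell
# Hypothetical helper: read kernel log text on stdin and print one line per
# affected symbol, tagged "missing" (err 0) or "mismatch" (version disagreement).
# The follow-up "(err -22)" lines are skipped because they repeat the mismatches.
classify_rpcrdma_errors() {
    sed -n \
        -e 's/.*rpcrdma: Unknown symbol \([A-Za-z0-9_]*\) (err 0).*/missing \1/p' \
        -e 's/.*rpcrdma: disagrees about version of symbol \([A-Za-z0-9_]*\).*/mismatch \1/p' \
    | LC_ALL=C sort -u
}
```

It could be run as `dmesg | classify_rpcrdma_errors`. On the log above, the "missing" group would be the five symbols reported with `err 0`: rdma_event_msg, ib_event_msg, ib_map_mr_sg, ib_alloc_mr and ib_wc_status_msg.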

NFS client:

# modprobe xprtrdma          
modprobe: ERROR: could not insert 'rpcrdma': Invalid argument

dmesg errors:

[106055.692454] rpcrdma: Unknown symbol rdma_event_msg (err 0) 
[106055.692480] rpcrdma: disagrees about version of symbol ib_create_cq 
[106055.692481] rpcrdma: Unknown symbol ib_create_cq (err -22) 
[106055.692484] rpcrdma: disagrees about version of symbol rdma_resolve_addr 
[106055.692485] rpcrdma: Unknown symbol rdma_resolve_addr (err -22) 
[106055.692520] rpcrdma: Unknown symbol ib_event_msg (err 0) 
[106055.692529] rpcrdma: disagrees about version of symbol ib_dereg_mr 
[106055.692530] rpcrdma: Unknown symbol ib_dereg_mr (err -22) 
[106055.692532] rpcrdma: disagrees about version of symbol ib_query_qp 
[106055.692533] rpcrdma: Unknown symbol ib_query_qp (err -22) 
[106055.692536] rpcrdma: disagrees about version of symbol rdma_disconnect 
[106055.692536] rpcrdma: Unknown symbol rdma_disconnect (err -22) 
[106055.692538] rpcrdma: disagrees about version of symbol ib_alloc_fmr 
[106055.692539] rpcrdma: Unknown symbol ib_alloc_fmr (err -22) 
[106055.692552] rpcrdma: disagrees about version of symbol ib_dealloc_fmr 
[106055.692553] rpcrdma: Unknown symbol ib_dealloc_fmr (err -22) 
[106055.692554] rpcrdma: disagrees about version of symbol rdma_resolve_route 
[106055.692555] rpcrdma: Unknown symbol rdma_resolve_route (err -22) 
[106055.692565] rpcrdma: disagrees about version of symbol rdma_bind_addr 
[106055.692565] rpcrdma: Unknown symbol rdma_bind_addr (err -22) 
[106055.692573] rpcrdma: disagrees about version of symbol rdma_create_qp 
[106055.692574] rpcrdma: Unknown symbol rdma_create_qp (err -22) 
[106055.692583] rpcrdma: Unknown symbol ib_map_mr_sg (err 0) 
[106055.692585] rpcrdma: disagrees about version of symbol ib_destroy_cq 
[106055.692585] rpcrdma: Unknown symbol ib_destroy_cq (err -22) 
[106055.692587] rpcrdma: disagrees about version of symbol rdma_create_id 
[106055.692587] rpcrdma: Unknown symbol rdma_create_id (err -22) 
[106055.692613] rpcrdma: disagrees about version of symbol rdma_listen 
[106055.692614] rpcrdma: Unknown symbol rdma_listen (err -22) 
[106055.692615] rpcrdma: disagrees about version of symbol rdma_destroy_qp 
[106055.692615] rpcrdma: Unknown symbol rdma_destroy_qp (err -22) 
[106055.692617] rpcrdma: disagrees about version of symbol ib_query_device 
[106055.692618] rpcrdma: Unknown symbol ib_query_device (err -22) 
[106055.692619] rpcrdma: disagrees about version of symbol ib_get_dma_mr 
[106055.692620] rpcrdma: Unknown symbol ib_get_dma_mr (err -22) 
[106055.692622] rpcrdma: disagrees about version of symbol ib_alloc_pd 
[106055.692623] rpcrdma: Unknown symbol ib_alloc_pd (err -22) 
[106055.692638] rpcrdma: Unknown symbol ib_alloc_mr (err 0) 
[106055.692657] rpcrdma: disagrees about version of symbol rdma_connect 
[106055.692658] rpcrdma: Unknown symbol rdma_connect (err -22) 
[106055.692668] rpcrdma: Unknown symbol ib_wc_status_msg (err 0) 
[106055.692690] rpcrdma: disagrees about version of symbol rdma_destroy_id 
[106055.692690] rpcrdma: Unknown symbol rdma_destroy_id (err -22) 
[106055.692698] rpcrdma: disagrees about version of symbol rdma_accept 
[106055.692699] rpcrdma: Unknown symbol rdma_accept (err -22) 
[106055.692701] rpcrdma: disagrees about version of symbol ib_destroy_qp 
[106055.692701] rpcrdma: Unknown symbol ib_destroy_qp (err -22) 
[106055.692724] rpcrdma: disagrees about version of symbol ib_dealloc_pd 
[106055.692725] rpcrdma: Unknown symbol ib_dealloc_pd (err -22)
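
Because the server and the client fail on the same symbols, one thing that may be worth checking is whether rpcrdma and the RDMA core modules it depends on come from the same source. A rough sketch, assuming that the stock Ubuntu modules live under a `kernel/` subdirectory of `/lib/modules/<version>/` and that anything elsewhere in that tree was installed by an add-on package. The helper name `module_origin` is hypothetical, and the path convention is an assumption, not something confirmed for MLNX_OFED:

```shell
# Hypothetical helper: classify a module file path as in-tree or out-of-tree.
# Assumption: distribution modules sit under /lib/modules/<version>/kernel/.
module_origin() {
    case "$1" in
        /lib/modules/*/kernel/*) echo "in-tree" ;;
        /lib/modules/*)          echo "out-of-tree" ;;
        *)                       echo "unknown" ;;
    esac
}
```

For example, `module_origin "$(modinfo -n rpcrdma)"` and `module_origin "$(modinfo -n ib_core)"` could be compared. If rpcrdma turns out to be in-tree while ib_core is not, that would be consistent with the symbol version disagreements above, although it would not prove that this is the cause.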
