Linux 内核近期更新后,IPsec 站点到站点 VPN 出现问题

Linux 内核近期更新后,IPsec 站点到站点 VPN 出现问题

上周末,我们对将站点连接到云环境的 VPN 网关之一进行了自动安全升级。在执行故障排除(通过基本网络故障排除,例如通过 Wireshark)后,我们发现最近的一个安全更新是导致此问题的原因。我们已将系统恢复到已知的良好状态,并暂停了(我们认为)受影响的软件包。

这是 AWS 上的 Ubuntu 20.04 LTS 实例,安装了 linux-image-aws。我们正在使用 IPsec 将多个 EdgeRouter 连接到私有云环境。

升级后,所有站点均能照常连接和通信,例如 ICMP 正在运行,但我们无法访问私有云环境中的某些服务(例如 RDP 或 SMB)。

相关软件包的更改日志没有显示任何明显的关联更改,因此我想知道我是否遗漏了一些基本内容。此配置/设置已经运行良好一年多了,没有任何问题。

已知良好版本:Linux-图像-AWS 5.8.0.1041.43~20.04.13

有问题的版本:linux-image-aws 5.8.0.1042.44~20.04.14 及以后版本(我们也测试了最新的 5.11 版本,似乎也受到了影响)

IPsec 配置摘录

# MAIN IPSEC VPN CONFIG
config setup

conn %default
        keyexchange=ikev1

# <REMOVED>
conn peer-rt1.<REMOVED>.net.au-tunnel-1
        left=%any
        right=rt1.<REMOVED>.net.au
        rightid="%any"
        leftsubnet=172.31.0.0/16
        rightsubnet=10.35.0.0/16
        ike=aes128-sha1-modp2048!
        keyexchange=ikev1
        ikelifetime=28800s
        esp=aes128-sha1-modp2048!
        keylife=3600s
        rekeymargin=540s
        type=tunnel
        compress=no
        authby=secret
        auto=route
        keyingtries=%forever
        dpddelay=30s
        dpdtimeout=120s
        dpdaction=restart

先感谢您。

编辑 1:在另一次测试升级之后,我能够从不成功的 RDP 连接中捕获另一个 tcpdump,如下所示:

21:43:01.813502 IP <LOCAL>.51099 > <REMOTE>.3389: Flags [S], seq 2706968963, win 64954, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
21:43:01.813596 IP <LOCAL>.51099 > <REMOTE>.3389: Flags [S], seq 2706968963, win 64954, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
21:43:01.814238 IP <REMOTE>.3389 > <LOCAL>.51099: Flags [S.], seq 152885333, ack 2706968964, win 64000, options [mss 1460,nop,wscale 0,nop,nop,sackOK], length 0
21:43:01.839105 IP <LOCAL>.51099 > <REMOTE>.3389: Flags [.], ack 1, win 1025, length 0
21:43:01.839168 IP <LOCAL>.51099 > <REMOTE>.3389: Flags [.], ack 1, win 1025, length 0
21:43:01.840486 IP <LOCAL>.51099 > <REMOTE>.3389: Flags [P.], seq 1:48, ack 1, win 1025, length 47
21:43:01.840541 IP <LOCAL>.51099 > <REMOTE>.3389: Flags [P.], seq 1:48, ack 1, win 1025, length 47
21:43:01.843746 IP <REMOTE>.3389 > <LOCAL>.51099: Flags [P.], seq 1:20, ack 48, win 63953, length 19
21:43:01.922120 IP <LOCAL>.51099 > <REMOTE>.3389: Flags [.], ack 20, win 1025, length 0
21:43:01.922212 IP <LOCAL>.51099 > <REMOTE>.3389: Flags [.], ack 20, win 1025, length 0
21:43:01.932646 IP <LOCAL>.51099 > <REMOTE>.3389: Flags [P.], seq 48:226, ack 20, win 1025, length 178
21:43:01.932729 IP <LOCAL>.51099 > <REMOTE>.3389: Flags [P.], seq 48:226, ack 20, win 1025, length 178
21:43:01.940677 IP <REMOTE>.3389 > <LOCAL>.51099: Flags [P.], seq 20:1217, ack 226, win 63775, length 1197
21:43:01.967343 IP <LOCAL>.51099 > <REMOTE>.3389: Flags [P.], seq 226:408, ack 1217, win 1020, length 182
21:43:01.967417 IP <LOCAL>.51099 > <REMOTE>.3389: Flags [P.], seq 226:408, ack 1217, win 1020, length 182
21:43:01.969452 IP <REMOTE>.3389 > <LOCAL>.51099: Flags [P.], seq 1217:1324, ack 408, win 63593, length 107
21:43:02.044376 IP <LOCAL>.51099 > <REMOTE>.3389: Flags [.], ack 1324, win 1020, length 0
21:43:02.044471 IP <LOCAL>.51099 > <REMOTE>.3389: Flags [.], ack 1324, win 1020, length 0
21:43:02.135594 IP <LOCAL>.51099 > <REMOTE>.3389: Flags [P.], seq 408:637, ack 1324, win 1020, length 229
21:43:02.135653 IP <LOCAL>.51099 > <REMOTE>.3389: Flags [P.], seq 408:637, ack 1324, win 1020, length 229
21:43:02.136796 IP <REMOTE>.3389 > <LOCAL>.51099: Flags [P.], seq 1324:2609, ack 637, win 63364, length 1285
21:43:02.212871 IP <LOCAL>.51099 > <REMOTE>.3389: Flags [.], ack 2609, win 1025, length 0
21:43:02.212940 IP <LOCAL>.51099 > <REMOTE>.3389: Flags [.], ack 2609, win 1025, length 0

答案1

无需深究 - 是否可能删除旧的、弱密码,如 sha1 和 aes128?尝试在工作内核版本中将其更改为 aes256-sha256-modp2048,然后升级以查看是否仍然中断。也许 ikev2 而不是 ikev1?但这是用户空间问题,而不是内核设置。

尝试一下...

相关内容