Debian Bookworm 12.5, with DRBD and the ktls-utils package from Proxmox. Everything worked fine until I enabled TLS.
ii drbd-dkms 9.2.8-2 all
ii drbd-reactor 1.4.0-1 amd64
ii drbd-utils 9.27.0-1 amd64
ii ktls-utils 0.10-6 amd64
DRBD is set up on all 3 nodes, everything is in sync and running fine.
r0 role:Primary
disk:UpToDate
slave role:Secondary
peer-disk:UpToDate
tiebreaker role:Secondary
peer-disk:UpToDate
Then I added the following net section on all of these nodes:
net { tls yes; }
After that I got:
dmesg
[ 147.759827] drbd r0 slave: conn( Connecting -> NetworkFailure ) [disconnected]
[ 147.759848] drbd r0 slave: Terminating sender thread
[ 147.759853] drbd r0 slave: Starting sender thread (from drbd_r_r0 [755])
[ 147.771131] drbd r0 slave: Connection closed
[ 147.771135] drbd r0 slave: helper command: /sbin/drbdadm disconnected
[ 147.771587] drbd r0 slave: helper command: /sbin/drbdadm disconnected exit code 0
[ 147.771592] drbd r0 slave: conn( NetworkFailure -> Unconnected ) [disconnected]
[ 147.977813] drbd r0 tcp:tiebreaker: dtt_send_page: size=80 len=80 sent=-95
[ 147.977817] drbd r0 tiebreaker: conn( Connecting -> NetworkFailure ) [disconnected]
[ 147.977824] drbd r0 tiebreaker: Terminating sender thread
[ 147.977829] drbd r0 tiebreaker: Starting sender thread (from drbd_r_r0 [759])
[ 147.987330] drbd r0 tiebreaker: Connection closed
[ 147.987335] drbd r0 tiebreaker: helper command: /sbin/drbdadm disconnected
[ 147.987744] drbd r0 tiebreaker: helper command: /sbin/drbdadm disconnected exit code 0
[ 147.987749] drbd r0 tiebreaker: conn( NetworkFailure -> Unconnected ) [disconnected]
[ 148.783089] drbd r0 slave: conn( Unconnected -> Connecting ) [connecting]
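One detail in the dmesg output above that may be worth decoding is the sent=-95 field on the dtt_send_page line. The following is a minimal sketch, assuming the negative value is a negated errno (the usual convention for kernel return codes, but not confirmed here for DRBD specifically). The helper name decode_sent and the small errno table are illustrative only; the table hard-codes Linux values, where 95 is EOPNOTSUPP ("Operation not supported").

```python
import re

# Linux errno values (hard-coded so the result does not depend on the host OS).
LINUX_ERRNO = {
    11: "EAGAIN",
    32: "EPIPE",
    95: "EOPNOTSUPP",
    104: "ECONNRESET",
    110: "ETIMEDOUT",
}

# Matches the "sent=<n>" field of a dtt_send_page line from dmesg.
SENT_RE = re.compile(r"dtt_send_page:.*\bsent=(-?\d+)")


def decode_sent(line):
    """Return (value, errno_name) for a dtt_send_page line, or None if no match."""
    m = SENT_RE.search(line)
    if m is None:
        return None
    value = int(m.group(1))
    if value >= 0:
        # Non-negative: a byte count, not an error.
        return value, None
    # Assumption: a negative value is a negated errno.
    return value, LINUX_ERRNO.get(-value, "unknown")


line = "[ 147.977813] drbd r0 tcp:tiebreaker: dtt_send_page: size=80 len=80 sent=-95"
print(decode_sent(line))  # (-95, 'EOPNOTSUPP')
```

If that reading is right, the send failed because the operation was not supported, rather than because of a network or certificate problem.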
ngrep -d any port 7788
interface: any
filter: ( port 7788 ) and (ip || ip6)
####
T 192.168.0.X:52605 -> 192.168.0.Y:7788 [AP] #4
....Wf4Q.(.....\.D....x:.0.Y.k 2[..... <SKIPPED MANY LINES>
#####
T 192.168.0.Y:52571 -> 192.168.0.X:7788 [AP] #9
<SKIPPED MANY LINES>
tcpdump port 7788
09:11:21.927899 IP 192.168.0.Z.39381 > 192.168.0.Y.7788: Flags [.], ack 2313, win 501, options [nop,nop,TS val 2232236805 ecr 466841460], length 0
journalctl -f -u tlshd
Apr 10 10:18:12 master tlshd[8385]: Handshake with slave ... was successful
Apr 10 10:18:12 master tlshd[8386]: Handshake with tiebreaker ... was successful
^^ fine on all nodes.
In verbose mode there is also the following:
DBG<1>././lib/cache_mngt.c:302 nl_cache_mngt_unregister: Unregistered cache operations genl/family
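I generated the certificates as described in "Encrypted replication with DRBD" on the LINBIT blog (https://linbit.com/blog/encrypted-replication-with-drbd/). The only changes were setting the CN to match the hostname and adding the IP just in case (I tried several variants, no luck). The certificates were generated on macOS with the openssl utility.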
-addext "subjectAltName = DNS:$domain, IP:$ip" \
-addext "extendedKeyUsage = serverAuth, clientAuth" \
Result:
r0 role:Secondary
disk:UpToDate quorum:no
slave connection:Connecting
tiebreaker connection:NetworkFailure
r0 role:Secondary
disk:UpToDate quorum:no
slave connection:Unconnected
tiebreaker connection:Unconnected
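To compare status snapshots like the two above across nodes, a small parser can pull out the quorum flag and the per-peer connection state. This is a rough sketch that assumes the flattened text layout shown here (one "name connection:State" line per peer); summarize_status is a hypothetical helper, not part of drbd-utils.

```python
import re


def summarize_status(text):
    """Collect the quorum flag and per-peer connection states from status text."""
    summary = {"quorum": None, "connections": {}}
    for raw in text.splitlines():
        line = raw.strip()
        # e.g. "disk:UpToDate quorum:no"
        q = re.search(r"\bquorum:(\w+)", line)
        if q:
            summary["quorum"] = q.group(1)
        # e.g. "slave connection:Connecting"
        c = re.match(r"(\S+)\s+connection:(\w+)", line)
        if c:
            summary["connections"][c.group(1)] = c.group(2)
    return summary


sample = """r0 role:Secondary
disk:UpToDate quorum:no
slave connection:Connecting
tiebreaker connection:NetworkFailure"""
print(summarize_status(sample))
```

For the first snapshot this reports quorum "no" with both peers in a non-connected state, which matches a node that cannot reach either of its two peers.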
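I disabled cram-hmac-alg, data-integrity-alg and shared-secret just in case, leaving only "tls yes" to keep the net section clean, but that did not help. It is a simple resource configuration with 3 nodes, connection-mesh and tls yes.
Did I forget to add something needed to make this work? I am running it in a local virtual environment, so there are no network problems. Something is wrong with the configuration and I cannot tell what.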
r0.res
resource r0 {
options {
auto-promote no;
on-suspended-primary-outdated force-secondary;
quorum majority;
on-no-quorum io-error;
quorum-minimum-redundancy 2;
}
device /dev/drbd0;
meta-disk internal;
disk /dev/sdb1;
protocol B;
startup {
wfc-timeout 15;
degr-wfc-timeout 60;
}
net {
tls yes;
}
on master {
address 192.168.0.X:7788;
node-id 0;
}
on slave {
address 192.168.0.Y:7788;
node-id 1;
}
on tiebreaker {
address 192.168.0.Z:7788;
node-id 2;
}
connection-mesh {
hosts master slave tiebreaker;
}
}
More tlshd logs:
Apr 11 09:37:19 master tlshd[1008]: Parsing a valid netlink message
Apr 11 09:37:19 master tlshd[1008]: No peer identities found
Apr 11 09:37:19 master tlshd[1008]: No certificates found
Apr 11 09:37:19 master tlshd[1008]: DBG<2> ././lib/msg.c:572 nlmsg_free: msg 0x5573416ff750: Freed
Apr 11 09:37:19 master tlshd[1008]: DBG<2> ././lib/msg.c:572 nlmsg_free: msg 0x5573416ff610: Freed
Apr 11 09:37:19 master tlshd[1008]: System config file: /etc/gnutls/config
Apr 11 09:37:19 master tlshd[1008]: Server x.509 truststore is /etc/tlshd.d/myCA.crt
Apr 11 09:37:19 master tlshd[1008]: System trust: Loaded 1 certificate(s).
Apr 11 09:37:19 master tlshd[1008]: Retrieved x.509 certificate from /etc/tlshd.d/master.crt
Apr 11 09:37:19 master tlshd[1008]: Retrieved private key from /etc/tlshd.d/master.key
Apr 11 09:37:19 master tlshd[1008]: gnutls(2): checking 13.02 (GNUTLS_AES_256_GCM_SHA384) for compatibility
Apr 11 09:37:19 master tlshd[1008]: gnutls(2): Selected (RSA) cert
Apr 11 09:37:19 master tlshd[1008]: gnutls(2): EXT[0x557341710040]: server generated SECP384R1 shared key
Apr 11 09:37:19 master tlshd[1009]: Server's trusted authorities:
Apr 11 09:37:19 master tlshd[1009]: [0]: CN=MyCN
Apr 11 09:37:19 master tlshd[1009]: The certificate is trusted.
Apr 11 09:37:19 master tlshd[1009]: The peer offered 1 certificate(s).
Apr 11 09:37:19 master tlshd[1009]: Session description: (TLS1.3)-(ECDHE-SECP384R1)-(RSA-PSS-RSAE-SHA384)-(AES-256-GCM)
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/msg.c:277 __nlmsg_alloc: msg 0x5573416ff610: Allocated new message, maxlen=4096
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/msg.c:517 nlmsg_put: msg 0x5573416ff610: Added netlink header type=16, flags=0, pid=0, seq=0
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/msg.c:424 nlmsg_reserve: msg 0x5573416ff610: Reserved 4 (4) bytes, pad=4, nlmsg_len=20
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/genl/genl.c:357 genlmsg_put: msg 0x5573416ff610: Added generic netlink header cmd=3 version=1
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/attr.c:470 nla_reserve: msg 0x5573416ff610: attr <0x557341703164> 2: Reserved 16 (10) bytes at offset +4 nlmsg_len=36
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/attr.c:507 nla_put: msg 0x5573416ff610: attr <0x557341703164> 2: Wrote 10 bytes at offset +4
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/msg.c:277 __nlmsg_alloc: msg 0x5573416ff750: Allocated new message, maxlen=164
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/msg.c:572 nlmsg_free: msg 0x5573416ff750: Freed
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/msg.c:277 __nlmsg_alloc: msg 0x55734170cac0: Allocated new message, maxlen=36
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/msg.c:572 nlmsg_free: msg 0x55734170cac0: Freed
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/msg.c:572 nlmsg_free: msg 0x5573416ff610: Freed
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/msg.c:277 __nlmsg_alloc: msg 0x5573416ff610: Allocated new message, maxlen=4096
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/msg.c:517 nlmsg_put: msg 0x5573416ff610: Added netlink header type=32, flags=0, pid=0, seq=0
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/msg.c:424 nlmsg_reserve: msg 0x5573416ff610: Reserved 4 (4) bytes, pad=4, nlmsg_len=20
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/genl/genl.c:357 genlmsg_put: msg 0x5573416ff610: Added generic netlink header cmd=3 version=0
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/attr.c:470 nla_reserve: msg 0x5573416ff610: attr <0x557341703164> 1: Reserved 8 (4) bytes at offset +4 nlmsg_len=28
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/attr.c:507 nla_put: msg 0x5573416ff610: attr <0x557341703164> 1: Wrote 4 bytes at offset +4
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/attr.c:470 nla_reserve: msg 0x5573416ff610: attr <0x55734170316c> 2: Reserved 8 (4) bytes at offset +12 nlmsg_len=36
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/attr.c:507 nla_put: msg 0x5573416ff610: attr <0x55734170316c> 2: Wrote 4 bytes at offset +12
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/attr.c:470 nla_reserve: msg 0x5573416ff610: attr <0x557341703174> 3: Reserved 8 (4) bytes at offset +20 nlmsg_len=44
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/attr.c:507 nla_put: msg 0x5573416ff610: attr <0x557341703174> 3: Wrote 4 bytes at offset +20
Apr 11 09:37:19 master tlshd[1009]: DBG<2> ././lib/msg.c:572 nlmsg_free: msg 0x5573416ff610: Freed
Apr 11 09:37:19 master tlshd[1009]: Handshake with slave (192.168.0.Y) was successful
Apr 11 09:37:19 master tlshd[1009]: DBG<1>././lib/cache_mngt.c:302 nl_cache_mngt_unregister: Unregistered cache operations genl/family
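Answer 1
Well, the tlshd log messages turned out to be very misleading.
They led me to believe the problem was in DRBD,
but it was actually the kernel version. I installed 6.6.13 from the Debian Backports repository and it works now.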
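A quick way to check a node against the kernel reported as working here is to compare its release string with 6.6.13. This is only a sketch: 6.6.13 is the version from this answer, not a documented minimum, and the names KNOWN_GOOD, parse_kernel_release and at_least_known_good are made up for illustration. Note also that a distribution's release string may not carry the exact upstream patch level.

```python
import platform
import re

# Assumption: 6.6.13 is simply the version reported as working above,
# not a confirmed minimum requirement.
KNOWN_GOOD = (6, 6, 13)


def parse_kernel_release(release):
    """Turn e.g. '6.6.13+bpo-amd64' into (6, 6, 13); a missing patch level becomes 0."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(p) if p is not None else 0 for p in m.groups())


def at_least_known_good(release, known_good=KNOWN_GOOD):
    """True if the release is the known-good version or newer."""
    return parse_kernel_release(release) >= known_good


if __name__ == "__main__":
    rel = platform.release()
    verdict = "ok" if at_least_known_good(rel) else "older than the reported working kernel"
    print(rel, "->", verdict)
```

Running this on each of the three nodes shows whether any of them is still on an older kernel than the one that fixed the problem.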