我们已经设置了 2 个带有 Active Replica 的 keydb 服务器,如下所述:https://docs.keydb.dev/docs/active-rep/
我们使用 HAproxy 将流量重定向到正确的服务器。因此,我们得到了当前的情况:
keydb 001 - 10.0.0.7
keydb 002 - 10.0.0.8
我们希望更新并重启 keydb 01。我们已将其置于 HAproxy 的维护状态,并且所有连接都已耗尽。因此,服务器不再使用,所有实时连接都将转到 keydb 02。
现在,当 keydb 01 再次启动时,它会要求 keydb 02 进行完整的数据库同步。完成此操作后,我们看到 keydb 02 也向 keydb 01 请求完整的数据库同步!!! 这导致 keydb 02 进入 LOADING 状态,而它是 Haproxy 中唯一的活动服务器。
因此,结果就是在很短的时间内,没有 keydb 服务器处于活动状态. 这会导致如下错误:LOADING Redis 正在将数据集加载到内存中
这种主动副本设置的整体理念是,它坚固、安全且可创建高可用性设置。但是,在这种情况下,这意味着每次服务器发生故障时,都会造成短暂的完全停机。这对于我们的设置来说是不可接受的。
我们做错了吗?这是故意为之吗?还是我们必须重新配置某些东西?
我们已经测试了基于磁盘和无磁盘的同步。这没什么区别。我们的配置设置(来自 ansible playbook,因此格式看起来有点奇怪):
bind: "127.0.0.1 {{ my_private_ips[inventory_hostname] }}"
requirepass: "{{ keydb_auth }}"
masterauth: "{{ keydb_auth }}"
replicaof: "xxx 6379"
client-output-buffer-limit:
- normal 0 0 0
- replica 1024mb 256mb 60
- pubsub 32mb 8mb 60
repl-diskless-sync: yes
port: 6379
maxmemory: 3000m
active-replica: yes
我们正在运行 Ubuntu 20.04.2 LTS 和 keydb 版本 6.0.16。
这是来自 keydb 01 的 keydb 日志文件,该文件因维护而关闭并再次启动。
3812319:1439:C 18 Jun 2021 09:08:36.048 * DB saved on disk
3812319:1439:C 18 Jun 2021 09:08:36.054 * RDB: 3 MB of memory used by copy-on-write
958:1439:S 18 Jun 2021 09:08:36.094 * Background saving terminated with success
958:signal-handler (1624000133) Received SIGTERM scheduling shutdown...
958:signal-handler (1624000133) Received SIGTERM scheduling shutdown...
958:1439:S 18 Jun 2021 09:08:53.194 # User requested shutdown...
958:1439:S 18 Jun 2021 09:08:53.194 # systemd supervision requested, but NOTIFY_SOCKET not found
958:1439:S 18 Jun 2021 09:08:53.194 * Saving the final RDB snapshot before exiting.
958:1439:S 18 Jun 2021 09:08:53.194 # systemd supervision requested, but NOTIFY_SOCKET not found
958:1439:S 18 Jun 2021 09:08:54.159 * DB saved on disk
958:1439:S 18 Jun 2021 09:08:54.159 * Removing the pid file.
958:1439:S 18 Jun 2021 09:08:54.159 # KeyDB is now ready to exit, bye bye...
874:874:C 18 Jun 2021 09:09:04.528 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
874:874:C 18 Jun 2021 09:09:04.531 * Notice: "active-replica yes" implies "replica-read-only no"
874:874:C 18 Jun 2021 09:09:04.531 # oO0OoO0OoO0Oo KeyDB is starting oO0OoO0OoO0Oo
874:874:C 18 Jun 2021 09:09:04.531 # KeyDB version=6.0.16, bits=64, commit=00000000, modified=0, pid=874, just started
874:874:C 18 Jun 2021 09:09:04.531 # Configuration loaded
874:874:C 18 Jun 2021 09:09:04.531 # WARNING supervised by systemd - you MUST set appropriate values for TimeoutStartSec and TimeoutStopSec in your service unit.
874:874:C 18 Jun 2021 09:09:04.531 # systemd supervision requested, but NOTIFY_SOCKET not found
KeyDB 6.0.16 (00000000/0) 64 bit
Running in standalone mode
Port: 6379
PID: 957
Join the KeyDB community! https://community.keydb.dev/
957:874:S 18 Jun 2021 09:09:04.781 # Server initialized
957:874:S 18 Jun 2021 09:09:04.781 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
957:874:S 18 Jun 2021 09:09:04.781 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with KeyDB. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. KeyDB must be restarted after THP is disabled.
957:874:S 18 Jun 2021 09:09:04.789 * Loading RDB produced by version 6.0.16
957:874:S 18 Jun 2021 09:09:04.789 * RDB age 11 seconds
957:874:S 18 Jun 2021 09:09:04.789 * RDB memory usage when created 100.16 Mb
957:874:S 18 Jun 2021 09:09:05.563 * DB loaded from disk: 0.778 seconds
957:874:S 18 Jun 2021 09:09:05.563 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
957:874:S 18 Jun 2021 09:09:05.563 # systemd supervision requested, but NOTIFY_SOCKET not found
957:1345:S 18 Jun 2021 09:09:05.563 Thread 1 alive.
957:1344:S 18 Jun 2021 09:09:05.564 Thread 0 alive.
957:1344:S 18 Jun 2021 09:09:05.564 * Connecting to MASTER 10.0.0.8:6379
957:1344:S 18 Jun 2021 09:09:05.564 * MASTER <-> REPLICA sync started
957:1344:S 18 Jun 2021 09:09:05.566 * Non blocking connect for SYNC fired the event.
957:1344:S 18 Jun 2021 09:09:05.567 * Master replied to PING, replication can continue...
957:1344:S 18 Jun 2021 09:09:05.571 * Replica 10.0.0.8:6379 asks for synchronization
957:1344:S 18 Jun 2021 09:09:05.571 * Full resync requested by replica 10.0.0.8:6379
957:1344:S 18 Jun 2021 09:09:05.571 * Replication backlog created, my new replication IDs are '60759691a4adca42cced2fd1895a470ab7b9eebb' and '0000000000000000000000000000000000000000'
957:1344:S 18 Jun 2021 09:09:05.571 * Delay next BGSAVE for diskless SYNC
957:1344:S 18 Jun 2021 09:09:05.573 * Partial resynchronization not possible (no cached master)
957:1344:S 18 Jun 2021 09:09:11.521 * Full resync from master: 3eb1532df27d6815eea8a420c7c72b04e03618dc:516117186158
957:1344:S 18 Jun 2021 09:09:11.521 * Discarding previously cached master state.
957:1344:S 18 Jun 2021 09:09:11.530 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
957:1344:S 18 Jun 2021 09:09:11.603 * Starting BGSAVE for SYNC with target: replicas sockets
957:1344:S 18 Jun 2021 09:09:11.606 * Background RDB transfer started by pid 2412
2412:1344:C 18 Jun 2021 09:09:12.488 * RDB: 1 MB of memory used by copy-on-write
957:1344:S 18 Jun 2021 09:09:12.488 # Diskless rdb transfer, done reading from pipe, 1 replicas still up.
957:1344:S 18 Jun 2021 09:09:12.507 * Background RDB transfer terminated with success
957:1344:S 18 Jun 2021 09:09:12.507 * Streamed RDB transfer with replica 10.0.0.8:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
957:1344:S 18 Jun 2021 09:09:12.872 * Synchronization with replica 10.0.0.8:6379 succeeded
957:1344:S 18 Jun 2021 09:10:12.799 # I/O error trying to sync with MASTER: connection lost
957:1344:S 18 Jun 2021 09:10:13.429 * Connecting to MASTER 10.0.0.8:6379
957:1344:S 18 Jun 2021 09:10:13.429 * MASTER <-> REPLICA sync started
957:1344:S 18 Jun 2021 09:10:13.429 * Non blocking connect for SYNC fired the event.
957:1344:S 18 Jun 2021 09:10:13.429 * Master replied to PING, replication can continue...
957:1344:S 18 Jun 2021 09:10:13.430 * Partial resynchronization not possible (no cached master)
957:1344:S 18 Jun 2021 09:10:19.943 * Full resync from master: 349f88c33a6dc9d67a3cb4d623727d9f1047033d:516121643991
957:1344:S 18 Jun 2021 09:10:19.952 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
957:1344:S 18 Jun 2021 09:10:20.856 * MASTER <-> REPLICA sync: Loading DB in memory
957:1344:S 18 Jun 2021 09:10:20.856 * Loading RDB produced by version 6.0.16
957:1344:S 18 Jun 2021 09:10:20.856 * RDB age 1 seconds
957:1344:S 18 Jun 2021 09:10:20.856 * RDB memory usage when created 102.08 Mb
957:1344:S 18 Jun 2021 09:10:21.431 * MASTER <-> REPLICA sync: Finished with success
957:1344:S 18 Jun 2021 09:10:21.431 # systemd supervision requested, but NOTIFY_SOCKET not found
957:1344:S 18 Jun 2021 09:10:21.431 # systemd supervision requested, but NOTIFY_SOCKET not found
957:1344:S 18 Jun 2021 09:11:03.197 * 10000 changes in 60 seconds. Saving...
957:1344:S 18 Jun 2021 09:11:03.200 * Background saving started by pid 4845
4845:1344:C 18 Jun 2021 09:11:04.015 * DB saved on disk
4845:1344:C 18 Jun 2021 09:11:04.018 * RDB: 2 MB of memory used by copy-on-write
957:1344:S 18 Jun 2021 09:11:04.104 * Background saving terminated with success
957:1344:S 18 Jun 2021 09:12:05.071 * 10000 changes in 60 seconds. Saving...
957:1344:S 18 Jun 2021 09:12:05.075 * Background saving started by pid 4870
4870:1344:C 18 Jun 2021 09:12:05.963 * DB saved on disk
这是 keydb 02 日志。此服务器保持“启动”状态,但短暂不可用:
960:1388:S 18 Jun 2021 09:08:29.066 * Background saving started by pid 3814844
3814844:1388:C 18 Jun 2021 09:08:29.896 * DB saved on disk
3814844:1388:C 18 Jun 2021 09:08:29.901 * RDB: 3 MB of memory used by copy-on-write
960:1388:S 18 Jun 2021 09:08:29.974 * Background saving terminated with success
960:1388:S 18 Jun 2021 09:08:54.176 # Connection with master lost.
960:1388:S 18 Jun 2021 09:08:54.176 * Caching the disconnected master state.
960:1388:S 18 Jun 2021 09:08:54.177 # Connection with replica client id #11 lost.
960:1388:S 18 Jun 2021 09:08:54.957 * Connecting to MASTER 10.0.0.7:6379
960:1388:S 18 Jun 2021 09:08:54.957 * MASTER <-> REPLICA sync started
960:1388:S 18 Jun 2021 09:09:02.073 # Error condition on socket for SYNC: Resource temporarily unavailable
960:1388:S 18 Jun 2021 09:09:02.988 * Connecting to MASTER 10.0.0.7:6379
960:1388:S 18 Jun 2021 09:09:02.988 * MASTER <-> REPLICA sync started
960:1388:S 18 Jun 2021 09:09:02.988 # Error condition on socket for SYNC: Operation now in progress
960:1388:S 18 Jun 2021 09:09:03.998 * Connecting to MASTER 10.0.0.7:6379
960:1388:S 18 Jun 2021 09:09:03.998 * MASTER <-> REPLICA sync started
960:1388:S 18 Jun 2021 09:09:03.999 * Non blocking connect for SYNC fired the event.
960:1388:S 18 Jun 2021 09:09:04.068 * Master replied to PING, replication can continue...
960:1388:S 18 Jun 2021 09:09:04.084 * Partial resynchronization not possible (no cached master)
960:1388:S 18 Jun 2021 09:09:04.087 * Replica 10.0.0.7:6379 asks for synchronization
960:1388:S 18 Jun 2021 09:09:04.087 * Full resync requested by replica 10.0.0.7:6379
960:1388:S 18 Jun 2021 09:09:04.087 * Delay next BGSAVE for diskless SYNC
960:1388:S 18 Jun 2021 09:09:10.035 * Starting BGSAVE for SYNC with target: replicas sockets
960:1388:S 18 Jun 2021 09:09:10.042 * Background RDB transfer started by pid 3814859
960:1388:S 18 Jun 2021 09:09:10.117 * Full resync from master: 640f9e48636076058722301cc52ea8a21bc8e450:453896462998
960:1388:S 18 Jun 2021 09:09:10.118 * Discarding previously cached master state.
960:1388:S 18 Jun 2021 09:09:10.122 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
960:1388:S 18 Jun 2021 09:09:10.999 * MASTER <-> REPLICA sync: Loading DB in memory
960:1388:S 18 Jun 2021 09:09:11.000 * Replica is about to load the RDB file received from the master, but there is a pending RDB child running. Killing process 3814859 and removing its temp file to avoid any race
3814859:signal-handler (1624000150) Received SIGUSR1 in child, exiting now.
960:1388:S 18 Jun 2021 09:09:11.000 * Loading RDB produced by version 6.0.16
960:1388:S 18 Jun 2021 09:09:11.000 * RDB age 0 seconds
960:1388:S 18 Jun 2021 09:09:11.000 * RDB memory usage when created 95.02 Mb
960:1388:S 18 Jun 2021 09:09:11.018 # Diskless rdb transfer, done reading from pipe, 1 replicas still up.
960:1388:S 18 Jun 2021 09:09:11.019 # Background transfer terminated by signal 10
960:1388:S 18 Jun 2021 09:09:11.019 * Streamed RDB transfer with replica 10.0.0.7:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
960:1388:S 18 Jun 2021 09:09:11.386 * MASTER <-> REPLICA sync: Finished with success
960:1388:S 18 Jun 2021 09:09:11.386 # systemd supervision requested, but NOTIFY_SOCKET not found
960:1388:S 18 Jun 2021 09:09:11.386 # systemd supervision requested, but NOTIFY_SOCKET not found
960:1388:S 18 Jun 2021 09:09:30.096 * 10000 changes in 60 seconds. Saving...
960:1388:S 18 Jun 2021 09:09:30.101 * Background saving started by pid 3814875
3814875:1388:C 18 Jun 2021 09:09:30.970 * DB saved on disk
3814875:1388:C 18 Jun 2021 09:09:30.975 * RDB: 5 MB of memory used by copy-on-write
960:1388:S 18 Jun 2021 09:09:31.022 * Background saving terminated with success
960:1388:S 18 Jun 2021 09:10:12.795 # Disconnecting timedout replica: 10.0.0.7:6379
960:1388:S 18 Jun 2021 09:10:12.795 # Connection with replica 10.0.0.7:6379 lost.
960:1389:S 18 Jun 2021 09:10:13.429 * Replica 10.0.0.7:6379 asks for synchronization
960:1389:S 18 Jun 2021 09:10:13.429 * Full resync requested by replica 10.0.0.7:6379
960:1389:S 18 Jun 2021 09:10:13.429 * Delay next BGSAVE for diskless SYNC
960:1388:S 18 Jun 2021 09:10:19.941 * Starting BGSAVE for SYNC with target: replicas sockets
960:1388:S 18 Jun 2021 09:10:19.948 * Background RDB transfer started by pid 3815442
3815442:1388:C 18 Jun 2021 09:10:20.860 * RDB: 5 MB of memory used by copy-on-write
960:1388:S 18 Jun 2021 09:10:20.860 # Diskless rdb transfer, done reading from pipe, 1 replicas still up.
960:1388:S 18 Jun 2021 09:10:20.958 * Background RDB transfer terminated with success
960:1388:S 18 Jun 2021 09:10:20.958 * Streamed RDB transfer with replica 10.0.0.7:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
960:1389:S 18 Jun 2021 09:10:22.034 * Synchronization with replica 10.0.0.7:6379 succeeded
960:1388:S 18 Jun 2021 09:10:32.013 * 10000 changes in 60 seconds. Saving...
960:1388:S 18 Jun 2021 09:10:32.018 * Background saving started by pid 3816278
3816278:1388:C 18 Jun 2021 09:10:32.845 * DB saved on disk
3816278:1388:C 18 Jun 2021 09:10:32.850 * RDB: 4 MB of memory used by copy-on-write
960:1388:S 18 Jun 2021 09:10:32.921 * Background saving terminated with success
960:1388:S 18 Jun 2021 09:11:33.101 * 10000 changes in 60 seconds. Saving...
答案1
也许答案已经太晚了,但我也遇到过类似的问题:config set client-output-buffer-limit "slave 10737418240 10738240 60" 解决了这个问题。对于你的情况,我会添加以下行
- 奴隶 10737418240 10738240 60
我在 stackoverflow 上找到了这个提示:redis-cluster-unavailable-while-switching-slave-to-master。(顺便说一句:在 redis 和 keydb 的代码中替换主从术语不是一个好主意吗?)