我按照本指南为 postgres HA 设置了一个实验室
我严格按照指南操作(在我的情况下更改了 IP 地址),毕竟一切在 postgres 服务器 1 上都能正常工作
patroni.yml
但是当设置postgres 服务器 2 时
该指南说两个 postgres 服务器都有相同的patroni.yml
设置,但是当重新启动patroni service
此问题发生在 server1
quanlm@DB1:~$ sudo service patroni status
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled)
Active: active (running) since Tue 2019-11-12 07:35:33 UTC; 14min ago
Main PID: 411 (patroni)
Tasks: 12
Memory: 77.6M
CPU: 4.041s
CGroup: /system.slice/patroni.service
├─411 /usr/bin/python3 /usr/local/bin/patroni /etc/patroni.yml
├─431 postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --listen_addresses=192.168.122.77 --max_prepared_tran
├─435 postgres: postgres: checkpointer process
├─436 postgres: postgres: writer process
├─439 postgres: postgres: stats collector process
├─447 postgres: postgres: postgres postgres 192.168.122.77(49984) idle
├─455 postgres: postgres: wal writer process
└─456 postgres: postgres: autovacuum launcher process
Nov 12 07:49:28 DB1 patroni[411]: 2019-11-12 07:49:28,533 INFO: no action. i am the leader with the lock
Nov 12 07:49:38 DB1 patroni[411]: 2019-11-12 07:49:38,459 INFO: Lock owner: postgresql0; I am postgresql0
Nov 12 07:49:38 DB1 patroni[411]: 2019-11-12 07:49:38,536 INFO: no action. i am the leader with the lock
Nov 12 07:49:48 DB1 patroni[411]: 2019-11-12 07:49:48,459 INFO: Lock owner: postgresql0; I am postgresql0
Nov 12 07:49:48 DB1 patroni[411]: 2019-11-12 07:49:48,544 INFO: no action. i am the leader with the lock
Nov 12 07:49:58 DB1 patroni[411]: 2019-11-12 07:49:58,458 INFO: Lock owner: postgresql0; I am postgresql0
Nov 12 07:49:58 DB1 patroni[411]: 2019-11-12 07:49:58,548 INFO: no action. i am the leader with the lock
Nov 12 07:50:08 DB1 patroni[411]: 2019-11-12 07:50:08,457 INFO: Lock owner: postgresql0; I am postgresql0
Nov 12 07:50:08 DB1 patroni[411]: 2019-11-12 07:50:08,539 INFO: no action. i am the leader with the lock
Nov 12 07:50:19 DB1 patroni[411]: 2019-11-12 07:50:19,949 INFO: acquired session lock as a leader
是的,服务器 1 很好,但是服务器 2
quanlm@DB2:~$ sudo service patroni status
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2019-11-12 07:50:02 UTC; 2s ago
Process: 9514 ExecStart=/usr/local/bin/patroni /etc/patroni.yml (code=exited, status=1/FAILURE)
Main PID: 9514 (code=exited, status=1/FAILURE)
Nov 12 07:50:02 DB2 patroni[9514]: File "/usr/lib/python3.5/socketserver.py", line 440, in __init__
Nov 12 07:50:02 DB2 patroni[9514]: self.server_bind()
Nov 12 07:50:02 DB2 patroni[9514]: File "/usr/lib/python3.5/http/server.py", line 138, in server_bind
Nov 12 07:50:02 DB2 patroni[9514]: socketserver.TCPServer.server_bind(self)
Nov 12 07:50:02 DB2 patroni[9514]: File "/usr/lib/python3.5/socketserver.py", line 454, in server_bind
Nov 12 07:50:02 DB2 patroni[9514]: self.socket.bind(self.server_address)
Nov 12 07:50:02 DB2 patroni[9514]: OSError: [Errno 99] Cannot assign requested address
Nov 12 07:50:02 DB2 systemd[1]: patroni.service: Main process exited, code=exited, status=1/FAILURE
Nov 12 07:50:02 DB2 systemd[1]: patroni.service: Unit entered failed state.
Nov 12 07:50:02 DB2 systemd[1]: patroni.service: Failed with result 'exit-code'.
它终究还是不起作用。
listen_addresses = '*'
我已通过postgresql.conf
编辑允许两个服务器进行远程连接
host all all 0.0.0.0/0 md5
在pg_hba.conf
因此,当 HAproxy 开始工作时,如果第一台服务器宕机,第二台服务器却无法启动。
问题肯定出在patroni
服务器 2 上,但如何解决它呢?
否则,有什么办法可以解决 HA postgresql 服务器的问题吗?
P/s:防火墙设置
quanlm@DB1:~$ sudo ufw status
Status: inactive
quanlm@DB1:~$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
quanlm@DB2:~$ sudo ufw status
Status: inactive
quanlm@DB2:~$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
答案1
从日志来看您需要在第二台服务器上执行:
# systemctl enable patroni.service
# systemctl start patroni.service
我推断这是因为错误消息表明它已被禁用。(我经常使用 Postgres,但还没有尝试过 patroni。)
答案2
我从未使用过 Patroni,但日志中关于错误的描述非常清楚:
OSError: [Errno 99] Cannot assign requested address
不确定所涉及的架构,但如果您配置了某种客户端用来连接的“浮动 IP”,那么此 IP 只会分配给您的一个节点。如果您希望 Patroni 也在第二个节点上使用此地址,解决此问题的一种方法是允许绑定到“非本地”IP(默认情况下禁用,导致上述错误)。
为了实现这一点,设置
net.ipv4.ip_nonlocal_bind = 1
通过sysctl
。
(请注意:正如所写的,我从未使用过 Patroni。您必须自己检查我所写的内容是否有意义并且适合您的环境。)