使用 patroni 实现高可用性 PostgreSQL

使用 patroni 实现高可用性 PostgreSQL

我按照本指南为 postgres HA 设置了一个实验室

https://www.alibabacloud.com/blog/how-to-set-up-a-highly-available-postgresql-cluster-using-patroni-on-ubuntu-16-04_594477

我严格按照指南操作(在我的情况下更改了 IP 地址),毕竟一切在 postgres 服务器 1 上都能正常工作

patroni.yml但是当设置postgres 服务器 2 时

该指南说两个 postgres 服务器都有相同的patroni.yml设置,但是当重新启动patroni service

此问题发生在 server1

quanlm@DB1:~$ sudo service patroni status
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
   Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-11-12 07:35:33 UTC; 14min ago
 Main PID: 411 (patroni)
    Tasks: 12
   Memory: 77.6M
      CPU: 4.041s
   CGroup: /system.slice/patroni.service
           ├─411 /usr/bin/python3 /usr/local/bin/patroni /etc/patroni.yml
           ├─431 postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --listen_addresses=192.168.122.77 --max_prepared_tran
           ├─435 postgres: postgres: checkpointer process                                                                                   
           ├─436 postgres: postgres: writer process                                                                                         
           ├─439 postgres: postgres: stats collector process                                                                                
           ├─447 postgres: postgres: postgres postgres 192.168.122.77(49984) idle                                                           
           ├─455 postgres: postgres: wal writer process                                                                                     
           └─456 postgres: postgres: autovacuum launcher process                                                                            

Nov 12 07:49:28 DB1 patroni[411]: 2019-11-12 07:49:28,533 INFO: no action.  i am the leader with the lock
Nov 12 07:49:38 DB1 patroni[411]: 2019-11-12 07:49:38,459 INFO: Lock owner: postgresql0; I am postgresql0
Nov 12 07:49:38 DB1 patroni[411]: 2019-11-12 07:49:38,536 INFO: no action.  i am the leader with the lock
Nov 12 07:49:48 DB1 patroni[411]: 2019-11-12 07:49:48,459 INFO: Lock owner: postgresql0; I am postgresql0
Nov 12 07:49:48 DB1 patroni[411]: 2019-11-12 07:49:48,544 INFO: no action.  i am the leader with the lock
Nov 12 07:49:58 DB1 patroni[411]: 2019-11-12 07:49:58,458 INFO: Lock owner: postgresql0; I am postgresql0
Nov 12 07:49:58 DB1 patroni[411]: 2019-11-12 07:49:58,548 INFO: no action.  i am the leader with the lock
Nov 12 07:50:08 DB1 patroni[411]: 2019-11-12 07:50:08,457 INFO: Lock owner: postgresql0; I am postgresql0
Nov 12 07:50:08 DB1 patroni[411]: 2019-11-12 07:50:08,539 INFO: no action.  i am the leader with the lock
Nov 12 07:50:19 DB1 patroni[411]: 2019-11-12 07:50:19,949 INFO: acquired session lock as a leader

是的,服务器 1 很好,但是服务器 2

quanlm@DB2:~$ sudo service patroni status
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
   Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2019-11-12 07:50:02 UTC; 2s ago
  Process: 9514 ExecStart=/usr/local/bin/patroni /etc/patroni.yml (code=exited, status=1/FAILURE)
 Main PID: 9514 (code=exited, status=1/FAILURE)

Nov 12 07:50:02 DB2 patroni[9514]:   File "/usr/lib/python3.5/socketserver.py", line 440, in __init__
Nov 12 07:50:02 DB2 patroni[9514]:     self.server_bind()
Nov 12 07:50:02 DB2 patroni[9514]:   File "/usr/lib/python3.5/http/server.py", line 138, in server_bind
Nov 12 07:50:02 DB2 patroni[9514]:     socketserver.TCPServer.server_bind(self)
Nov 12 07:50:02 DB2 patroni[9514]:   File "/usr/lib/python3.5/socketserver.py", line 454, in server_bind
Nov 12 07:50:02 DB2 patroni[9514]:     self.socket.bind(self.server_address)
Nov 12 07:50:02 DB2 patroni[9514]: OSError: [Errno 99] Cannot assign requested address
Nov 12 07:50:02 DB2 systemd[1]: patroni.service: Main process exited, code=exited, status=1/FAILURE
Nov 12 07:50:02 DB2 systemd[1]: patroni.service: Unit entered failed state.
Nov 12 07:50:02 DB2 systemd[1]: patroni.service: Failed with result 'exit-code'.

它终究还是不起作用。

listen_addresses = '*'我已通过postgresql.conf编辑允许两个服务器进行远程连接

host all all 0.0.0.0/0 md5

pg_hba.conf

因此,当 HAproxy 开始工作时,如果第一台服务器宕机,第二台服务器却无法启动。

问题肯定出在patroni服务器 2 上,但如何解决它呢?

否则,有什么办法可以解决 HA postgresql 服务器的问题吗?

P/s:防火墙设置

quanlm@DB1:~$ sudo ufw status
Status: inactive
quanlm@DB1:~$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination  
quanlm@DB2:~$ sudo ufw status
Status: inactive
quanlm@DB2:~$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination  

答案1

从日志来看您需要在第二台服务器上执行:

# systemctl enable patroni.service
# systemctl start patroni.service

我推断这是因为错误消息表明它已被禁用。(我经常使用 Postgres,但还没有尝试过 patroni。)

答案2

我从未使用过 Patroni,但日志中关于错误的描述非常清楚:

OSError: [Errno 99] Cannot assign requested address

不确定所涉及的架构,但如果您配置了某种客户端用来连接的“浮动 IP”,那么此 IP 只会分配给您的一个节点。如果您希望 Patroni 也在第二个节点上使用此地址,解决此问题的一种方法是允许绑定到“非本地”IP(默认情况下禁用,导致上述错误)。

为了实现这一点,设置

net.ipv4.ip_nonlocal_bind = 1

通过sysctl

(请注意:正如所写的,我从未使用过 Patroni。您必须自己检查我所写的内容是否有意义并且适合您的环境。)

相关内容