Postgres 12 on Ubuntu 20.04 (replication checkpoint has wrong magic 539122744 instead of 307747550)


On an Ubuntu 20.04 machine running a Postgres 12 server, Postgres stopped working after a hard disk problem. Here are my tests and my attempts to fix it:

~$ psql
psql: error: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

When I try systemctl:

$ sudo systemctl start postgresql@12-main
Job for postgresql@12-main.service failed because the service did not take the steps required by its unit configuration.
See "systemctl status postgresql@12-main.service" and "journalctl -xe" for details.

Output of systemctl status postgresql@12-main.service:

$ systemctl status postgresql@12-main.service
● postgresql@12-main.service - PostgreSQL Cluster 12-main
     Loaded: loaded (/lib/systemd/system/postgresql@.service; enabled; vendor preset: enabled)
     Active: failed (Result: protocol) since Wed 2021-01-27 01:58:21 UTC; 1min 0s ago
    Process: 1075 ExecStart=/usr/bin/pg_ctlcluster --skip-systemctl-redirect 12-main start (code=exited, status=1/FAILURE)

Jan 27 01:58:21 znserver postgresql@12-main[1075]: 2021-01-27 01:58:21.191 UTC [1096] LOG:  could not remove cache file "global/pg_internal.>
Jan 27 01:58:21 znserver postgresql@12-main[1075]: 2021-01-27 01:58:21.191 UTC [1096] PANIC:  replication checkpoint has wrong magic 5391227>
Jan 27 01:58:21 znserver postgresql@12-main[1075]: 2021-01-27 01:58:21.424 UTC [1095] LOG:  startup process (PID 1096) was terminated by sig>
Jan 27 01:58:21 znserver postgresql@12-main[1075]: 2021-01-27 01:58:21.424 UTC [1095] LOG:  aborting startup due to startup process failure
Jan 27 01:58:21 znserver postgresql@12-main[1075]: 2021-01-27 01:58:21.425 UTC [1095] LOG:  database system is shut down
Jan 27 01:58:21 znserver postgresql@12-main[1075]: pg_ctl: could not start server
Jan 27 01:58:21 znserver postgresql@12-main[1075]: Examine the log output.
Jan 27 01:58:21 znserver systemd[1]: postgresql@12-main.service: Can't open PID file /run/postgresql/12-main.pid (yet?) after start: Operati>
Jan 27 01:58:21 znserver systemd[1]: postgresql@12-main.service: Failed with result 'protocol'.
Jan 27 01:58:21 znserver systemd[1]: Failed to start PostgreSQL Cluster 12-main.

With the service command, I get:

$ sudo service postgresql start
(base) sidon@znserver:~$ sudo service postgresql status
● postgresql.service - PostgreSQL RDBMS
     Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
     Active: active (exited) since Wed 2021-01-27 02:05:24 UTC; 4s ago
    Process: 1246 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
   Main PID: 1246 (code=exited, status=0/SUCCESS)

Jan 27 02:05:24 znserver systemd[1]: Starting PostgreSQL RDBMS...
Jan 27 02:05:24 znserver systemd[1]: Finished PostgreSQL RDBMS.

Any help?

Answer 1

After hours of fruitless research, I found the solution by trial and error:

Short answer

~$ sudo chown postgres.postgres /var/lib/postgresql/12/main/global/pg_internal.init
~$ sudo rm -rf /var/lib/postgresql/12/main/global/pg_internal.init
~$ sudo rm -rf /var/lib/postgresql/12/main/pg_logical/replorigin_checkpoint
~$ sudo -i -u postgres
~$ /usr/lib/postgresql/12/bin/pg_ctl restart -D /var/lib/postgresql/12/main
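
A more cautious variant of the commands above, as a sketch only: instead of deleting the two paths with rm -rf, move them into a backup directory so the step can be undone. The helper name move_aside and the backup location /root/pg12-broken are hypothetical, not part of the original answer; the two data paths are the ones used above.

```shell
#!/bin/sh
# Hypothetical helper (not from the answer): move a path into a backup
# directory instead of deleting it, so the step can be undone later.
# $1 = path to move aside, $2 = backup directory (created if missing)
move_aside() {
    target=$1
    backup_dir=$2
    # Nothing to do if the path is already gone.
    [ -e "$target" ] || return 0
    mkdir -p "$backup_dir" || return 1
    # Suffix with a timestamp and the shell PID so repeated runs do not collide.
    mv -- "$target" "$backup_dir/$(basename "$target").$(date +%Y%m%d%H%M%S).$$"
}

# Example (as root, with the server stopped). The backup location is an assumption:
# move_aside /var/lib/postgresql/12/main/global/pg_internal.init /root/pg12-broken
# move_aside /var/lib/postgresql/12/main/pg_logical/replorigin_checkpoint /root/pg12-broken
```

If the server starts cleanly afterwards, the backup directory can be removed; if not, the files can be moved back for further diagnosis.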

Long answer:

First, I tried restarting with the following sequence:

~$ sudo -i -u postgres
~$ /usr/lib/postgresql/12/bin/pg_ctl restart -D /var/lib/postgresql/12/main

I got the following result:

pg_ctl: PID file "/var/lib/postgresql/12/main/postmaster.pid" does not exist
Is server running?
trying to start server anyway
waiting for server to start....2021-02-13 16:40:12.633 UTC [3806] LOG:  starting PostgreSQL 12.5 (Ubuntu 12.5-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
2021-02-13 16:40:12.636 UTC [3806] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-02-13 16:40:12.636 UTC [3806] LOG:  listening on IPv6 address "::", port 5432
2021-02-13 16:40:12.681 UTC [3806] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-02-13 16:40:12.824 UTC [3809] LOG:  database system was interrupted; last known up at 2021-01-07 10:56:38 UTC
2021-02-13 16:40:13.138 UTC [3809] LOG:  could not open directory "./global/pg_internal.init": Permission denied
2021-02-13 16:40:13.148 UTC [3809] LOG:  could not remove cache file "global/pg_internal.init": Is a directory
2021-02-13 16:40:13.148 UTC [3809] PANIC:  replication checkpoint has wrong magic 539122744 instead of 307747550
2021-02-13 16:40:13.385 UTC [3806] LOG:  startup process (PID 3809) was terminated by signal 6: Aborted
2021-02-13 16:40:13.385 UTC [3806] LOG:  aborting startup due to startup process failure
2021-02-13 16:40:13.387 UTC [3806] LOG:  database system is shut down
stopped waiting
pg_ctl: could not start server
Examine the log output.

After a brief investigation, I found that the directory /var/lib/postgresql/12/main/global/pg_internal.init was owned by root. I changed the owner:

sudo chown postgres.postgres /var/lib/postgresql/12/main/global/pg_internal.init
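
Ownership problems like this one can be spotted across the whole data directory in one pass. The following is a minimal sketch, assuming the cluster lives in /var/lib/postgresql/12/main as above; the function name list_not_owned_by is hypothetical.

```shell
#!/bin/sh
# List everything under a directory that is NOT owned by the given user.
# $1 = directory to scan, $2 = user name or numeric UID
list_not_owned_by() {
    find "$1" ! -user "$2" -print
}

# Example (as root): anything printed here is not owned by postgres.
# list_not_owned_by /var/lib/postgresql/12/main postgres
```

An empty result means every file and directory under the data directory already belongs to that user.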

Then I tried again (step 1):

sudo -i -u postgres

The result was slightly different:

/usr/lib/postgresql/12/bin/pg_ctl restart -D /var/lib/postgresql/12/main     
pg_ctl: PID file "/var/lib/postgresql/12/main/postmaster.pid" does not exist
Is server running?
trying to start server anyway
waiting for server to start....2021-02-13 16:53:26.132 UTC [4024] LOG:  starting PostgreSQL 12.5 (Ubuntu 12.5-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
2021-02-13 16:53:26.132 UTC [4024] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-02-13 16:53:26.132 UTC [4024] LOG:  listening on IPv6 address "::", port 5432
2021-02-13 16:53:26.171 UTC [4024] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-02-13 16:53:26.314 UTC [4025] LOG:  database system was interrupted; last known up at 2021-01-07 10:56:38 UTC
2021-02-13 16:53:26.615 UTC [4025] LOG:  could not remove cache file "global/pg_internal.init": Is a directory
2021-02-13 16:53:26.615 UTC [4025] PANIC:  replication checkpoint has wrong magic 539122744 instead of 307747550
2021-02-13 16:53:26.851 UTC [4024] LOG:  startup process (PID 4025) was terminated by signal 6: Aborted
2021-02-13 16:53:26.851 UTC [4024] LOG:  aborting startup due to startup process failure
2021-02-13 16:53:26.852 UTC [4024] LOG:  database system is shut down
stopped waiting
pg_ctl: could not start server
Examine the log output.

So I decided to delete 12/main/global/pg_internal.init, which the log reports as a directory:

rm -rf 12/main/global/pg_internal.init

I ran step 1 again:

/usr/lib/postgresql/12/bin/pg_ctl restart -D /var/lib/postgresql/12/main
pg_ctl: PID file "/var/lib/postgresql/12/main/postmaster.pid" does not exist
Is server running?
trying to start server anyway
waiting for server to start....2021-02-13 17:00:33.310 UTC [4072] LOG:  starting PostgreSQL 12.5 (Ubuntu 12.5-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
2021-02-13 17:00:33.310 UTC [4072] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-02-13 17:00:33.310 UTC [4072] LOG:  listening on IPv6 address "::", port 5432
2021-02-13 17:00:33.348 UTC [4072] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-02-13 17:00:33.483 UTC [4073] LOG:  database system was interrupted; last known up at 2021-01-07 10:56:38 UTC
2021-02-13 17:00:33.792 UTC [4073] PANIC:  replication checkpoint has wrong magic 539122744 instead of 307747550
2021-02-13 17:00:34.030 UTC [4072] LOG:  startup process (PID 4073) was terminated by signal 6: Aborted
2021-02-13 17:00:34.030 UTC [4072] LOG:  aborting startup due to startup process failure
2021-02-13 17:00:34.031 UTC [4072] LOG:  database system is shut down
stopped waiting
pg_ctl: could not start server
Examine the log output.

So I deleted the file /var/lib/postgresql/12/main/pg_logical/replorigin_checkpoint:

sudo rm -rf /var/lib/postgresql/12/main/pg_logical/replorigin_checkpoint
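
The number in the PANIC message appears to be the first four bytes of pg_logical/replorigin_checkpoint read as an unsigned 32-bit integer, and 307747550 is 0x1257DADE in hexadecimal. As far as I can tell from the PostgreSQL source (this is my reading, not something the answer states), startup skips the check when the file is missing, which would explain why removing it lets recovery continue. Any replication origin progress stored in the file is lost, which matters only if replication origins were in use. The sketch below reads the magic from a file so it can be compared before anything is deleted; read_magic and has_valid_magic are hypothetical helper names, and the result depends on host byte order (little-endian on the x86_64 machine in the logs).

```shell
#!/bin/sh
# Print the first 4 bytes of a file as an unsigned 32-bit integer in host
# byte order (little-endian on x86_64, the platform shown in the logs).
read_magic() {
    od -An -tu4 -N4 "$1" | tr -d ' \n'
}

# Expected value, taken from the PANIC message.
EXPECTED_MAGIC=307747550

# Return 0 if the file starts with the expected magic, non-zero otherwise.
has_valid_magic() {
    [ "$(read_magic "$1")" = "$EXPECTED_MAGIC" ]
}

# Example (as root or postgres, with the server stopped):
# read_magic /var/lib/postgresql/12/main/pg_logical/replorigin_checkpoint
```

A value other than 307747550 suggests the file contents were damaged, which is consistent with the disk problem described in the question.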

I ran step 1 again:

/usr/lib/postgresql/12/bin/pg_ctl restart -D /var/lib/postgresql/12/main
pg_ctl: PID file "/var/lib/postgresql/12/main/postmaster.pid" does not exist
Is server running?
trying to start server anyway
waiting for server to start....2021-02-13 17:08:02.913 UTC [4186] LOG:  starting PostgreSQL 12.5 (Ubuntu 12.5-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
2021-02-13 17:08:02.913 UTC [4186] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-02-13 17:08:02.913 UTC [4186] LOG:  listening on IPv6 address "::", port 5432
2021-02-13 17:08:02.952 UTC [4186] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-02-13 17:08:03.103 UTC [4187] LOG:  database system was interrupted; last known up at 2021-01-07 10:56:38 UTC
2021-02-13 17:08:03.412 UTC [4187] LOG:  database system was not properly shut down; automatic recovery in progress
2021-02-13 17:08:03.481 UTC [4187] LOG:  redo starts at 0/2FFEC58
2021-02-13 17:08:03.481 UTC [4187] LOG:  invalid record length at 0/2FFEC90: wanted 24, got 0
2021-02-13 17:08:03.481 UTC [4187] LOG:  redo done at 0/2FFEC58
2021-02-13 17:08:03.683 UTC [4186] LOG:  database system is ready to accept connections
done
server started

OK, after all that, both the Postgres installation and the data were recovered!

Answer 2

Adding to @sidon's answer: if you run into a permissions problem like

... data directory /var/lib/postgresql/data  has invalid permissions
Permissions should be u=rwx (0700) or u=rwx,g=rx (0750)

first fix the permissions on the data directory:

sudo chmod 700 -R /var/lib/postgresql/data
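
Note that chmod 700 -R also sets the execute bit on every regular file. A narrower sketch, assuming nothing inside the data directory needs to be executable (initdb creates directories as 0700 and files as 0600 by default): set directories to 0700 and regular files to 0600. The function name tighten_data_dir_perms is hypothetical.

```shell
#!/bin/sh
# Set directories to 0700 and regular files to 0600 under a data directory.
# $1 = data directory
tighten_data_dir_perms() {
    find "$1" -type d -exec chmod 700 {} + &&
    find "$1" -type f -exec chmod 600 {} +
}

# Example (as root, with the server stopped):
# tighten_data_dir_perms /var/lib/postgresql/data
```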

Source: https://github.com/ClusterHQ/dvol/issues/45#issuecomment-370995983
