Postgres crash loop caused by a "tuple concurrently updated" error


Sometimes, when starting a Postgres pod in OpenShift, the following error output appears:

   pg_ctl: another server might be running; trying to start server anyway
   waiting for server to start....LOG:  redirecting log output to logging collector process
   HINT:  Future log output will appear in directory "pg_log".
   ..... done
   server started
   => sourcing /usr/share/container-scripts/postgresql/start/set_passwords.sh ...
   ERROR:  tuple concurrently updated
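
To confirm that a crash-looping pod is hitting this particular error rather than some other startup failure, the pod log can be searched for the message. A minimal sketch, assuming the log text is piped in on stdin (for example from oc logs); the function name has_tuple_error is made up for illustration:

```shell
# Hypothetical helper: succeeds (exit status 0) if the log text on stdin
# contains the "tuple concurrently updated" error shown above.
has_tuple_error() {
  grep -q 'ERROR:[[:space:]]*tuple concurrently updated'
}

# Example usage (the pod name is a placeholder):
#   oc logs <postgresql-pod> | has_tuple_error && echo "tuple error found"
```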

Answer 1

To resolve this issue:

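  1. Find the name of the postgres pod that is in the crash loop.
  2. Start an oc debug session with the pod.
  3. Scale the associated Postgres deployment to zero pods.
  4. From the command line of the debug session:

    • Run run-postgresql. This is the CMD of the Docker image. As part of startup, the scripts create some files that would not otherwise exist in the pod, namely /var/lib/pgsql/openshift-custom-postgresql.conf and /var/lib/pgsql/passwd; without them you cannot run any pg_ctl commands. When you run the command, you should see the same error output listed above.
    • Run pg_ctl stop -D /var/lib/pgsql/data/userdata to shut Postgres down cleanly. You should see: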

      waiting for server to shut down.... done
      server stopped

    • Run pg_ctl start -D /var/lib/pgsql/data/userdata to start Postgres. You should see the following output, after which it should wait indefinitely (with no errors):

      server starting
      sh-4.2$ LOG: redirecting log output to logging collector process
      HINT: Future log output will appear in directory "pg_log".

    • Press Enter a few times to get back to the command prompt.

    • Run pg_ctl stop -D /var/lib/pgsql/data/userdata and wait for Postgres to stop. This ensures a clean shutdown.

      waiting for server to shut down.... done
      server stopped

    • Exit the debug session.

    • Scale the deployment back up to 1 pod. Postgres should now start normally.
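
The steps that happen outside the debug session can be sketched as a pair of shell helpers. This is only an illustration under assumptions: the function names crashloop_pods and print_recovery_plan are invented here, the resource is assumed to be addressed as deployment/NAME (it may be a DeploymentConfig on some clusters), and the commands are printed for review rather than executed.

```shell
#!/bin/sh
# Sketch of the steps performed outside the debug session.

# Step 1: print the names of pods whose STATUS column is CrashLoopBackOff.
# Reads the default tabular output of `oc get pods` on stdin
# (columns: NAME READY STATUS RESTARTS AGE).
crashloop_pods() {
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}

# Steps 2 and 3 plus the final scale-up: print the commands for a given
# pod and deployment so they can be checked before being run by hand.
print_recovery_plan() {
  pod=$1
  deployment=$2
  echo "oc debug $pod"
  echo "oc scale deployment/$deployment --replicas=0"
  echo "# inside the debug session: run-postgresql, then pg_ctl stop, start, stop, then exit"
  echo "oc scale deployment/$deployment --replicas=1"
}

# Example usage (the deployment name is a placeholder):
#   pod=$(oc get pods | crashloop_pods | head -n 1)
#   print_recovery_plan "$pod" <postgresql-deployment>
```

Printing the plan instead of running it keeps the interactive part (the pg_ctl commands inside the debug session) in the operator's hands.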

I found this solution after a long struggle: https://pathfinder-faq-ocio-pathfinder-prod.pathfinder.gov.bc.ca/DB/PostgresqlCrashLoopTupleError.html Credit to the author: Wade Barnes

Answer 2

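You may want to create and run the debug pod as the same user the original pod was scheduled with; otherwise you will get permission denied errors when running commands inside the pod.

Here is the sequence of steps I performed: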

oc get -o yaml pod <postgresql-pod> | grep runAsUser
runAsUser: 1000650000

oc scale deployment/<postgresql-d> --replicas=0
deployment.apps/<postgresql-d> scaled

oc debug deployment/<postgresql-d> --as-user=1000650000
Starting pod/<postgresql-debug> ...
Pod IP: 10.128.2.75
If you don't see a command prompt, try pressing enter.

sh-4.2$ run-postgresql
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2021-11-17 09:09:46.428 UTC [25] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-11-17 09:09:46.429 UTC [25] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-11-17 09:09:46.445 UTC [25] LOG:  redirecting log output to logging collector process
2021-11-17 09:09:46.445 UTC [25] HINT:  Future log output will appear in directory "log".
. done
server started
/var/run/postgresql:5432 - accepting connections
=> sourcing /usr/share/container-scripts/postgresql/start/set_passwords.sh ...
ERROR:  tuple concurrently updated

sh-4.2$ pg_ctl stop -D /var/lib/pgsql/data/userdata
waiting for server to shut down.... done
server stopped

sh-4.2$ pg_ctl start -D /var/lib/pgsql/data/userdata
waiting for server to start....2021-11-17 09:10:19.359 UTC [45] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-11-17 09:10:19.359 UTC [45] LOG:  listening on IPv6 address "::", port 5432
2021-11-17 09:10:19.369 UTC [45] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-11-17 09:10:19.377 UTC [45] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-11-17 09:10:19.558 UTC [45] LOG:  redirecting log output to logging collector process
2021-11-17 09:10:19.558 UTC [45] HINT:  Future log output will appear in directory "log".
 done
server started
sh-4.2$ 
sh-4.2$ 
sh-4.2$ 

sh-4.2$ pg_ctl stop -D /var/lib/pgsql/data/userdata
waiting for server to shut down.... done
server stopped

sh-4.2$ exit
exit

Removing debug pod ...

oc scale deployment/<postgresql-d> --replicas=1
deployment.apps/<postgresql-d> scaled
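
The first step of the transcript (reading runAsUser with grep and copying the number by hand) could be scripted along these lines. A rough sketch, assuming the pod YAML is piped in on stdin; the function name run_as_user is hypothetical:

```shell
# Hypothetical helper: read pod YAML on stdin and print the first
# numeric runAsUser value found (the UID the container runs as).
run_as_user() {
  sed -n 's/^[[:space:]]*runAsUser:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Example usage (names in <> are placeholders):
#   uid=$(oc get -o yaml pod <postgresql-pod> | run_as_user)
#   oc debug deployment/<postgresql-d> --as-user="$uid"
```

Note that the pod YAML has to be read before the deployment is scaled to zero, as in the transcript, because the pod no longer exists afterwards.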
