Postgres and Redis pods fail intermittently

I set up a Kubernetes cluster (1.19) with RKE on a bare-metal Ubuntu (20.04) machine. On that cluster I run a self-hosted GitLab instance (installed from the Helm chart).

Every 1-5 hours some pods fail. The pods I see restarting most often are:

cert-manager-cainjector-856d4df858-l4fsz
gitlab-postgresql-0
gitlab-redis-master-0
nfs-provisioner-6c95ddf48-xndwh

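I can't tell which of these is causing the problem and which are just coincidental. The AOF error may be the root cause, or it may only be a symptom of some other problem.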

Describe postgres

Warning  Unhealthy  4m50s (x16 over 21h)  kubelet  Liveness probe failed:
Warning  Unhealthy  4m50s (x19 over 21h)  kubelet  Readiness probe failed:
Warning  Unhealthy  4m3s (x47 over 15h)   kubelet  Liveness probe failed: 127.0.0.1:5432 - no response
Normal   Killing    4m3s (x74 over 15h)   kubelet  Container gitlab-postgresql failed liveness probe, will be restarted
Warning  Unhealthy  3m53s (x74 over 15h)  kubelet  Readiness probe failed: 127.0.0.1:5432 - no response
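
To put a number on how often the container is being killed, the event lines above can be tallied. This is only a rough sketch: the regular expression is my own guess at the column layout of `kubectl describe` events, and it assumes the time window is always given in whole hours (as in `15h`).

```python
import re
from collections import Counter

# Event lines copied from the describe output above.
EVENTS = """\
Warning  Unhealthy  4m50s (x16 over 21h)  kubelet  Liveness probe failed:
Warning  Unhealthy  4m50s (x19 over 21h)  kubelet  Readiness probe failed:
Warning  Unhealthy  4m3s (x47 over 15h)   kubelet  Liveness probe failed: 127.0.0.1:5432 - no response
Normal   Killing    4m3s (x74 over 15h)   kubelet  Container gitlab-postgresql failed liveness probe, will be restarted
Warning  Unhealthy  3m53s (x74 over 15h)  kubelet  Readiness probe failed: 127.0.0.1:5432 - no response
"""

# Assumed layout: TYPE REASON AGE (xCOUNT over WINDOW) SOURCE MESSAGE
EVENT_RE = re.compile(
    r"^(?P<type>\S+)\s+(?P<reason>\S+)\s+(?P<age>\S+)\s+"
    r"\(x(?P<count>\d+) over (?P<window>[^)]+)\)\s+(?P<source>\S+)\s+(?P<message>.*)$"
)


def parse_events(text):
    """Return one dict per event line that matches the assumed layout."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:
            event = match.groupdict()
            event["count"] = int(event["count"])
            events.append(event)
    return events


def window_hours(window):
    """Convert a window such as '15h' to hours (assumption: hours only)."""
    if not window.endswith("h"):
        raise ValueError(f"unsupported window format: {window!r}")
    return int(window[:-1])


events = parse_events(EVENTS)

# Total occurrences per reason (Unhealthy, Killing).
totals = Counter()
for event in events:
    totals[event["reason"]] += event["count"]

# Average minutes between container kills, from the single Killing row.
kill = next(e for e in events if e["reason"] == "Killing")
minutes_per_kill = window_hours(kill["window"]) * 60 / kill["count"]

print(totals)
print(f"about one kill every {minutes_per_kill:.1f} minutes")
```

Note that the `Killing` row counts liveness-triggered kills of the container, which is not necessarily the same as the 1-5 hour failure interval I observed at the pod level.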

Postgres logs

metrics time="2021-10-15T11:07:53Z" level=info msg="Established new database connection to \"127.0.0.1:5432\"." source="postgres_exporter.go:878"
metrics time="2021-10-15T11:07:54Z" level=info msg="Established new database connection to \"127.0.0.1:5432\"." source="postgres_exporter.go:878"
metrics time="2021-10-15T11:07:56Z" level=info msg="Established new database connection to \"127.0.0.1:5432\"." source="postgres_exporter.go:878"
metrics time="2021-10-15T11:07:59Z" level=error msg="Error opening connection to database (postgresql://gitlab:[email protected]:5432/gitlabhq_production?sslmode=disable): pq: the database system is starting up" source="postgres_exporter.go:1474"
metrics time="2021-10-15T11:07:59Z" level=info msg="Starting Server: :9187" source="postgres_exporter.go:1672"

Describe redis

Warning  Unhealthy  5m10s (x44 over 22h)  kubelet  Readiness probe failed:
Warning  Unhealthy  5m (x34 over 15h)     kubelet  Liveness probe failed:
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
 timeout: the monitored command dumped core
Warning  Unhealthy  4m45s (x27 over 19h)  kubelet  Liveness probe failed:
Warning  Unhealthy  65s (x2311 over 15h)  kubelet  Readiness probe failed:
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
 timeout: the monitored command dumped core

Redis logs

useradd: Permission denied.
useradd: cannot lock /etc/passwd; try again later.
chown: invalid user: 'redis'
redis 06:20:45.18 INFO  ==> ** Starting Redis **
1:C 18 Oct 2021 06:20:45.194 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 18 Oct 2021 06:20:45.194 # Redis version=6.0.9, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 18 Oct 2021 06:20:45.194 # Configuration loaded
1:M 18 Oct 2021 06:20:45.197 * Running mode=standalone, port=6379.
1:M 18 Oct 2021 06:20:45.197 # Server initialized
1:M 18 Oct 2021 06:20:45.512 * Reading RDB preamble from AOF file...
1:M 18 Oct 2021 06:20:45.512 * Loading RDB produced by version 6.0.9
1:M 18 Oct 2021 06:20:45.512 * RDB age 694288 seconds
1:M 18 Oct 2021 06:20:45.512 * RDB memory usage when created 3.97 Mb
1:M 18 Oct 2021 06:20:45.512 * RDB has an AOF tail
1:M 18 Oct 2021 06:20:45.515 * Reading the remaining AOF tail...
1:M 18 Oct 2021 06:20:45.735 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:46.641 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:47.169 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:47.569 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:47.649 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:48.151 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:48.697 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:50.452 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:51.337 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:51.479 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:51.846 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:51.847 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:52.078 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:52.247 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:54.645 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:55.877 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:56.234 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:56.400 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:56.991 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:57.352 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:57.421 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:57.547 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:57.628 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:57.733 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:57.781 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:57.811 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:57.958 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:58.059 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:58.160 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:58.367 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:58.475 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:58.558 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:59.649 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:21:00.050 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:21:00.278 * DB loaded from append only file: 15.080 seconds
1:M 18 Oct 2021 06:21:00.278 * Ready to accept connections
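
To see at a glance whether the AOF load always fails in the same way, the CRITICAL lines can be grouped by error message and command. The sketch below works on a short excerpt of the log above; the regular expressions are assumptions based only on the lines shown here, not on any documented Redis log format.

```python
import re
from collections import Counter

# Short excerpt of the Redis log above (the full log has many more CRITICAL lines).
LOG_EXCERPT = """\
1:M 18 Oct 2021 06:20:45.515 * Reading the remaining AOF tail...
1:M 18 Oct 2021 06:20:45.735 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:46.641 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:21:00.050 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:21:00.278 * DB loaded from append only file: 15.080 seconds
"""

CRITICAL_RE = re.compile(
    r"== CRITICAL == .*AOF-loading-client: '(?P<error>[^']*)' "
    r"after processing the command '(?P<command>[^']*)'"
)
LOADED_RE = re.compile(r"DB loaded from append only file: (?P<seconds>[\d.]+) seconds")


def summarize(log):
    """Count CRITICAL AOF-loading errors and extract the reported load time."""
    errors = Counter()
    load_seconds = None
    for line in log.splitlines():
        critical = CRITICAL_RE.search(line)
        if critical:
            errors[(critical.group("error"), critical.group("command"))] += 1
            continue
        loaded = LOADED_RE.search(line)
        if loaded:
            load_seconds = float(loaded.group("seconds"))
    return {"errors": errors, "load_seconds": load_seconds}


summary = summarize(LOG_EXCERPT)
for (error, command), count in summary["errors"].items():
    print(f"{count}x {error!r} after command {command!r}")
print("load time (s):", summary["load_seconds"])
```

In the log above every CRITICAL line carries the same error and the same command, so the whole run collapses into a single group.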

nfs-provisioner logs

I1016 09:52:48.042599       1 main.go:63] Provisioner quay.io/nfs-provisioner specified
I1016 09:52:48.042763       1 main.go:87] Setting up NFS server!
I1016 09:53:10.846186       1 server.go:144] starting RLIMIT_NOFILE rlimit.Cur 1048576, rlimit.Max 1048576
I1016 09:53:10.846225       1 server.go:155] ending RLIMIT_NOFILE rlimit.Cur 1048576, rlimit.Max 1048576
I1016 09:53:10.846823       1 server.go:129] Running NFS server!

I think there are enough resources:

Describe node

Namespace                   Name                                            CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
---------                   ----                                            ------------  ----------  ---------------  -------------  ---
cert-manager                cert-manager-66b6d6bf59-hnlm7                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d20h
cert-manager                cert-manager-cainjector-856d4df858-l4fsz        0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d20h
cert-manager                cert-manager-webhook-5fd7d458f7-wwq4c           0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d20h
gitlab                      gitlab-gitaly-0                                 100m (1%)     0 (0%)      200Mi (0%)       0 (0%)         84m
gitlab                      gitlab-gitlab-exporter-775f6f9fb-2tcwg          75m (1%)      0 (0%)      100M (0%)        0 (0%)         85m
gitlab                      gitlab-gitlab-pages-79bf887975-xptqx            900m (15%)    0 (0%)      2G (7%)          0 (0%)         80m
gitlab                      gitlab-gitlab-runner-bb6cbb7f-4vbxx             0 (0%)        0 (0%)      0 (0%)           0 (0%)         7h6m
gitlab                      gitlab-gitlab-shell-ccd9bfbbf-8kmzl             50m (0%)      0 (0%)      6M (0%)          0 (0%)         85m
gitlab                      gitlab-gitlab-shell-ccd9bfbbf-phzdf             50m (0%)      0 (0%)      6M (0%)          0 (0%)         85m
gitlab                      gitlab-grafana-app-5758fb55f6-j5rtb             0 (0%)        0 (0%)      0 (0%)           0 (0%)         44m
gitlab                      gitlab-mailroom-54c4b4769f-jj2d8                50m (0%)      0 (0%)      150M (0%)        0 (0%)         85m
gitlab                      gitlab-minio-f69c7f86d-lkz2m                    100m (1%)     0 (0%)      128Mi (0%)       0 (0%)         7h6m
gitlab                      gitlab-postgresql-0                             250m (4%)     0 (0%)      256Mi (1%)       0 (0%)         7h6m
gitlab                      gitlab-prometheus-server-6444c7bd76-62prc       0 (0%)        0 (0%)      0 (0%)           0 (0%)         7h6m
gitlab                      gitlab-redis-master-0                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         7h6m
gitlab                      gitlab-redis-slave-0                            0 (0%)        0 (0%)      0 (0%)           0 (0%)         7h6m
gitlab                      gitlab-redis-slave-1                            0 (0%)        0 (0%)      0 (0%)           0 (0%)         7h5m
gitlab                      gitlab-registry-678dccc897-dfdvd                50m (0%)      0 (0%)      32Mi (0%)        0 (0%)         85m
gitlab                      gitlab-registry-678dccc897-lrxcw                50m (0%)      0 (0%)      32Mi (0%)        0 (0%)         85m
gitlab                      gitlab-sidekiq-all-in-1-v1-57479447cb-jr9gr     900m (15%)    0 (0%)      2G (7%)          0 (0%)         78m
gitlab                      gitlab-task-runner-899458786-2x4n2              50m (0%)      0 (0%)      350M (1%)        0 (0%)         78m
gitlab                      gitlab-webservice-default-96484c9f4-khxzm       400m (6%)     0 (0%)      2600M (10%)      0 (0%)         77m
gitlab                      gitlab-webservice-default-96484c9f4-qmblw       400m (6%)     0 (0%)      2600M (10%)      0 (0%)         78m
ingress-nginx               nginx-ingress-controller-gs9dx                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d18h
kube-system                 coredns-685d6d555d-626km                        100m (1%)     0 (0%)      70Mi (0%)        170Mi (0%)     5d21h
kube-system                 coredns-autoscaler-57fd5c9bd5-t7snq             20m (0%)      0 (0%)      10Mi (0%)        0 (0%)         5d21h
kube-system                 metrics-server-7bf4b68b78-xkbn7                 100m (1%)     0 (0%)      200Mi (0%)       0 (0%)         5d21h
kube-system                 nfs-provisioner-6c95ddf48-xndwh                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d20h
kube-system                 weave-net-dsfzf                                 100m (1%)     0 (0%)      0 (0%)           0 (0%)         5d21h

Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource           Requests           Limits
--------           --------           ------
cpu                3745m (62%)        0 (0%)
memory             10785078528 (42%)  170Mi (0%)
ephemeral-storage  0 (0%)             0 (0%)
hugepages-2Mi      0 (0%)             0 (0%)
Events:              <none>
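
As a sanity check on the memory total, the per-pod requests in the table can be summed by hand. The table mixes decimal suffixes (`M`, `G`) with binary ones (`Mi`), which is easy to misread. The sketch below is my own arithmetic, not kubectl's implementation, and it only handles the suffixes that appear in this output plus their obvious siblings.

```python
# Multipliers for Kubernetes-style quantity suffixes:
# binary (Ki, Mi, Gi) and decimal (k, M, G).
UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
    "k": 10**3, "M": 10**6, "G": 10**9,
}


def to_bytes(quantity):
    """Convert a quantity such as '256Mi' or '2G' to bytes."""
    # Check two-letter suffixes before one-letter ones.
    for suffix in sorted(UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * UNITS[suffix]
    return int(quantity)  # plain byte count


# Non-zero memory requests, in the order they appear in the table above.
MEMORY_REQUESTS = [
    "200Mi",   # gitlab-gitaly-0
    "100M",    # gitlab-gitlab-exporter
    "2G",      # gitlab-gitlab-pages
    "6M",      # gitlab-gitlab-shell (1 of 2)
    "6M",      # gitlab-gitlab-shell (2 of 2)
    "150M",    # gitlab-mailroom
    "128Mi",   # gitlab-minio
    "256Mi",   # gitlab-postgresql-0
    "32Mi",    # gitlab-registry (1 of 2)
    "32Mi",    # gitlab-registry (2 of 2)
    "2G",      # gitlab-sidekiq-all-in-1-v1
    "350M",    # gitlab-task-runner
    "2600M",   # gitlab-webservice-default (1 of 2)
    "2600M",   # gitlab-webservice-default (2 of 2)
    "70Mi",    # coredns
    "10Mi",    # coredns-autoscaler
    "200Mi",   # metrics-server
]

total = sum(to_bytes(q) for q in MEMORY_REQUESTS)
print(total)  # matches the 10785078528 reported under "Allocated resources"
```

These are requests only. Apart from coredns, no pod in the table has a memory limit, and gitlab-redis-master-0, nfs-provisioner and cert-manager-cainjector declare no requests at all, so this total says nothing about what those pods actually consume.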

Detailed postgres logs

postgresql 02:54:44.29 
postgresql 02:54:44.29 Welcome to the Bitnami postgresql container
postgresql 02:54:44.29 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql
postgresql 02:54:44.30 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql/issues
postgresql 02:54:44.33 
postgresql 02:54:44.34 INFO  ==> ** Starting PostgreSQL setup **
postgresql 02:54:44.37 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql 02:54:44.38 INFO  ==> Loading custom pre-init scripts...
postgresql 02:54:44.38 INFO  ==> Loading user's custom files from /docker-entrypoint-preinitdb.d ...
postgresql 02:54:44.39 INFO  ==> Initializing PostgreSQL database...
postgresql 02:54:44.39 INFO  ==> Cleaning stale /bitnami/postgresql/data/postmaster.pid file
postgresql 02:54:44.45 INFO  ==> pg_hba.conf file not detected. Generating it...
postgresql 02:54:44.45 INFO  ==> Generating local authentication configuration
postgresql 02:54:44.47 INFO  ==> Deploying PostgreSQL with persisted data...
postgresql 02:54:44.48 INFO  ==> Configuring replication parameters
postgresql 02:54:44.54 INFO  ==> Configuring fsync
postgresql 02:54:44.57 INFO  ==> Loading custom scripts...
postgresql 02:54:44.57 INFO  ==> Enabling remote connections
postgresql 02:54:44.59 INFO  ==> ** PostgreSQL setup finished! **
postgresql 02:54:44.64 INFO  ==> ** Starting PostgreSQL **
2021-10-22 02:54:44.670 GMT [1] LOG:  00000: pgaudit extension initialized
2021-10-22 02:54:44.670 GMT [1] LOCATION:  _PG_init, pgaudit.c:2017
2021-10-22 02:54:44.670 GMT [1] LOG:  00000: starting PostgreSQL 12.7 on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2021-10-22 02:54:44.670 GMT [1] LOCATION:  PostmasterMain, postmaster.c:1019
2021-10-22 02:54:44.671 GMT [1] LOG:  00000: listening on IPv4 address "0.0.0.0", port 5432
2021-10-22 02:54:44.671 GMT [1] LOCATION:  StreamServerPort, pqcomm.c:590
2021-10-22 02:54:44.671 GMT [1] LOG:  00000: listening on IPv6 address "::", port 5432
2021-10-22 02:54:44.671 GMT [1] LOCATION:  StreamServerPort, pqcomm.c:590
2021-10-22 02:54:44.674 GMT [1] LOG:  00000: listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-10-22 02:54:44.674 GMT [1] LOCATION:  StreamServerPort, pqcomm.c:584
2021-10-22 02:54:44.864 GMT [92] LOG:  00000: database system was interrupted; last known up at 2021-10-22 02:36:29 GMT
2021-10-22 02:54:44.864 GMT [92] LOCATION:  StartupXLOG, xlog.c:6305
2021-10-22 02:54:49.253 GMT [92] LOG:  00000: database system was not properly shut down; automatic recovery in progress
2021-10-22 02:54:49.253 GMT [92] LOCATION:  StartupXLOG, xlog.c:6808
2021-10-22 02:54:49.303 GMT [92] LOG:  00000: redo starts at 0/376F9D0
2021-10-22 02:54:49.303 GMT [92] LOCATION:  StartupXLOG, xlog.c:7083
2021-10-22 02:54:49.304 GMT [92] LOG:  00000: invalid record length at 0/376FAB8: wanted 24, got 0
2021-10-22 02:54:49.304 GMT [92] LOCATION:  ReadRecord, xlog.c:4313
2021-10-22 02:54:49.304 GMT [92] LOG:  00000: redo done at 0/376FA80
2021-10-22 02:54:49.304 GMT [92] LOCATION:  StartupXLOG, xlog.c:7345
2021-10-22 02:54:49.318 GMT [1] LOG:  00000: database system is ready to accept connections
2021-10-22 02:54:49.318 GMT [1] LOCATION:  reaper, postmaster.c:3001
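
For reference, the gap between the "last known up" timestamp and the moment the server was ready again can be worked out from the two timestamps in the log above. This is plain date arithmetic on values copied from the log. I am treating the result as an upper bound on the outage, on the assumption that the server may have kept running for a while after the "last known up" time was recorded.

```python
from datetime import datetime

# Both timestamps are copied from the PostgreSQL log above (GMT).
LAST_KNOWN_UP = "2021-10-22 02:36:29"
READY_AGAIN = "2021-10-22 02:54:49.318"

last_up = datetime.strptime(LAST_KNOWN_UP, "%Y-%m-%d %H:%M:%S")
ready = datetime.strptime(READY_AGAIN, "%Y-%m-%d %H:%M:%S.%f")

# Time between the last recorded "known up" moment and readiness.
gap = ready - last_up
print(f"gap: {gap.total_seconds():.3f} s (~{gap.total_seconds() / 60:.1f} min)")
```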

Redis logs

useradd: Permission denied.
useradd: cannot lock /etc/passwd; try again later.
chown: invalid user: 'redis'
redis 11:49:42.69 INFO  ==> ** Starting Redis **
1:C 22 Oct 2021 11:49:42.707 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 22 Oct 2021 11:49:42.707 # Redis version=6.0.9, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 22 Oct 2021 11:49:42.707 # Configuration loaded
1:M 22 Oct 2021 11:49:42.709 * Running mode=standalone, port=6379.
1:M 22 Oct 2021 11:49:42.709 # Server initialized
1:M 22 Oct 2021 11:49:42.714 * Reading RDB preamble from AOF file...
1:M 22 Oct 2021 11:49:42.714 * Loading RDB produced by version 6.0.9
1:M 22 Oct 2021 11:49:42.714 * RDB age 6028 seconds
1:M 22 Oct 2021 11:49:42.714 * RDB memory usage when created 3.67 Mb
1:M 22 Oct 2021 11:49:42.714 * RDB has an AOF tail
1:M 22 Oct 2021 11:49:42.717 * Reading the remaining AOF tail...
1:M 22 Oct 2021 11:49:42.869 # !!! Warning: short read while loading the AOF file !!!
1:M 22 Oct 2021 11:49:42.869 # !!! Truncating the AOF at offset 28641046 !!!
1:M 22 Oct 2021 11:49:42.870 # AOF loaded anyway because aof-load-truncated is enabled
1:M 22 Oct 2021 11:49:42.870 * DB loaded from append only file: 0.161 seconds
1:M 22 Oct 2021 11:49:42.870 * Ready to accept connections
1:M 22 Oct 2021 13:59:33.814 * Starting automatic rewriting of AOF on 134% growth
1:M 22 Oct 2021 13:59:33.815 * Background append only file rewriting started by pid 38407
1:M 22 Oct 2021 13:59:33.843 * AOF rewrite child asks to stop sending diffs.
38407:C 22 Oct 2021 13:59:33.843 * Parent agreed to stop sending diffs. Finalizing AOF...
38407:C 22 Oct 2021 13:59:33.843 * Concatenating 0.00 MB of AOF diff received from parent.
38407:C 22 Oct 2021 13:59:33.844 * SYNC append only file rewrite performed
38407:C 22 Oct 2021 13:59:33.845 * AOF rewrite: 1 MB of memory used by copy-on-write
1:M 22 Oct 2021 13:59:33.915 * Background AOF rewrite terminated with success
1:M 22 Oct 2021 13:59:33.916 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
1:M 22 Oct 2021 13:59:33.917 * Background AOF rewrite finished successfully
