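I set up a Kubernetes cluster (1.19) with RKE on a bare-metal Ubuntu (20.04) machine. On this cluster I run a self-hosted GitLab instance (installed with the Helm chart).
Every 1-5 hours some pods crash. The pods I see restarting most often are: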
cert-manager-cainjector-856d4df858-l4fsz
gitlab-postgresql-0
gitlab-redis-master-0
nfs-provisioner-6c95ddf48-xndwh
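I don't know which of these is causing the problem and which are just coincidence. Maybe the AOF error is the root cause, but it may also be only a consequence of another problem.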
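One way to narrow down which pod fails first might be to compare restart counts and the last termination reason of every container. The following is only a sketch: it assumes the JSON printed by `kubectl get pods -A -o json` is passed in as text, and it reads the standard `containerStatuses` fields (`restartCount`, `lastState.terminated`). The function name `restart_summary` and the sample data are made up for illustration.

```python
import json


def restart_summary(pod_list_json):
    """Return (pod, container, restarts, last reason, last exit code) rows,
    sorted so that the most-restarted containers come first."""
    rows = []
    for pod in json.loads(pod_list_json).get("items", []):
        meta = pod.get("metadata", {})
        pod_name = f"{meta.get('namespace')}/{meta.get('name')}"
        for status in pod.get("status", {}).get("containerStatuses", []):
            # lastState.terminated is absent if the container never restarted
            terminated = status.get("lastState", {}).get("terminated") or {}
            rows.append((
                pod_name,
                status.get("name"),
                status.get("restartCount", 0),
                terminated.get("reason"),
                terminated.get("exitCode"),
            ))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical sample, NOT taken from the cluster described in this post.
sample = json.dumps({"items": [{
    "metadata": {"namespace": "example-ns", "name": "example-pod"},
    "status": {"containerStatuses": [{
        "name": "example-container",
        "restartCount": 3,
        "lastState": {"terminated": {"reason": "Error", "exitCode": 137}},
    }]},
}]})

for row in restart_summary(sample):
    print(*row, sep="\t")
```

A reason such as `OOMKilled` would point at memory pressure, while a plain `Error` after a failed liveness probe would point at the probe timing out.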
Describe postgres
Warning Unhealthy 4m50s (x16 over 21h) kubelet Liveness probe failed:
Warning Unhealthy 4m50s (x19 over 21h) kubelet Readiness probe failed:
Warning Unhealthy 4m3s (x47 over 15h) kubelet Liveness probe failed: 127.0.0.1:5432 - no response
Normal Killing 4m3s (x74 over 15h) kubelet Container gitlab-postgresql failed liveness probe, will be restarted
Warning Unhealthy 3m53s (x74 over 15h) kubelet Readiness probe failed: 127.0.0.1:5432 - no response
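To put a rough number on how often the container is being killed, the `(xN over T)` part of an event line can be turned into an average interval. This is a minimal sketch, assuming the window is written with `d`, `h`, `m` and `s` units as in the output above. The helper names are hypothetical, and the result is approximate because kubectl rounds the window.

```python
import re

_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_duration(text):
    """Convert a duration such as '15h' or '4m50s' to seconds."""
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in re.findall(r"(\d+)([dhms])", text))


def mean_interval_minutes(event_line):
    """Average minutes between occurrences, or None if the line has no '(xN over T)' part."""
    match = re.search(r"\(x(\d+) over ([0-9dhms]+)\)", event_line)
    if match is None:
        return None
    count = int(match.group(1))
    window_seconds = parse_duration(match.group(2))
    return window_seconds / count / 60


line = ("Normal Killing 4m3s (x74 over 15h) kubelet "
        "Container gitlab-postgresql failed liveness probe, will be restarted")
print(round(mean_interval_minutes(line), 1))  # 12.2
```

For the Killing event above this works out to roughly one restart every 12 minutes, averaged over the 15-hour window.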
Logs postgres
metrics time="2021-10-15T11:07:53Z" level=info msg="Established new database connection to \"127.0.0.1:5432\"." source="postgres_exporter.go:878"
metrics time="2021-10-15T11:07:54Z" level=info msg="Established new database connection to \"127.0.0.1:5432\"." source="postgres_exporter.go:878"
metrics time="2021-10-15T11:07:56Z" level=info msg="Established new database connection to \"127.0.0.1:5432\"." source="postgres_exporter.go:878"
metrics time="2021-10-15T11:07:59Z" level=error msg="Error opening connection to database (postgresql://gitlab:[email protected]:5432/gitlabhq_production?sslmode=disable): pq: the database system is starting up" source="postgres_exporter.go:1474"
metrics time="2021-10-15T11:07:59Z" level=info msg="Starting Server: :9187" source="postgres_exporter.go:1672"
Describe redis
Warning Unhealthy 5m10s (x44 over 22h) kubelet Readiness probe failed:
Warning Unhealthy 5m (x34 over 15h) kubelet Liveness probe failed:
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
timeout: the monitored command dumped core
Warning Unhealthy 4m45s (x27 over 19h) kubelet Liveness probe failed:
Warning Unhealthy 65s (x2311 over 15h) kubelet Readiness probe failed:
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
timeout: the monitored command dumped core
Logs redis
useradd: Permission denied.
useradd: cannot lock /etc/passwd; try again later.
chown: invalid user: 'redis'
redis 06:20:45.18 INFO ==> ** Starting Redis **
1:C 18 Oct 2021 06:20:45.194 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 18 Oct 2021 06:20:45.194 # Redis version=6.0.9, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 18 Oct 2021 06:20:45.194 # Configuration loaded
1:M 18 Oct 2021 06:20:45.197 * Running mode=standalone, port=6379.
1:M 18 Oct 2021 06:20:45.197 # Server initialized
1:M 18 Oct 2021 06:20:45.512 * Reading RDB preamble from AOF file...
1:M 18 Oct 2021 06:20:45.512 * Loading RDB produced by version 6.0.9
1:M 18 Oct 2021 06:20:45.512 * RDB age 694288 seconds
1:M 18 Oct 2021 06:20:45.512 * RDB memory usage when created 3.97 Mb
1:M 18 Oct 2021 06:20:45.512 * RDB has an AOF tail
1:M 18 Oct 2021 06:20:45.515 * Reading the remaining AOF tail...
1:M 18 Oct 2021 06:20:45.735 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:46.641 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:47.169 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:47.569 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:47.649 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:48.151 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:48.697 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:50.452 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:51.337 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:51.479 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:51.846 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:51.847 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:52.078 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:52.247 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:54.645 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:55.877 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:56.234 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:56.400 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:56.991 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:57.352 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:57.421 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:57.547 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:57.628 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:57.733 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:57.781 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:57.811 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:57.958 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:58.059 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:58.160 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:58.367 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:58.475 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:58.558 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:20:59.649 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:21:00.050 # == CRITICAL == This server is sending an error to its AOF-loading-client: 'MULTI calls can not be nested' after processing the command 'exec'
1:M 18 Oct 2021 06:21:00.278 * DB loaded from append only file: 15.080 seconds
1:M 18 Oct 2021 06:21:00.278 * Ready to accept connections
Logs nfs-provisioner
I1016 09:52:48.042599 1 main.go:63] Provisioner quay.io/nfs-provisioner specified
I1016 09:52:48.042763 1 main.go:87] Setting up NFS server!
I1016 09:53:10.846186 1 server.go:144] starting RLIMIT_NOFILE rlimit.Cur 1048576, rlimit.Max 1048576
I1016 09:53:10.846225 1 server.go:155] ending RLIMIT_NOFILE rlimit.Cur 1048576, rlimit.Max 1048576
I1016 09:53:10.846823 1 server.go:129] Running NFS server!
I think there are enough resources:
Describe node
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
cert-manager cert-manager-66b6d6bf59-hnlm7 0 (0%) 0 (0%) 0 (0%) 0 (0%) 5d20h
cert-manager cert-manager-cainjector-856d4df858-l4fsz 0 (0%) 0 (0%) 0 (0%) 0 (0%) 5d20h
cert-manager cert-manager-webhook-5fd7d458f7-wwq4c 0 (0%) 0 (0%) 0 (0%) 0 (0%) 5d20h
gitlab gitlab-gitaly-0 100m (1%) 0 (0%) 200Mi (0%) 0 (0%) 84m
gitlab gitlab-gitlab-exporter-775f6f9fb-2tcwg 75m (1%) 0 (0%) 100M (0%) 0 (0%) 85m
gitlab gitlab-gitlab-pages-79bf887975-xptqx 900m (15%) 0 (0%) 2G (7%) 0 (0%) 80m
gitlab gitlab-gitlab-runner-bb6cbb7f-4vbxx 0 (0%) 0 (0%) 0 (0%) 0 (0%) 7h6m
gitlab gitlab-gitlab-shell-ccd9bfbbf-8kmzl 50m (0%) 0 (0%) 6M (0%) 0 (0%) 85m
gitlab gitlab-gitlab-shell-ccd9bfbbf-phzdf 50m (0%) 0 (0%) 6M (0%) 0 (0%) 85m
gitlab gitlab-grafana-app-5758fb55f6-j5rtb 0 (0%) 0 (0%) 0 (0%) 0 (0%) 44m
gitlab gitlab-mailroom-54c4b4769f-jj2d8 50m (0%) 0 (0%) 150M (0%) 0 (0%) 85m
gitlab gitlab-minio-f69c7f86d-lkz2m 100m (1%) 0 (0%) 128Mi (0%) 0 (0%) 7h6m
gitlab gitlab-postgresql-0 250m (4%) 0 (0%) 256Mi (1%) 0 (0%) 7h6m
gitlab gitlab-prometheus-server-6444c7bd76-62prc 0 (0%) 0 (0%) 0 (0%) 0 (0%) 7h6m
gitlab gitlab-redis-master-0 0 (0%) 0 (0%) 0 (0%) 0 (0%) 7h6m
gitlab gitlab-redis-slave-0 0 (0%) 0 (0%) 0 (0%) 0 (0%) 7h6m
gitlab gitlab-redis-slave-1 0 (0%) 0 (0%) 0 (0%) 0 (0%) 7h5m
gitlab gitlab-registry-678dccc897-dfdvd 50m (0%) 0 (0%) 32Mi (0%) 0 (0%) 85m
gitlab gitlab-registry-678dccc897-lrxcw 50m (0%) 0 (0%) 32Mi (0%) 0 (0%) 85m
gitlab gitlab-sidekiq-all-in-1-v1-57479447cb-jr9gr 900m (15%) 0 (0%) 2G (7%) 0 (0%) 78m
gitlab gitlab-task-runner-899458786-2x4n2 50m (0%) 0 (0%) 350M (1%) 0 (0%) 78m
gitlab gitlab-webservice-default-96484c9f4-khxzm 400m (6%) 0 (0%) 2600M (10%) 0 (0%) 77m
gitlab gitlab-webservice-default-96484c9f4-qmblw 400m (6%) 0 (0%) 2600M (10%) 0 (0%) 78m
ingress-nginx nginx-ingress-controller-gs9dx 0 (0%) 0 (0%) 0 (0%) 0 (0%) 5d18h
kube-system coredns-685d6d555d-626km 100m (1%) 0 (0%) 70Mi (0%) 170Mi (0%) 5d21h
kube-system coredns-autoscaler-57fd5c9bd5-t7snq 20m (0%) 0 (0%) 10Mi (0%) 0 (0%) 5d21h
kube-system metrics-server-7bf4b68b78-xkbn7 100m (1%) 0 (0%) 200Mi (0%) 0 (0%) 5d21h
kube-system nfs-provisioner-6c95ddf48-xndwh 0 (0%) 0 (0%) 0 (0%) 0 (0%) 5d20h
kube-system weave-net-dsfzf 100m (1%) 0 (0%) 0 (0%) 0 (0%) 5d21h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 3745m (62%) 0 (0%)
memory 10785078528 (42%) 170Mi (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events: <none>
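The memory total in the node description mixes binary (`Mi`) and decimal (`M`, `G`) units, so it is easy to misread. As a sanity check, the per-pod memory requests from the table can be summed by hand. The sketch below only handles the four suffixes that appear in this table; the allocatable figure it derives is an estimate, since the 42% shown is a whole-number percentage.

```python
# Byte multipliers for the suffixes used in the table above.
UNITS = {"Mi": 2**20, "Gi": 2**30, "M": 10**6, "G": 10**9}


def to_bytes(quantity):
    """Convert a quantity such as '256Mi' or '2600M' to bytes."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count, no suffix


# Non-zero memory requests, in table order.
requests = [
    "200Mi", "100M", "2G", "6M", "6M", "150M", "128Mi", "256Mi",
    "32Mi", "32Mi", "2G", "350M", "2600M", "2600M", "70Mi", "10Mi", "200Mi",
]

total = sum(to_bytes(q) for q in requests)
print(total)                           # 10785078528, matches "Allocated resources"
print(round(total / 2**30, 2))         # 10.04 (GiB requested)
print(round(total / 0.42 / 2**30, 1))  # 23.9 (GiB allocatable, estimated from 42%)
```

So about 10 GiB of roughly 24 GiB is requested. Note that these are requests only; almost no pod has a memory limit, so actual usage could be higher.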
Detailed postgres logs
postgresql 02:54:44.29
postgresql 02:54:44.29 Welcome to the Bitnami postgresql container
postgresql 02:54:44.29 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql
postgresql 02:54:44.30 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql/issues
postgresql 02:54:44.33
postgresql 02:54:44.34 INFO ==> ** Starting PostgreSQL setup **
postgresql 02:54:44.37 INFO ==> Validating settings in POSTGRESQL_* env vars..
postgresql 02:54:44.38 INFO ==> Loading custom pre-init scripts...
postgresql 02:54:44.38 INFO ==> Loading user's custom files from /docker-entrypoint-preinitdb.d ...
postgresql 02:54:44.39 INFO ==> Initializing PostgreSQL database...
postgresql 02:54:44.39 INFO ==> Cleaning stale /bitnami/postgresql/data/postmaster.pid file
postgresql 02:54:44.45 INFO ==> pg_hba.conf file not detected. Generating it...
postgresql 02:54:44.45 INFO ==> Generating local authentication configuration
postgresql 02:54:44.47 INFO ==> Deploying PostgreSQL with persisted data...
postgresql 02:54:44.48 INFO ==> Configuring replication parameters
postgresql 02:54:44.54 INFO ==> Configuring fsync
postgresql 02:54:44.57 INFO ==> Loading custom scripts...
postgresql 02:54:44.57 INFO ==> Enabling remote connections
postgresql 02:54:44.59 INFO ==> ** PostgreSQL setup finished! **
postgresql 02:54:44.64 INFO ==> ** Starting PostgreSQL **
2021-10-22 02:54:44.670 GMT [1] LOG: 00000: pgaudit extension initialized
2021-10-22 02:54:44.670 GMT [1] LOCATION: _PG_init, pgaudit.c:2017
2021-10-22 02:54:44.670 GMT [1] LOG: 00000: starting PostgreSQL 12.7 on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2021-10-22 02:54:44.670 GMT [1] LOCATION: PostmasterMain, postmaster.c:1019
2021-10-22 02:54:44.671 GMT [1] LOG: 00000: listening on IPv4 address "0.0.0.0", port 5432
2021-10-22 02:54:44.671 GMT [1] LOCATION: StreamServerPort, pqcomm.c:590
2021-10-22 02:54:44.671 GMT [1] LOG: 00000: listening on IPv6 address "::", port 5432
2021-10-22 02:54:44.671 GMT [1] LOCATION: StreamServerPort, pqcomm.c:590
2021-10-22 02:54:44.674 GMT [1] LOG: 00000: listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-10-22 02:54:44.674 GMT [1] LOCATION: StreamServerPort, pqcomm.c:584
2021-10-22 02:54:44.864 GMT [92] LOG: 00000: database system was interrupted; last known up at 2021-10-22 02:36:29 GMT
2021-10-22 02:54:44.864 GMT [92] LOCATION: StartupXLOG, xlog.c:6305
2021-10-22 02:54:49.253 GMT [92] LOG: 00000: database system was not properly shut down; automatic recovery in progress
2021-10-22 02:54:49.253 GMT [92] LOCATION: StartupXLOG, xlog.c:6808
2021-10-22 02:54:49.303 GMT [92] LOG: 00000: redo starts at 0/376F9D0
2021-10-22 02:54:49.303 GMT [92] LOCATION: StartupXLOG, xlog.c:7083
2021-10-22 02:54:49.304 GMT [92] LOG: 00000: invalid record length at 0/376FAB8: wanted 24, got 0
2021-10-22 02:54:49.304 GMT [92] LOCATION: ReadRecord, xlog.c:4313
2021-10-22 02:54:49.304 GMT [92] LOG: 00000: redo done at 0/376FA80
2021-10-22 02:54:49.304 GMT [92] LOCATION: StartupXLOG, xlog.c:7345
2021-10-22 02:54:49.318 GMT [1] LOG: 00000: database system is ready to accept connections
2021-10-22 02:54:49.318 GMT [1] LOCATION: reaper, postmaster.c:3001
Redis logs
useradd: Permission denied.
useradd: cannot lock /etc/passwd; try again later.
chown: invalid user: 'redis'
redis 11:49:42.69 INFO ==> ** Starting Redis **
1:C 22 Oct 2021 11:49:42.707 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 22 Oct 2021 11:49:42.707 # Redis version=6.0.9, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 22 Oct 2021 11:49:42.707 # Configuration loaded
1:M 22 Oct 2021 11:49:42.709 * Running mode=standalone, port=6379.
1:M 22 Oct 2021 11:49:42.709 # Server initialized
1:M 22 Oct 2021 11:49:42.714 * Reading RDB preamble from AOF file...
1:M 22 Oct 2021 11:49:42.714 * Loading RDB produced by version 6.0.9
1:M 22 Oct 2021 11:49:42.714 * RDB age 6028 seconds
1:M 22 Oct 2021 11:49:42.714 * RDB memory usage when created 3.67 Mb
1:M 22 Oct 2021 11:49:42.714 * RDB has an AOF tail
1:M 22 Oct 2021 11:49:42.717 * Reading the remaining AOF tail...
1:M 22 Oct 2021 11:49:42.869 # !!! Warning: short read while loading the AOF file !!!
1:M 22 Oct 2021 11:49:42.869 # !!! Truncating the AOF at offset 28641046 !!!
1:M 22 Oct 2021 11:49:42.870 # AOF loaded anyway because aof-load-truncated is enabled
1:M 22 Oct 2021 11:49:42.870 * DB loaded from append only file: 0.161 seconds
1:M 22 Oct 2021 11:49:42.870 * Ready to accept connections
1:M 22 Oct 2021 13:59:33.814 * Starting automatic rewriting of AOF on 134% growth
1:M 22 Oct 2021 13:59:33.815 * Background append only file rewriting started by pid 38407
1:M 22 Oct 2021 13:59:33.843 * AOF rewrite child asks to stop sending diffs.
38407:C 22 Oct 2021 13:59:33.843 * Parent agreed to stop sending diffs. Finalizing AOF...
38407:C 22 Oct 2021 13:59:33.843 * Concatenating 0.00 MB of AOF diff received from parent.
38407:C 22 Oct 2021 13:59:33.844 * SYNC append only file rewrite performed
38407:C 22 Oct 2021 13:59:33.845 * AOF rewrite: 1 MB of memory used by copy-on-write
1:M 22 Oct 2021 13:59:33.915 * Background AOF rewrite terminated with success
1:M 22 Oct 2021 13:59:33.916 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
1:M 22 Oct 2021 13:59:33.917 * Background AOF rewrite finished successfully