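I'm having trouble with a system I didn't set up or configure myself, although most of its parts seem to be in more or less default locations. My current theory is that the PostgreSQL database on the system (Ubuntu 16.04.3 LTS) stopped working when the storage on the server filled up completely.
The storage has since been increased and some old files have been deleted, but the PostgreSQL service/cluster refuses to start. See below: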
ubuntu@ip-<my-ip-here>:~$ sudo pg_ctlcluster 11 main start
Job for postgresql@11-main.service failed because a configured resource limit was exceeded. See "systemctl status postgresql@11-main.service" and "journalctl -xe" for details.
ubuntu@ip-<my-ip-here>:~$ systemctl status postgresql@11-main.service
● postgresql@11-main.service - PostgreSQL Cluster 11-main
Loaded: loaded (/lib/systemd/system/postgresql@.service; disabled; vendor preset: enabled)
Active: failed (Result: resources) since Mon 2023-01-09 16:24:10 AEST; 7s ago
Process: 2281 ExecStart=/usr/bin/pg_ctlcluster --skip-systemctl-redirect %i start (code=exited, status=1/FAILURE)
Jan 09 16:24:10 ip-172-31-23-196 postgresql@11-main[2281]: 2023-01-09 16:24:10.052 AEST [2289] CONTEXT: WAL redo at 7F/200AA78 for XLOG/FPI:
Jan 09 16:24:10 ip-172-31-23-196 postgresql@11-main[2281]: 2023-01-09 16:24:10.053 AEST [2288] LOG: startup process (PID 2289) exited with exit code 1
Jan 09 16:24:10 ip-172-31-23-196 postgresql@11-main[2281]: 2023-01-09 16:24:10.053 AEST [2288] LOG: aborting startup due to startup process failure
Jan 09 16:24:10 ip-172-31-23-196 postgresql@11-main[2281]: 2023-01-09 16:24:10.059 AEST [2288] LOG: database system is shut down
Jan 09 16:24:10 ip-172-31-23-196 postgresql@11-main[2281]: pg_ctl: could not start server
Jan 09 16:24:10 ip-172-31-23-196 postgresql@11-main[2281]: Examine the log output.
Jan 09 16:24:10 ip-172-31-23-196 systemd[1]: postgresql@11-main.service: Can't open PID file /run/postgresql/11-main.pid (yet?) after start: No such file or directory
Jan 09 16:24:10 ip-172-31-23-196 systemd[1]: Failed to start PostgreSQL Cluster 11-main.
Jan 09 16:24:10 ip-172-31-23-196 systemd[1]: postgresql@11-main.service: Unit entered failed state.
Jan 09 16:24:10 ip-172-31-23-196 systemd[1]: postgresql@11-main.service: Failed with result 'resources'.
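The "Can't open PID file /run/postgresql/11-main.pid (yet?) after start: No such file or directory" part looks like a red herring: a problem that only shows up after the real one has already happened. (Also, if I create that file and give the postgres user ownership of and permissions on it, it gets deleted as expected.)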
So I think the problem lies in "CONTEXT: WAL redo at 7F/200AA78 for XLOG/FPI:", which is followed by "LOG: startup process (PID 2289) exited with exit code" and "aborting startup due to startup process failure". Something clearly went wrong during the WAL redo.
The advice "Examine the log output." sounds sensible, but I can't find a log in any of the places I would expect:
ubuntu@ip-<my-ip-here>:~$ ls /etc/postgresql/11/main
conf.d environment pg_ctl.conf pg_hba.conf pg_ident.conf postgresql.conf start.conf
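And postgresql.conf offers no clue as to where to find it either, so I assume PostgreSQL wasn't able to write it?
Any suggestions on how to get the server going again are welcome. I don't mind losing whatever data it is trying to recover from the WAL redo; I'm mainly interested in getting the system back up and running.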
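Answer 1
As user @user1686 correctly pointed out, the PostgreSQL logs on Ubuntu can be found in /var/log/postgresql. In my case the file is postgresql-11-main.log, since I'm interested in the main cluster and am running PostgreSQL 11.
That log file contained the following: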
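For anyone looking for the same file on another version or cluster, here is a minimal sketch of the lookup. It assumes the Debian/Ubuntu naming scheme seen here (postgresql-VERSION-CLUSTER.log under /var/log/postgresql); the helper name pg_log_path is made up for illustration, and a custom log setting in postgresql.conf could place the log elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: build the conventional Debian/Ubuntu log path
# for a given PostgreSQL version and cluster name.
pg_log_path() {
    printf '/var/log/postgresql/postgresql-%s-%s.log\n' "$1" "$2"
}

log_file="$(pg_log_path 11 main)"
echo "Expected log file: $log_file"

# pg_lsclusters (from postgresql-common) lists every cluster together
# with its data directory and log file, if the tool is installed.
if command -v pg_lsclusters >/dev/null 2>&1; then
    pg_lsclusters
fi

# Show the most recent entries when the file is readable by this user.
if [ -r "$log_file" ]; then
    tail -n 50 "$log_file"
else
    echo "Cannot read $log_file (it may need sudo, or it may not exist)"
fi
```

The log is usually only readable by root, postgres and members of the adm group, so the tail step may need sudo.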
2023-01-10 10:36:39.271 AEST [11564] LOG: database system was interrupted while in recovery at 2023-01-09 16:24:10 AEST
2023-01-10 10:36:39.271 AEST [11564] HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
2023-01-10 10:36:39.315 AEST [11564] LOG: could not open directory "pg_tblspc/17293/lost+found": Permission denied
2023-01-10 10:36:39.319 AEST [11564] LOG: database system was not properly shut down; automatic recovery in progress
2023-01-10 10:36:39.323 AEST [11564] LOG: redo starts at 7F/1FA68D0
2023-01-10 10:36:39.335 AEST [11564] FATAL: could not extend file "pg_tblspc/17293/PG_11_201809051/16386/30550227": No space left on device
This explains why the redo failed: there is no space left on the device that holds pg_tblspc/17293.
After running the following:
sudo ls /var/lib/postgresql/11/main/pg_tblspc -alF
...
lrwxrwxrwx 1 postgres postgres 11 Sep 4 2019 17293 -> /trace_data/
and:
ubuntu@ip-<my-ip-here>:~$ df /trace_data/
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/xvdf 206293688 196609516 0 100% /trace_data
ubuntu@ip-<my-ip-here>:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 100G 0 disk
└─xvda1 202:1 0 100G 0 part /
xvdf 202:80 0 200G 0 disk /trace_data
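The two checks above (follow the tablespace symlink, then look at the free space on its target) can be combined into one loop over every tablespace. This is only a sketch: the function name tablespace_usage is made up, the data directory is the one from this system, and readlink -f assumes GNU coreutils as shipped with Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper: for each tablespace symlink under
# <data dir>/pg_tblspc, print where it points and the disk usage there.
tablespace_usage() {
    pgdata="$1"
    if [ ! -r "$pgdata/pg_tblspc" ]; then
        echo "cannot read $pgdata/pg_tblspc (run as root or postgres)" >&2
        return 1
    fi
    for link in "$pgdata"/pg_tblspc/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        target="$(readlink -f "$link")"
        echo "$link -> $target"
        df -h "$target"
    done
}

# Data directory as used in this question; adjust for other setups.
tablespace_usage /var/lib/postgresql/11/main || true
```

Because the data directory is normally readable only by postgres, this has to be run from a root or postgres shell to produce any output.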
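I realised that I had to increase the size of the xvdf volume. I could do that by increasing the volume size on the virtual machine and then extending the file system with sudo resize2fs /dev/xvdf (since it is an ext4 file system). After that, I was able to restart the PostgreSQL cluster and move on to my other problems.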
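The sequence described above can be sketched as follows. It assumes the volume has already been enlarged on the virtualisation side and, as here, that the ext4 file system sits directly on /dev/xvdf with no partition table. The run and grow_and_restart helpers are made up for illustration, and by default the script only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Print each command, or execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Arguments: device, mount point, PostgreSQL version, cluster name.
grow_and_restart() {
    run lsblk "$1"                          # confirm the new device size is visible
    run sudo resize2fs "$1"                 # grow the ext4 file system to fill the device
    run df -h "$2"                          # confirm there is free space again
    run sudo pg_ctlcluster "$3" "$4" start  # start the cluster
    run pg_lsclusters                       # check that the cluster reports as online
}

grow_and_restart /dev/xvdf /trace_data 11 main
```

If the start still fails, /var/log/postgresql/postgresql-11-main.log is the place to look for the next error.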