We run a server application on Ubuntu 22.04. The application is very spammy with its logging.
journalctl -S -1hour | wc --lines
121349
journalctl -S -1hour | wc --bytes
32382836
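To put those two numbers in perspective, here is the arithmetic as a small shell sketch. The function name log_rate is made up for illustration, and it assumes the window is exactly one hour (3600 seconds).

```shell
# Convert a one-hour line count and byte count into per-second rates.
# log_rate is a hypothetical helper name, not part of journalctl.
log_rate() {
  lines="$1"
  bytes="$2"
  awk -v l="$lines" -v b="$bytes" 'BEGIN {
    # 3600 seconds in the one-hour window; 1024 bytes per KiB
    printf "%.1f lines/s, %.1f KiB/s, %.0f bytes/line\n", l / 3600, b / 3600 / 1024, b / l
  }'
}

log_rate 121349 32382836
```

So the journal is taking in roughly 34 lines per second on average, at a bit under 9 KiB/s, although the rate is probably bursty rather than constant.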
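Sometimes the server becomes completely unresponsive for a few minutes. CPU hits 100%, and our application responds so slowly that other agents stop reporting any metrics.
When the incident ends, we see that our application did not crash, but it logged a bunch of timeout errors because it could not do anything during those few minutes. It does carry on working afterwards, though.
I found this in /var/log/syslog: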
Feb 28 17:44:29 ip-10-11-0-205 kernel: [81118.252411] systemd[1]: Started ntp-systemd-netif.service.
Feb 28 17:44:53 ip-10-11-0-205 kernel: [81142.367869] systemd[1]: ntp-systemd-netif.service: Deactivated successfully.
Feb 28 17:45:13 ip-10-11-0-205 kernel: [81162.387816] systemd[1]: systemd-journald.service: State 'stop-watchdog' timed out. Killing.
Feb 28 17:45:14 ip-10-11-0-205 kernel: [81162.731840] systemd[1]: systemd-journald.service: Killing process 117 (systemd-journal) with signal SIGKILL.
Feb 28 17:45:17 ip-10-11-0-205 kernel: [81165.657264] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=9/KILL
Feb 28 17:45:17 ip-10-11-0-205 kernel: [81165.696972] systemd[1]: systemd-journald.service: Failed with result 'watchdog'.
Feb 28 17:45:18 ip-10-11-0-205 kernel: [81166.651435] systemd[1]: systemd-journald.service: Consumed 2min 5.531s CPU time.
Feb 28 17:45:25 ip-10-11-0-205 kernel: [81174.079307] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 1.
Feb 28 17:45:25 ip-10-11-0-205 kernel: [81174.085466] systemd[1]: Stopped Journal Service.
Feb 28 17:45:25 ip-10-11-0-205 kernel: [81174.200482] systemd[1]: systemd-journald.service: Consumed 2min 5.531s CPU time.
Feb 28 17:45:29 ip-10-11-0-205 kernel: [81177.822409] systemd[1]: Starting Journal Service...
Feb 28 17:45:35 ip-10-11-0-205 kernel: [81183.821550] systemd-journald[240259]: File /var/log/journal/ec212477ed3f3049adade2e820950984/system.journal corrupted or uncleanly shut down, renaming and replacing.
Feb 28 17:45:38 ip-10-11-0-205 kernel: [81187.226010] systemd[1]: Started Journal Service.
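So it sounds like systemd-journald is what is causing the problem.
Questions:
- Is this just journald taking a very long time to process my logs?
- Why does it need to hog the entire host?
- If so, are there settings I can change to prevent the problem? (e.g. trimming the logs more often)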
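For reference, the kind of settings I have in mind would look something like the sketch below. It is untested: SystemMaxUse, RateLimitIntervalSec and RateLimitBurst are real journald.conf options, but the values are placeholders rather than recommendations, and the function name and the 90-limits.conf file name are made up for illustration. I do not know whether any of these would actually prevent the watchdog timeout.

```shell
# Write a journald drop-in with a size cap and per-service rate limiting.
# write_journald_dropin and 90-limits.conf are hypothetical names;
# the option values are placeholders, not tuned recommendations.
write_journald_dropin() {
  dir="$1"
  mkdir -p "$dir"
  cat > "$dir/90-limits.conf" <<'EOF'
[Journal]
# Cap the total disk space used by persistent journal files
SystemMaxUse=500M
# Allow at most RateLimitBurst messages per service per interval
RateLimitIntervalSec=30s
RateLimitBurst=10000
EOF
}

# Real use would need root and a restart of journald, for example:
#   write_journald_dropin /etc/systemd/journald.conf.d
#   systemctl restart systemd-journald
# Current usage can be checked with:  journalctl --disk-usage
# A one-off trim can be done with:    journalctl --vacuum-size=500M
```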