如何调试 Rsyslog 错误“/path/to/log 不存在状态文件(/path/to/statefile)”?

如何调试 Rsyslog 错误“/path/to/log 不存在状态文件(/path/to/statefile)”?

我的设置简要说明:我在 Docker Compose 单服务器配置中拥有 11 个 Docker 容器。其中一些容器会生成日志,我会将这些日志写入(大部分是单独的)主机卷。这样会生成八个日志文件,这些文件同样通过 Docker 卷与 Rsyslog 容器共享。最后,此容器将日志转发到基于云的日志聚合器应用程序 Papertrail。

这一切都正常:我可以近乎实时地查看 Mongo 日志、Apache 日志和各种应用程序级日志。但是,我意识到当我重新启动记录器容器时,Rsyslog 将丢失它已推送的日志的任何记录,因此将再次推送所有日志。Papertrail 可能会进行自己的重复数据删除,但无论如何,删除这些重复数据还是不错的。

我打算添加另一个主机卷,这样即使记录器容器被销毁并重新创建,状态也是永久的。但是,刚刚检查暂存实例时,状态目录/var/spool/rsyslog似乎为空。因此,在执行这项工作之前,我想确保 Rsyslog 确实在写入状态文件。

我是否正确地认为该目录是由$WorkDirectory指令设置的?

以下是我的配置文件的经过稍微编辑的片段:

# Load Modules
module(load="imfile")

# If we don't use this, we'll get the hostname of the Logger container
$LocalHostName missive-test

# See https://help.papertrailapp.com/kb/configuration/advanced-unix-logging-tips#rsyslog-2
$ActionResumeInterval 10
$ActionQueueSize 100000
$ActionQueueDiscardMark 97500
$ActionQueueHighWaterMark 80000
$ActionQueueType LinkedList
$ActionQueueFileName papertrailqueue
$ActionQueueCheckpointInterval 100
$ActionQueueMaxDiskSpace 2g
$ActionResumeRetryCount -1
$ActionQueueSaveOnShutdown on
$ActionQueueTimeoutEnqueue 2
$ActionQueueDiscardSeverity 0

# My own addition
$WorkDirectory /var/spool/rsyslog

# See https://help.papertrailapp.com/kb/configuration/encrypting-remote-syslog-with-tls-ssl/#download-root-certificates
$DefaultNetstreamDriverCAFile /etc/papertrail-bundle.pem # trust these CAs
$ActionSendStreamDriver gtls # use gtls netstream driver
$ActionSendStreamDriverMode 1 # require TLS
$ActionSendStreamDriverAuthMode x509/name # authenticate by hostname
$ActionSendStreamDriverPermittedPeer *.papertrailapp.com

# Slightly edited to obfuscate exact URL
*.* @@logs0.papertrailapp.com:00000

# Use http://www.rsyslog.com/rsyslog-configuration-builder/ to generate new
# log file watchers.

input(type="imfile"
     File="/var/log/missive/controller/socket-listener.log"
     Tag="socket"
     Facility="local0")

最后一个input指令是一个例子,一共有八个,都非常相似。

我对状态数据位置的判断可能错误吗?我最近重启了几次,每次 Papertrail 确实都获得了 25K 条新记录。所以,要么是状态写入了我看不到的地方,要么是根本没有写入。我是不是漏掉了什么指令?

我已经在 Rsyslog 8.26.0 和 8.31.0 上尝试过此操作,两个版本的结果相同。

更新 1

查看 Rsyslog 本身的一些日志可能会有所帮助。但是,我已经检查过了/var/log,没有任何问题(除了我推送的日志的主机卷)。这需要明确打开吗?

发布此信息后,我找到了一个验证配置文件的命令,它看起来很好——看起来没有什么可以阻止状态文件工作。

/var # rsyslogd -N1
rsyslogd: version 8.26.0, config validation run (level 1), master config /etc/rsyslog.conf
rsyslogd: End of config validation run. Bye.

更新 2

我现在已经-dn向 Rsyslog 添加了(调试和前台)开关,并且正在生成(非常详细的)日志文件。在 300K 行其他内容中,我发现了以下内容:

2227.959643591:main thread    : action 0 queue: starting queue
2227.959648334:main thread    : action 0 queue: is disk-assisted, disk will be used on demand
2227.959656291:main thread    : action 0 queue: params: type 1, enq-only 0, disk assisted 1, spoolDir '/var/spool/rsyslog', maxFileSz 1048576, maxQSize 100000, lqsize 0, pqsize 0, child 0, full delay 40000, light delay 70000, deq batch si
ze 16, high wtrmrk 80000, low wtrmrk 70000, discardmrk 97500, max wrkr 1, min msgs f. wrkr 100000
2227.959661911:main thread    : action 0 queue:Reg: finalizing construction of worker thread pool (numworkerThreads 1)
2227.959666678:main thread    : action 0 queue:Reg/w0: finalizing construction of worker instance data (for 1 actions)
2227.959671472:main thread    : action 0 queue:DAwpool: finalizing construction of worker thread pool (numworkerThreads 1)
2227.959675830:main thread    : action 0 queue:DAwpool/w0: finalizing construction of worker instance data (for 1 actions)
2227.959682905:main thread    : action 0 queue[DA]: starting queue
2227.959687826:main thread    : action 0 queue[DA]: .qi file name is '/var/spool/rsyslog/papertrailqueue.qi', len 37
2227.959691356:main thread    : action 0 queue[DA]: I am a child
2227.959702371:main thread    : action 0 queue[DA]: clean startup, no .qi file found
2227.959706512:main thread    : action 0 queue[DA]: state -2040 reading .qi file - can not read persisted info (if any)
2227.959711817:main thread    : file stream N/A params: flush interval 0, async write 0
2227.959719817:main thread    : file stream N/A params: flush interval 0, async write 0
2227.959724086:main thread    : file stream N/A params: flush interval 0, async write 0
2227.959734668:main thread    : action 0 queue[DA]: params: type 2, enq-only 0, disk assisted 0, spoolDir '/var/spool/rsyslog', maxFileSz 1048576, maxQSize 0, lqsize 0, pqsize 0, child 1, full delay -1, light delay -1, deq batch size 8, h
igh wtrmrk 0, low wtrmrk 1, discardmrk 0, max wrkr 1, min msgs f. wrkr 0
2227.959739181:main thread    : action 0 queue[DA]:Reg: finalizing construction of worker thread pool (numworkerThreads 1)
2227.959743488:main thread    : action 0 queue[DA]:Reg/w0: finalizing construction of worker instance data (for 1 actions)
2227.959747045:main thread    : action 0 queue[DA]: queue finished initialization
2227.959753324:main thread    : action 0 queue: DA queue initialized, disk queue 0x5630e6079ea0
2227.959756804:main thread    : action 0 queue: queue finished initialization
2227.959762404:main thread    : Action builtin:omfwd[0x5630e605cd60]: queue 0x5630e605d1e0 started
2227.959766212:main thread    : Activating Ruleset Queue[0] for Ruleset RSYSLOG_DefaultRuleset
2227.959769728:main thread    : activateMainQueue: mainq cnf obj ptr is 0
2227.959773846:main thread    : main Q: starting queue
2227.959781585:main thread    : main Q: is NOT disk-assisted
2227.959789199:main thread    : main Q: params: type 0, enq-only 0, disk assisted 0, spoolDir '/var/spool/rsyslog', maxFileSz 1048576, maxQSize 100000, lqsize 0, pqsize 0, child 0, full delay 97000, light delay 70000, deq batch size 256, 
high wtrmrk 80000, low wtrmrk 20000, discardmrk 98000, max wrkr 2, min msgs f. wrkr 40000
2227.959793605:main thread    : main Q:Reg: finalizing construction of worker thread pool (numworkerThreads 2)
2227.959797797:main thread    : main Q:Reg/w0: finalizing construction of worker instance data (for 1 actions)
2227.959807534:main thread    : main Q:Reg/w1: finalizing construction of worker instance data (for 1 actions)
2227.959813976:main thread    : main Q: queue finished initialization
2227.959819101:main thread    : Main processing queue is initialized and running
2227.959823416:main thread    : running module imfile with config 0x5630e605bda0, term mode: cooperative/SIGTTIN
2227.959847971:main thread    : configuration 0x5630e6047ae0 activated
2227.959891721:main thread    : rsyslog/glbl: using '127.0.0.1' as localhost IP
2227.959897527:main thread    : signaling new internal message via SIGTTOU
2227.959907993:main thread    : rsyslogd: writing pidfile '/var/run/rsyslogd.pid.tmp'.
2227.960012480:main thread    : rsyslogd: initialization completed, transitioning to regular run mode

这个问题提到了/var/spool/rsyslog几次,所以看起来配置确实设置正确。但是我仍然没有找到任何可以解释为什么在推送到 Papertrail 后没有留下状态文件的东西。

我将继续查看日志,看看是否还有其他内容。与此同时,欢迎对此发表评论。

更新 3

啊哈,我发现了这些行:

/ # grep -C 1 "NO state file" /var/log/rsyslog.log
2227.960253654:imfile.c       : imfile: trying to open state for '/var/log/missive/controller/socket-listener.log', state file 'imfile-state:-var-log-missive-controller-socket-listener.log'
2227.960264463:imfile.c       : imfile: NO state file (/var/spool/rsyslog/imfile-state:-var-log-missive-controller-socket-listener.log) exists for '/var/log/missive/controller/socket-listener.log'
2227.960268423:imfile.c       : imfile: clean startup withOUT state file for '/var/log/missive/controller/socket-listener.log'
--
2227.971282226:imfile.c       : imfile: trying to open state for '/var/log/missive/transmitter/queue.log', state file 'imfile-state:-var-log-missive-transmitter-queue.log'
2227.971298328:imfile.c       : imfile: NO state file (/var/spool/rsyslog/imfile-state:-var-log-missive-transmitter-queue.log) exists for '/var/log/missive/transmitter/queue.log'
2227.971302589:imfile.c       : imfile: clean startup withOUT state file for '/var/log/missive/transmitter/queue.log'
--
2228.510460963:imfile.c       : imfile: trying to open state for '/var/log/missive/transmitter/worker-manager.log', state file 'imfile-state:-var-log-missive-transmitter-worker-manager.log'
2228.510480060:imfile.c       : imfile: NO state file (/var/spool/rsyslog/imfile-state:-var-log-missive-transmitter-worker-manager.log) exists for '/var/log/missive/transmitter/worker-manager.log'
2228.510484619:imfile.c       : imfile: clean startup withOUT state file for '/var/log/missive/transmitter/worker-manager.log'
--
2228.774990028:imfile.c       : imfile: trying to open state for '/var/log/missive/storage/storage-server.log', state file 'imfile-state:-var-log-missive-storage-storage-server.log'
2228.775007074:imfile.c       : imfile: NO state file (/var/spool/rsyslog/imfile-state:-var-log-missive-storage-storage-server.log) exists for '/var/log/missive/storage/storage-server.log'
2228.775011149:imfile.c       : imfile: clean startup withOUT state file for '/var/log/missive/storage/storage-server.log'
--
2228.797432870:imfile.c       : imfile: trying to open state for '/var/log/missive/outtray/outtray-server.log', state file 'imfile-state:-var-log-missive-outtray-outtray-server.log'
2228.797447810:imfile.c       : imfile: NO state file (/var/spool/rsyslog/imfile-state:-var-log-missive-outtray-outtray-server.log) exists for '/var/log/missive/outtray/outtray-server.log'
2228.797452333:imfile.c       : imfile: clean startup withOUT state file for '/var/log/missive/outtray/outtray-server.log'
--
2228.798755146:imfile.c       : imfile: trying to open state for '/var/log/missive/interface/access.log', state file 'imfile-state:-var-log-missive-interface-access.log'
2228.798767248:imfile.c       : imfile: NO state file (/var/spool/rsyslog/imfile-state:-var-log-missive-interface-access.log) exists for '/var/log/missive/interface/access.log'
2228.798771170:imfile.c       : imfile: clean startup withOUT state file for '/var/log/missive/interface/access.log'
--
2228.802581105:imfile.c       : imfile: trying to open state for '/var/log/missive/interface/error.log', state file 'imfile-state:-var-log-missive-interface-error.log'
2228.802592195:imfile.c       : imfile: NO state file (/var/spool/rsyslog/imfile-state:-var-log-missive-interface-error.log) exists for '/var/log/missive/interface/error.log'
2228.802595916:imfile.c       : imfile: clean startup withOUT state file for '/var/log/missive/interface/error.log'
--
2228.818228008:imfile.c       : imfile: trying to open state for '/var/log/missive/mongo/mongodb.log', state file 'imfile-state:-var-log-missive-mongo-mongodb.log'
2228.818241043:imfile.c       : imfile: NO state file (/var/spool/rsyslog/imfile-state:-var-log-missive-mongo-mongodb.log) exists for '/var/log/missive/mongo/mongodb.log'
2228.818244939:imfile.c       : imfile: clean startup withOUT state file for '/var/log/missive/mongo/mongodb.log'
--
2228.826921335:imfile.c       : imfile: trying to open state for '/var/log/missive/traffic/traefik.log', state file 'imfile-state:-var-log-missive-traffic-traefik.log'
2228.826933834:imfile.c       : imfile: NO state file (/var/spool/rsyslog/imfile-state:-var-log-missive-traffic-traefik.log) exists for '/var/log/missive/traffic/traefik.log'
2228.826937674:imfile.c       : imfile: clean startup withOUT state file for '/var/log/missive/traffic/traefik.log'

首次运行时会出现这些情况,但当然,状态文件应该被保留,这样日志就不会再次推送。我会搜索这些错误字符串,以防万一,我会跟踪日志,看看最后是否有任何内容阻止记录最终状态。

更新 4

我尝试在我的input()定义后添加这个,但无济于事:

$InputFilePersistStateInterval 100
$InputRunFileMonitor

这是因为手册上说,第二个参数:

这将激活当前监视器。它没有参数。如果您忘记了此指令,则不会进行文件监视。

不过,我认为这是指通过$命令声明的传统样式日志,而不是新样式input()语法。

更新 5

我突然想到上面的配置覆写默认的 Rsyslog 配置可能会删除一些关键内容。因此,我修改了相关COPY命令,以便rsyslogd.conf保留默认配置,而我的配置则成为默认配置的附加配置。

新的指令(未被注释掉的)如下:

# rsyslog v5: load input modules
# If you do not load inputs, nothing happens!
# You may need to set the module load path if modules are not found.

$ModLoad immark.so # provides --MARK-- message capability
$ModLoad imuxsock.so # provides support for local system logging (e.g. via logger command)
$ModLoad imklog.so # kernel logging (formerly provided by rklogd)

# default permissions for all log files.
$FileOwner root
$FileGroup adm
$FileCreateMode 0640
$DirCreateMode 0755
$Umask 0022

# Include configuration files from directory
$IncludeConfig /etc/rsyslog.d/*

# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none                -/var/log/messages

# The authpriv file has restricted access.
authpriv.*                                              /var/log/secure

# Log all the mail messages in one place.
mail.*                                                  -/var/log/maillog

# Log cron stuff
cron.*                                                  -/var/log/cron

# Everybody gets emergency messages
*.emerg                                                 :omusrmsg:*

# Save news errors of level crit and higher in a special file.
uucp,news.crit                                          -/var/log/spooler

# Save boot messages also to boot.log
local7.*                                                /var/log/boot.log

这仍然会将我的日志推送到 Papertrail,并且仍然没有创建假脱机文件。

我一度认为这可能是权限问题,但 Rsyslog 的运行方式与rootAlpine 相同,而且我相信队列在主进程中以线程的形式实现。因此,它们也将以线程的形式运行root,写入应该没有问题/var/spool/rsyslog/*

更新 6

我曾询问过 Rsyslog 错误跟踪器的读者这是否可能是一个错误,可能是因为我正在使用 Docker。

答案1

一些非常乐于助人的工程师巡查 Rsyslog 问题列表,并提出了一个足够好的修复方案。似乎默认情况下,状态文件仅在 Rsyslog 退出时按日志写入,这就是为什么它们不会立即出现的原因。

此外,有人认为 Docker 停止/终止过程在 Rsyslog 有机会写入状态文件之前发生。一位贡献者说 Rsyslog 可能需要长达 90 秒的时间才能写入状态文件SIGTERM,而我的理解是 Docker 通常不会等待那么长时间(默认情况下,它会等待 10 秒后再发送SIGKILL)。

奇怪的是,在我的例子中,Rsyslog 只等待了约 2 秒(远低于 10 秒的默认限制),这表明它干净地退出了,尽管没有写入任何状态文件。我目前不知道为什么会发生这种情况。

修复方法如下。在每个日志声明中,PersistStateInterval可以添加指令来指定状态文件的写入频率(以处理的日志行为单位)。默认为0,这提供了“退出时写入”的行为。

例如:

input(type="imfile"
     File="/var/log/missive/controller/socket-listener.log"
     Tag="socket"
     PersistStateInterval="10"
     Facility="local0")

我将采用以下策略:对慢速日志文件使用较低的值(例如 20),对快速移动的文件使用较高的值(例如 100)。我认为我更喜欢以这种方式进行相当频繁的写入,因为它可以防止电源故障或容器崩溃。

我认为这个修复并不理想,因为容器关闭仍然无法更新确切的日志发送位置。但是,这样做的结果是 Rsyslog 将发送一些重复的日志,这些日志应该由聚合器目标进行去重(间隔 X 将导致最多 X-1 行日志重复,这并不可怕)。

相关内容