Please forgive me if this question is a bit naive, but it's not a topic I know well yet.
My company currently runs Kubernetes-managed fluentd processes to push logs to logstash. These fluentd processes fail shortly after starting, restart, fail again, and so on.
The fluentd processes run inside Docker containers on CoreOS AWS instances.
When I look at the logs of the 15 running fluentd nodes, they all show the same thing. Below is a trimmed version of those logs, with some duplicates and the timestamps removed:
Connection opened to Elasticsearch cluster => {:host=>"elasticsearch-logging", :port=>9200, :scheme=>"http"}
process finished code=9
fluentd main process died unexpectedly. restarting.
starting fluentd-0.12.29
gem 'fluent-mixin-config-placeholders' version '0.4.0'
gem 'fluent-mixin-plaintextformatter' version '0.2.6'
gem 'fluent-plugin-docker_metadata_filter' version '0.1.3'
gem 'fluent-plugin-elasticsearch' version '1.5.0'
gem 'fluent-plugin-kafka' version '0.3.1'
gem 'fluent-plugin-kubernetes_metadata_filter' version '0.24.0'
gem 'fluent-plugin-mongo' version '0.7.15'
gem 'fluent-plugin-rewrite-tag-filter' version '1.5.5'
gem 'fluent-plugin-s3' version '0.7.1'
gem 'fluent-plugin-scribe' version '0.10.14'
gem 'fluent-plugin-td' version '0.10.29'
gem 'fluent-plugin-td-monitoring' version '0.2.2'
gem 'fluent-plugin-webhdfs' version '0.4.2'
gem 'fluentd' version '0.12.29'
adding match pattern="fluent.**" type="null"
adding filter pattern="kubernetes.*" type="parser"
adding filter pattern="kubernetes.*" type="parser"
adding filter pattern="kubernetes.*" type="parser"
adding filter pattern="kubernetes.**" type="kubernetes_metadata"
adding match pattern="**" type="elasticsearch"
adding source type="tail"
adding source type="tail"
adding source type="tail"
...
using configuration file: <ROOT>
<match fluent.**>
type null
</match>
<source>
type tail
path /var/log/containers/*.log
pos_file /var/log/es-containers.log.pos
time_format %Y-%m-%dT%H:%M:%S.%NZ
tag kubernetes.*
format json
read_from_head true
</source>
<filter kubernetes.*>
@type parser
format json
key_name log
reserve_data true
suppress_parse_error_log true
</filter>
...
...
<match **>
type elasticsearch
log_level info
include_tag_key true
host elasticsearch-logging
port 9200
logstash_format true
buffer_chunk_limit 2M
buffer_queue_limit 32
flush_interval 5s
max_retry_wait 30
disable_retry_limit
num_threads 8
</match>
</ROOT>
following tail of /var/log/containers/node-exporter-rqwwn_prometheus_node-exporter-78027c5c818ab42a143fdd684ce2e71bf15cc22e085cfb4f0155854d2248d572.log
following tail of /var/log/containers/fluentd-elasticsearch-0qc6r_kube-system_fluentd-elasticsearch-fccf8db40a19df4a84575c77ac845921386db098d96ef27d1f565da1d928c336.log
following tail of /var/log/containers/node-exporter-rqwwn_prometheus_POD-65ed0741bb78a32e6e129ebc9a96b56284f32d81aba0d66c129df02c9e05fb5b.log
following tail of /var/log/containers/alertmanager-1407110495-s8j6k_prometheus_POD-1807d1ab9c99ce2c4da81fcd5b589e604f4c0dc85cc85a351706b52dc747d21b.log
...
following tail of /var/log/containers/rail-prod-v071-n0zgz_prod_rail-a301220a36cf2a2a537668db44197e2c029f9cc1c60c345218909cd86a84e717.log
Connection opened to Elasticsearch cluster => {:host=>"elasticsearch-logging", :port=>9200, :scheme=>"http"}
process finished code=9
fluentd main process died unexpectedly. restarting.
starting fluentd-0.12.29
...
My guess is that not enough memory has been configured, or something along those lines, which causes the service to restart immediately on startup? Does the message "process finished code=9" point to a specific problem?
If anyone has seen something like this before, please leave a comment. Thanks.
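For what it's worth, here is my back-of-the-envelope estimate of how much memory the elasticsearch output buffer in the config above could hold. This assumes the plugin is using fluentd's default in-memory buffer (no buffer_type is set in the config), where the total is roughly chunk size times queue length:

```python
# Rough upper bound on the in-memory output buffer for the <match **> block above.
# Assumes the default memory buffer: total ≈ buffer_chunk_limit * buffer_queue_limit
buffer_chunk_limit = 2 * 1024 * 1024   # buffer_chunk_limit 2M, from the config
buffer_queue_limit = 32                # buffer_queue_limit 32, from the config

max_buffer_bytes = buffer_chunk_limit * buffer_queue_limit
print(max_buffer_bytes // (1024 * 1024), "MB")  # 64 MB
```

So the buffer alone could grow to about 64 MB per output, on top of fluentd's baseline usage, which seems like it could matter if the container has a tight memory limit.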
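In case it is relevant: here is the kind of check I was planning to run on one of the CoreOS nodes to see whether the kernel OOM killer is what's terminating the process (a sketch; the pod name is taken from one of the "following tail of" lines above, and the docker filter pattern is an assumption):

```shell
# On the CoreOS host: look for OOM-killer activity in the kernel log
dmesg -T | grep -i -E "oom|killed process"

# Ask Kubernetes why the container last terminated
# (pod name copied from the log excerpt above)
POD=fluentd-elasticsearch-0qc6r
kubectl --namespace=kube-system describe pod "$POD" | grep -A 5 "Last State"

# Ask Docker on the node whether the container was OOM-killed,
# and what memory limit it was given (0 means unlimited)
CONTAINER_ID=$(docker ps --filter "name=fluentd" --format '{{.ID}}' | head -n 1)
docker inspect --format '{{.State.OOMKilled}} {{.HostConfig.Memory}}' "$CONTAINER_ID"
```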