An Elasticsearch cluster running on GKE has run into a problem. Nodes with the "data" role started crashing unexpectedly with this error:
max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
bootstrap checks failed
Of course, the StatefulSet has an init container that sets vm.max_map_count to 262144.
What's more, that init container appears to have completed successfully:
kubectl describe pod elastic-data-0
Init Containers:
  init-sysctl:
    Container ID:  docker://23d3b3d11198510aa01aef340b92e1603785804fbf75e963fdbd61acfe458318
    Image:         busybox:latest
    Image ID:      docker-pullable://busybox@sha256:5e8e0509e829bb8f990249135a36e81a3ecbe94294e7a185cc14616e5fad96bd
    Port:          <none>
    Command:
      sysctl
      -w
      vm.max_map_count=262144
    State:         Terminated
      Reason:      Completed
StatefulSet manifest:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: elastic-data
  labels:
    component: elasticsearch
    role: data
spec:
  serviceName: elasticsearch-data
  updateStrategy:
    type: RollingUpdate
  replicas: 3
  template:
    metadata:
      labels:
        component: elasticsearch
        role: data
    spec:
      initContainers:
      - name: init-sysctl
        image: busybox:latest
        imagePullPolicy: Always
        command:
        - sysctl
        - -w
        - vm.max_map_count=262144
        securityContext:
          privileged: true
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: role
                  operator: In
                  values:
                  - data
              topologyKey: kubernetes.io/hostname
      containers:
      - name: es-data
        image: docker.elastic.co/elasticsearch/elasticsearch:6.3.2
        env:
        - name: node.name
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: cluster.name
          value: "elastic"
        - name: node.master
          value: "false"
        - name: node.data
          value: "true"
        - name: node.ingest
          value: "false"
        - name: http.enabled
          value: "true"
        - name: bootstrap.memory_lock
          value: "false"
        - name: path.data
          value: "/data/data"
        - name: path.logs
          value: "/data/log"
        - name: discovery.zen.ping.unicast.hosts
          value: "elasticsearch-discovery"
        - name: ES_JAVA_OPTS
          value: -Xms512m -Xmx512m
        - name: processors
          valueFrom:
            resourceFieldRef:
              resource: limits.cpu
        resources:
          limits:
            cpu: 2
          requests:
            cpu: 300m
        ports:
        - containerPort: 9200
          name: http
        - containerPort: 9300
          name: transport
        livenessProbe:
          tcpSocket:
            port: transport
          initialDelaySeconds: 20
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /_cluster/health
            port: http
          initialDelaySeconds: 20
          timeoutSeconds: 5
        volumeMounts:
        - name: storage-volume
          mountPath: /data
      securityContext:
        runAsUser: 1000
        fsGroup: 100
  volumeClaimTemplates:
  - metadata:
      name: storage-volume
    spec:
      storageClassName: manual
      accessModes: [ ReadWriteOnce ]
      resources:
        requests:
          storage: 300Gi
Logs:
kubectl logs elastic-data-0
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
[2018-08-16T15:40:33,998][INFO ][o.e.n.Node ] [elastic-data-0] initializing ...
[2018-08-16T15:40:34,163][INFO ][o.e.e.NodeEnvironment ] [elastic-data-0] using [1] data paths, mounts [[/data (/dev/sdb)]], net usable_space [231.2gb], net total_space [245gb], types [ext4]
[2018-08-16T15:40:34,165][INFO ][o.e.e.NodeEnvironment ] [elastic-data-0] heap size [503.6mb], compressed ordinary object pointers [true]
[2018-08-16T15:40:34,544][INFO ][o.e.n.Node ] [elastic-data-0] node name [elastic-data-0], node ID [C2vCCIpHS3mpiDHduimS0g]
[2018-08-16T15:40:34,545][INFO ][o.e.n.Node ] [elastic-data-0] version[6.3.2], pid[1], build[default/tar/053779d/2018-07-20T05:20:23.451332Z], OS[Linux/4.14.22+/amd64], JVM["Oracle Corporation"/OpenJDK 64-Bit Server VM/10.0.2/10.0.2+13]
[2018-08-16T15:40:34,545][INFO ][o.e.n.Node ] [elastic-data-0] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch.zCR3bQNp, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Djava.locale.providers=COMPAT, -XX:UseAVX=2, -Des.cgroups.hierarchy.override=/, -Xms512m, -Xmx512m, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config, -Des.distribution.flavor=default, -Des.distribution.type=tar]
[2018-08-16T15:40:36,612][WARN ][o.e.d.c.s.Settings ] [http.enabled] setting was deprecated in Elasticsearch and will be removed in a future release! See the breaking changes documentation for the next major version.
[2018-08-16T15:40:38,484][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [aggs-matrix-stats]
[2018-08-16T15:40:38,485][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [analysis-common]
[2018-08-16T15:40:38,485][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [ingest-common]
[2018-08-16T15:40:38,485][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [lang-expression]
[2018-08-16T15:40:38,485][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [lang-mustache]
[2018-08-16T15:40:38,485][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [lang-painless]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [mapper-extras]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [parent-join]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [percolator]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [rank-eval]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [reindex]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [repository-url]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [transport-netty4]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [tribe]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [x-pack-core]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [x-pack-deprecation]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [x-pack-graph]
[2018-08-16T15:40:38,487][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [x-pack-logstash]
[2018-08-16T15:40:38,487][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [x-pack-ml]
[2018-08-16T15:40:38,487][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [x-pack-monitoring]
[2018-08-16T15:40:38,487][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [x-pack-rollup]
[2018-08-16T15:40:38,487][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [x-pack-security]
[2018-08-16T15:40:38,487][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [x-pack-sql]
[2018-08-16T15:40:38,488][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [x-pack-upgrade]
[2018-08-16T15:40:38,488][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded module [x-pack-watcher]
[2018-08-16T15:40:38,488][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded plugin [ingest-geoip]
[2018-08-16T15:40:38,489][INFO ][o.e.p.PluginsService ] [elastic-data-0] loaded plugin [ingest-user-agent]
[2018-08-16T15:40:44,991][INFO ][o.e.x.s.a.s.FileRolesStore] [elastic-data-0] parsed [0] roles from file [/usr/share/elasticsearch/config/roles.yml]
[2018-08-16T15:40:45,793][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [controller/61] [Main.cc@109] controller (64 bit): Version 6.3.2 (Build 903094f295d249) Copyright (c) 2018 Elasticsearch BV
[2018-08-16T15:40:47,003][INFO ][o.e.d.DiscoveryModule ] [elastic-data-0] using discovery type [zen]
[2018-08-16T15:40:48,139][INFO ][o.e.n.Node ] [elastic-data-0] initialized
[2018-08-16T15:40:48,140][INFO ][o.e.n.Node ] [elastic-data-0] starting ...
[2018-08-16T15:40:48,337][INFO ][o.e.t.TransportService ] [elastic-data-0] publish_address {10.0.1.11:9300}, bound_addresses {[::]:9300}
[2018-08-16T15:40:48,452][INFO ][o.e.b.BootstrapChecks ] [elastic-data-0] bound or publishing to a non-loopback address, enforcing bootstrap checks
ERROR: [1] bootstrap checks failed
[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2018-08-16T15:40:48,477][INFO ][o.e.n.Node ] [elastic-data-0] stopping ...
[2018-08-16T15:40:48,503][INFO ][o.e.n.Node ] [elastic-data-0] stopped
[2018-08-16T15:40:48,504][INFO ][o.e.n.Node ] [elastic-data-0] closing ...
[2018-08-16T15:40:48,525][INFO ][o.e.n.Node ] [elastic-data-0] closed
[2018-08-16T15:40:48,529][INFO ][o.e.x.m.j.p.NativeController] Native controller process has stopped - no new native processes can be started
The Kubernetes version is 1.10.5-gke.4 (yes, it's on GKE).
Any ideas are appreciated.
Answer 1
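The issue is that init containers run only when the pod is created on a node. I think you restarted the Kubernetes node after that, which reset the setting, and that is why you got the error: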
max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
Normally, to fix this, you would have to run the following command on every node each time it restarts:
sudo sysctl -w vm.max_map_count=262144
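As a rough sketch of that per-node step, the snippet below only raises the value when it is actually below the minimum from the error message. The function names needs_bump and ensure_max_map_count are made up for illustration, and it assumes sudo is available on the node.

```shell
#!/bin/sh
# Minimum quoted in the Elasticsearch bootstrap check error.
REQUIRED=262144

# Succeeds (exit status 0) when the current value ($1) is below the required one ($2).
needs_bump() {
    [ "$1" -lt "$2" ]
}

# Hypothetical helper: read the live kernel value and raise it only if needed.
ensure_max_map_count() {
    current=$(cat /proc/sys/vm/max_map_count)
    if needs_bump "$current" "$REQUIRED"; then
        sudo sysctl -w "vm.max_map_count=$REQUIRED"
    else
        echo "vm.max_map_count is already $current"
    fi
}
```

Calling ensure_max_map_count on a node then either applies the change or reports that nothing needs to be done.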
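I suggest you use a DaemonSet to apply the setting across your cluster; it seems to solve the problem. It follows the approach of Google's startup-script container. The solution is provided below.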
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    k8s-app: sysctl-conf
  name: sysctl-conf
spec:
  template:
    metadata:
      labels:
        k8s-app: sysctl-conf
    spec:
      containers:
      - command:
        - sh
        - -c
        - sysctl -w vm.max_map_count=262144 && while true; do sleep 86400; done
        image: busybox:1.26.2
        name: sysctl-conf
        resources:
          limits:
            cpu: 10m
            memory: 50Mi
          requests:
            cpu: 10m
            memory: 50Mi
        securityContext:
          privileged: true
      terminationGracePeriodSeconds: 1
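Once the DaemonSet is created, it may be worth confirming that it has a ready pod on every node before restarting the Elasticsearch pods. The sketch below compares the desiredNumberScheduled and numberReady status fields; the function names are hypothetical, and it assumes kubectl is pointed at the right cluster and namespace.

```shell
#!/bin/sh
# Succeeds when the desired count ($1) is non-zero and equals the ready count ($2).
daemonset_ready() {
    desired=$1
    ready=$2
    [ "$desired" -gt 0 ] && [ "$desired" -eq "$ready" ]
}

# Hypothetical wrapper: ask the API server for both counts of the sysctl-conf DaemonSet.
check_sysctl_daemonset() {
    # Unquoted on purpose so the two numbers become $1 and $2.
    set -- $(kubectl get daemonset sysctl-conf \
        -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}')
    daemonset_ready "$1" "$2"
}
```

Running check_sysctl_daemonset && echo "sysctl-conf is ready on all nodes" gives a quick yes or no.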
To verify the change, pick a node, SSH into it, and run this command to print the current vm.max_map_count value:
sudo sysctl -a | grep vm.max_map_count
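If you want to script that check instead of reading the output by eye, something like the following could work. It assumes the usual "name = value" output format of sysctl; the helper names sysctl_value and meets_minimum are invented for this example.

```shell
#!/bin/sh
# Extract the value from a line such as "vm.max_map_count = 262144".
sysctl_value() {
    printf '%s\n' "$1" | awk '{print $NF}'
}

# Succeeds when the value ($1) is at least the minimum ($2, default 262144).
meets_minimum() {
    [ "$1" -ge "${2:-262144}" ]
}
```

On a node this could be used as meets_minimum "$(sysctl_value "$(sudo sysctl vm.max_map_count)")" && echo OK. Alternatively, sysctl -n vm.max_map_count prints just the number, so the parsing step can be skipped.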