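I run a few services in a docker swarm on a single docker host. All services run in the same overlay network. Each service publishes a different port on which its web server is available. The docker host runs CoreOS (1520.0.0, Alpha channel).
Sometimes requests to a service's published port at http://docker-host.local: time out. When I log in to the docker host and make the same request against localhost, it times out as well. However, a request to the service from a shell in a different container succeeds without any problems.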
docker service ls
shows the correct port mappings.
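Which service is unreachable seems to be random. Sometimes all services work, sometimes one service is unreachable, and sometimes the problem resolves itself after a while.
I have checked the docker networks; they do not conflict with the host network.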
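For anyone repeating that check, here is a minimal sketch of one way to do it, not a definitive tool. It assumes IPv4 only, and the function names `ip_to_int`, `cidr_overlap` and `report_conflicts` are made up for illustration. `report_conflicts` compares the subnets of all overlay networks (which includes the ingress network that carries published-port traffic) against the host's routing table.

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 (true) if two IPv4 CIDR blocks overlap, 1 otherwise.
cidr_overlap() {
  local ip1=${1%/*} len1=${1#*/} ip2=${2%/*} len2=${2#*/}
  # Two blocks overlap exactly when they agree on the shorter prefix.
  local len=$(( len1 < len2 ? len1 : len2 ))
  local mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip1") & mask) == ($(ip_to_int "$ip2") & mask) ))
}

# Print every overlay network subnet that overlaps a host route.
# Needs the docker CLI and iproute2; run it on the docker host.
report_conflicts() {
  local routes net name subnet route
  # Keep only routes written as CIDR blocks (this skips "default").
  routes=$(ip -4 route show | awk '$1 ~ /\// {print $1}')
  for net in $(docker network ls -q --filter driver=overlay); do
    name=$(docker network inspect --format '{{.Name}}' "$net")
    for subnet in $(docker network inspect \
        --format '{{range .IPAM.Config}}{{.Subnet}} {{end}}' "$net"); do
      [[ $subnet == *:* ]] && continue   # skip IPv6 subnets
      for route in $routes; do
        if cidr_overlap "$subnet" "$route"; then
          echo "overlap: $name $subnet <-> host route $route"
        fi
      done
    done
  done
  return 0
}
```

Calling `report_conflicts` on the docker host prints one line per overlapping pair and nothing if no overlay subnet overlaps a host route.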
I can reproduce this by creating a set of nginx services that serve the default web page. File: docker-compose-test.yml
version: '3.1'
services:
  nginx1:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10081:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
  nginx2:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10082:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
  nginx3:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10083:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
  nginx4:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10084:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
  nginx5:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10085:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
  nginx6:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10086:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
  nginx7:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10087:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
  nginx8:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10088:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
  nginx9:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10089:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
networks:
  test:
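This script deploys the stack, tests availability, and takes the stack down again, repeating until an error condition is reached. File: test-docker-swarm.sh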
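#!/bin/bash
# First argument: the docker host to deploy to and send requests to.
DOCKER_HOST=$1
fail=0
# Deploy, probe and tear down the stack until one request fails.
while [[ ${fail} -eq 0 ]] ; do
    docker -H "${DOCKER_HOST}" stack deploy -c docker-compose-test.yml test
    sleep 15
    for i in $(seq 1 9) ; do
        request="http://${DOCKER_HOST}:1008${i}"
        echo "making request: ${request}"
        curl -s -o /dev/null --max-time 2 "${request}"
        if [[ $? -ne 0 ]] ; then
            echo "request failed: ${request}"
            fail=1
        fi
    done
    # On failure the stack is left running so it can be inspected.
    if [[ ${fail} -eq 0 ]] ; then
        docker -H "${DOCKER_HOST}" stack down test
        # `network ls` always prints a header line, so a count of 1 means the stack's networks are gone.
        while [[ $(docker -H "${DOCKER_HOST}" network ls --filter 'name=^test_' | wc -l) -ne 1 ]]; do
            echo "waiting for stack to go down"
            sleep 2
        done
    fi
done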
Run it with `./test-docker-swarm.sh`, passing the docker host as the first argument.
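The probe loop in the script only records that a request failed, not how it failed. As a rough sketch (not part of the original script), the probe could map curl's exit status to a short description, so that timeouts can be told apart from refused connections. The function names `describe_curl_exit` and `probe_ports` are hypothetical. The exit codes 6, 7 and 28 are curl's documented codes for a resolve failure, a connect failure and a timeout.

```shell
#!/bin/bash
# Translate a curl exit status into a short, human-readable reason.
describe_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "failed to connect" ;;
    28) echo "timed out" ;;
    *)  echo "curl exit code $1" ;;
  esac
}

# probe_ports HOST PORT...
# Request each port once, print the outcome per port, and
# return 1 if at least one request did not succeed.
probe_ports() {
  local host=$1 failed=0 port rc
  shift
  for port in "$@"; do
    rc=0
    curl -s -o /dev/null --max-time 2 "http://${host}:${port}" || rc=$?
    echo "${port}: $(describe_curl_exit "$rc")"
    if [[ $rc -ne 0 ]]; then
      failed=1
    fi
  done
  return "$failed"
}
```

For the stack above this could be called as `probe_ports "${DOCKER_HOST}" $(seq 10081 10089)`, once from a remote machine and once on the docker host against localhost, to compare the two views.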
I don't know what steps to take to debug and solve this. Any pointers are appreciated.
docker version
Client:
 Version: 17.06.1-ce
 API version: 1.30
 Go version: go1.8.2
 Git commit: 874a737
 Built: Tue Aug 29 23:50:27 2017
 OS/Arch: linux/amd64
Server:
 Version: 17.06.1-ce
 API version: 1.30 (minimum version 1.12)
 Go version: go1.8.2
 Git commit: 874a737
 Built: Tue Aug 29 23:50:09 2017
 OS/Arch: linux/amd64
 Experimental: false
docker info
Containers: 9
 Running: 9
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 17.06.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: x06mlhlwqyo3dg4lmigy18z1q
 Is Manager: true
 ClusterID: qy022nd3bjn1157sxcc6qzr9n
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Root Rotation In Progress: false
 Node Address: 10.255.11.40
 Manager Addresses:
  10.255.11.40:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: v0.13.2 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.13.0-rc7-coreos
Operating System: Container Linux by CoreOS 1520.0.0 (Ladybug)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 5.776GiB
Name: fqfs-development
ID: RCNI:3ZUR:LTDA:ABIB:EYEW:HCIY:H2RC:XDNT:LC77:BMQH:FKXI:T6YZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
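Answer 1
There is an open issue on GitHub that matches the symptoms you are seeing. I suggest following up there and providing your own logs to the developers, so they can see whether the individual reports have anything in common.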
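If it helps when gathering material for that report, here is a rough sketch of a collection helper. It rests on several assumptions: the function name `collect_swarm_logs`, the output file names and the one-hour journal window are all made up for illustration, and it presumes the daemon runs as the systemd unit docker.service, as is usual on CoreOS. The developers may well ask for different or additional output.

```shell
#!/bin/bash
# collect_swarm_logs STACK OUTDIR
# Save daemon and swarm state for STACK into OUTDIR. Each command is
# allowed to fail so that one missing tool does not stop the rest.
collect_swarm_logs() {
  local stack=$1 out=$2
  mkdir -p "$out"
  docker version                      > "$out/docker-version.txt"  2>&1 || true
  docker info                         > "$out/docker-info.txt"     2>&1 || true
  docker stack services "$stack"      > "$out/stack-services.txt"  2>&1 || true
  docker stack ps --no-trunc "$stack" > "$out/stack-ps.txt"        2>&1 || true
  # The ingress network carries traffic for published ports.
  docker network inspect ingress      > "$out/network-ingress.txt" 2>&1 || true
  # Daemon log from the systemd journal (assumed unit name: docker.service).
  journalctl -u docker.service --since "1 hour ago" --no-pager \
                                      > "$out/dockerd-journal.txt" 2>&1 || true
  echo "wrote diagnostics to $out"
}
```

Run on the docker host right after a request has timed out, for example `collect_swarm_logs test ./swarm-diag` for the reproduction stack from the question.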