我遇到了一个与此类似的问题;https://forums.docker.com/t/cant-access-service-in-swarm/63876。尽管我的设置略有不同,但我还没有找到解决问题的方法。
最小且可重复的示例
在至少 3 个 Ubuntu 20.04 docker swarm 管理器之间构建一个 swarm 集群。
部署服务
docker service create --name test_web --replicas 3 --publish published=8080,target=80 nginxdemos/hello
检查容器和服务是否正确创建,并观察连接到该服务是否失败:
demi-ubu01:~/stacks$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d4a12a3c5448 nginxdemos/hello:latest "nginx -g 'daemon of…" About a minute ago Up About a minute 80/tcp test_web.2.yul33wdycarig3qoxnehgrjrz
demi-ubu01:~/stacks$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
0yqd7gvggwuh test_web replicated 3/3 nginxdemos/hello:latest *:8080->80/tcp
# External test:
demi-ubu01:~/stacks$ curl -I 10.100.4.5:8080
curl: (7) Failed to connect to 10.100.4.5 port 8080: Connection refused
# Inside container to published service port:
demi-ubu01:~/stacks$ docker exec -it d4a12a3c5448 wget http://test_web:8080
Connecting to test_web:8080 (10.0.4.2:8080)
wget: can't connect to remote host (10.0.4.2): Host is unreachable
# Inside container to apps exposed port:
demi-ubu01:~/stacks$ docker exec -it d4a12a3c5448 wget http://localhost:80
Connecting to localhost:80 (127.0.0.1:80)
index.html 100% |****************************| 7217 0:00:00 ETA
第一个 curl 命令的预期结果应该是 Status 200 Ok。
详细报告
我的设置总共有 4 个节点。它们是相同的 Ubuntu 20.04 KVM 虚拟机,都位于同一网络上。它们之间没有防火墙。我有 3 个管理器和 1 个工作器(我只是在故障排除期间将其作为一步添加的)。
:~/stacks$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
kcm5v64psntjxngnqkfdj1jzh * demi-ubu01 Ready Active Reachable 20.10.1
uo3rljg6ax5qkjm898pyym9t1 demi-ubu02 Ready Active Leader 20.10.1
pysnl8sohdp4fv67gui156z4k demi-ubu03 Ready Active Reachable 20.10.1
rp2otsqpnxkgbmxbpkv21yjs6 demi-ubu04 Ready Active 20.10.1
我可以正常运行容器并在本地主机上正常访问它。
demi-ubu01:~/stacks$ docker run -p 8080:80 -d nginxdemos/hello
de4d0a937710acb1d6d8ae3b7eb9175860b6614dfd9ce92bc972efe619ae095f
demi-ubu01:~/stacks$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
de4d0a937710 nginxdemos/hello "nginx -g 'daemon of…" 4 seconds ago Up 2 seconds 0.0.0.0:8080->80/tcp pedantic_wiles
demi-ubu01:~/stacks$ curl -I 10.100.4.5:8080
HTTP/1.1 200 OK
Server: nginx/1.13.8
Date: Sat, 19 Dec 2020 17:59:23 GMT
Content-Type: text/html
Connection: keep-alive
Expires: Sat, 19 Dec 2020 17:59:22 GMT
Cache-Control: no-cache
但是,使用以下撰写文件将同一应用程序部署为服务:
demi-ubu01:~/stacks$ cat test.yml
version: "3.6"
services:
web:
image: nginxdemos/hello:latest
deploy:
replicas: 3
resources:
limits:
cpus: "0.1"
memory: 50M
restart_policy:
condition: on-failure
ports:
- target: 80
published: 8080
protocol: tcp
mode: ingress
networks:
- webnet
networks:
webnet:
driver: overlay
根本无法从任何主机访问它:
demi-ubu01:~/stacks$ docker stack deploy -c test.yml test
Creating network test_webnet
Creating service test_web
demi-ubu01:~/stacks$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
05030ef897a1 nginxdemos/hello:latest "nginx -g 'daemon of…" 10 seconds ago Up 7 seconds 80/tcp test_web.1.kobrpkp68f2qbs4jhd6o8aebg
# Trying on all of the hosts in the cluster. No firewalls here.
demi-ubu01:~/stacks$ curl -I 10.100.4.5:8080
curl: (7) Failed to connect to 10.100.4.5 port 8080: Connection refused
demi-ubu01:~/stacks$ curl -I 10.100.4.9:8080
curl: (7) Failed to connect to 10.100.4.9 port 8080: Connection refused
demi-ubu01:~/stacks$ curl -I 10.100.4.10:8080
curl: (7) Failed to connect to 10.100.4.10 port 8080: Connection refused
demi-ubu01:~/stacks$ curl -I 10.100.4.11:8080
curl: (7) Failed to connect to 10.100.4.11 port 8080: Connection refused
demi-ubu01:~/stacks$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
elvfm7o4v4zo test_web replicated 3/3 nginxdemos/hello:latest *:8080->80/tcp
我也没有看到这些主机上有任何端口绑定,所以看起来好像没有任何端口被发布。
INeed2Poo@demi-ubu01:~/stacks$ docker service inspect test_web
[
## https://pastebin.com/WqqyDnVS ##
]
demi-ubu01:~/stacks$ netstat -na | grep LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:49152 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:24007 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN
demi-ubu01:~/stacks$ docker network ls
NETWORK ID NAME DRIVER SCOPE
6e5f7e7cebc3 bridge bridge local
7a1155f87a62 docker_gwbridge bridge local
ab32da8ac1ec host host local
46id8wzw4ayf ingress overlay swarm
a24a40ef78f4 none null local
d9l7msysdx8m test_webnet overlay swarm
INeed2Poo@demi-ubu01:~/stacks$ docker network inspect 46id8wzw4ayf
[
https://pastebin.com/JPA0ZBjE
]
当我执行到该服务的容器中时,我也无法访问该服务。执行到容器中时,我能够访问本地应用程序端口,但是我无法通过名称访问该服务。容器可以解析服务名称。
## Testing the app's service from the local container fails:
demi-ubu01:~/stacks$ docker exec -it 05030ef897a1 wget http://test_web:8080
Connecting to test_web:8080 (10.0.4.2:8080)
wget: can't connect to remote host (10.0.4.2): Host is unreachable
## Testing the app's local port from the local container is sucessful:
demi-ubu01:~/stacks$ docker exec -it 05030ef897a1 wget http://localhost:80
Connecting to localhost:80 (127.0.0.1:80)
index.html 100% |****************************| 7217 0:00:00 ETA
demi-ubu01:~/stacks$ docker --version
Docker version 20.10.1, build 831ebea
我还将 Swarm 集群的默认地址池从原来的 10.0.0.0/8 网络更改为:
demi-ubu01:~$ docker info --format '{{json .Swarm.Cluster.DefaultAddrPool}}'
["10.135.0.0/16"]
我已经确保没有使用可能导致此问题的任何重叠网络,甚至完全重新部署了集群。我几乎用尽了所有故障排除方法。有什么想法吗?
编辑:更新:我使用 Ubuntu 18.04 作为我的基础映像进行了重新部署,并且在其上的完全相同的设置(使用 ansible 部署)似乎运行良好......所以这是 Ubuntu 20.04 上当前版本的 Docker 的问题。