All published services in a Docker swarm are unreachable, although the deployed containers run normally

I've run into a problem similar to this one: https://forums.docker.com/t/cant-access-service-in-swarm/63876. My setup is slightly different, though, and I haven't found a solution yet.

Minimal, reproducible example

  1. Build a swarm of at least 3 Ubuntu 20.04 Docker swarm managers.

  2. Deploy a service: docker service create --name test_web --replicas 3 --publish published=8080,target=80 nginxdemos/hello

  3. Check that the containers and the service were created correctly, and observe that connecting to the service fails:

demi-ubu01:~/stacks$ docker ps

CONTAINER ID   IMAGE                     COMMAND                  CREATED              STATUS              PORTS     NAMES
d4a12a3c5448   nginxdemos/hello:latest   "nginx -g 'daemon of…"   About a minute ago   Up About a minute   80/tcp    test_web.2.yul33wdycarig3qoxnehgrjrz
demi-ubu01:~/stacks$ docker service ls

ID             NAME      MODE         REPLICAS   IMAGE                     PORTS
0yqd7gvggwuh   test_web      replicated   3/3        nginxdemos/hello:latest   *:8080->80/tcp
# External test:
demi-ubu01:~/stacks$ curl -I 10.100.4.5:8080     
curl: (7) Failed to connect to 10.100.4.5 port 8080: Connection refused

# Inside container to published service port:
demi-ubu01:~/stacks$ docker exec -it d4a12a3c5448 wget http://test_web:8080
Connecting to test_web:8080 (10.0.4.2:8080)
wget: can't connect to remote host (10.0.4.2): Host is unreachable

# Inside container to apps exposed port:
demi-ubu01:~/stacks$ docker exec -it d4a12a3c5448 wget http://localhost:80
Connecting to localhost:80 (127.0.0.1:80)
index.html    100% |****************************|  7217   0:00:00 ETA

The first curl command is expected to return HTTP 200 OK.
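curl -I sends an HTTP HEAD request and prints the status line. The same check can be scripted so it runs against every node. Below is a minimal sketch that uses only the Python standard library. head_status is a hypothetical helper name, not part of any Docker tooling.

```python
import http.client


def head_status(host: str, port: int, timeout: float = 3.0):
    """Send an HTTP HEAD request for / and return the response status code.

    Returns None if the connection cannot be established at all, for example
    the "Connection refused" that curl reports in the transcript above.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        return conn.getresponse().status
    except OSError:
        # Refused connections, timeouts and "no route to host" are all OSError subclasses.
        return None
    finally:
        conn.close()
```

For example, head_status("10.100.4.5", 8080) should return 200 once the routing mesh works. While the connection is being refused as shown above, it returns None.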

Detailed report

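My setup has 4 nodes in total. They are identical Ubuntu 20.04 KVM virtual machines, all on the same network, with no firewall between them. There are 3 managers and 1 worker (I added the worker only as a troubleshooting step).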

demi-ubu01:~/stacks$ docker node ls
ID                            HOSTNAME     STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
kcm5v64psntjxngnqkfdj1jzh *   demi-ubu01   Ready     Active         Reachable        20.10.1
uo3rljg6ax5qkjm898pyym9t1     demi-ubu02   Ready     Active         Leader           20.10.1
pysnl8sohdp4fv67gui156z4k     demi-ubu03   Ready     Active         Reachable        20.10.1
rp2otsqpnxkgbmxbpkv21yjs6     demi-ubu04   Ready     Active                          20.10.1

Running the container on its own works, and I can reach it normally from the local host:

demi-ubu01:~/stacks$ docker run -p 8080:80 -d nginxdemos/hello
de4d0a937710acb1d6d8ae3b7eb9175860b6614dfd9ce92bc972efe619ae095f

demi-ubu01:~/stacks$ docker ps
CONTAINER ID   IMAGE              COMMAND                  CREATED         STATUS         PORTS                  NAMES
de4d0a937710   nginxdemos/hello   "nginx -g 'daemon of…"   4 seconds ago   Up 2 seconds   0.0.0.0:8080->80/tcp   pedantic_wiles

demi-ubu01:~/stacks$ curl -I 10.100.4.5:8080
HTTP/1.1 200 OK
Server: nginx/1.13.8
Date: Sat, 19 Dec 2020 17:59:23 GMT
Content-Type: text/html
Connection: keep-alive
Expires: Sat, 19 Dec 2020 17:59:22 GMT
Cache-Control: no-cache

However, when I deploy the same application as a service using the following compose file:

demi-ubu01:~/stacks$ cat test.yml 
version: "3.6"

services:
  web:
    image: nginxdemos/hello:latest
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: "0.1"
          memory: 50M
      restart_policy:
        condition: on-failure
    ports:
      - target: 80
        published: 8080
        protocol: tcp
        mode: ingress
    networks:
      - webnet

networks:
  webnet:
    driver: overlay

it cannot be reached from any host at all:

demi-ubu01:~/stacks$ docker stack deploy -c test.yml test
Creating network test_webnet
Creating service test_web

demi-ubu01:~/stacks$ docker ps
CONTAINER ID   IMAGE                     COMMAND                  CREATED          STATUS         PORTS     NAMES
05030ef897a1   nginxdemos/hello:latest   "nginx -g 'daemon of…"   10 seconds ago   Up 7 seconds   80/tcp    test_web.1.kobrpkp68f2qbs4jhd6o8aebg

# Trying on all of the hosts in the cluster. No firewalls here.

demi-ubu01:~/stacks$ curl -I 10.100.4.5:8080
curl: (7) Failed to connect to 10.100.4.5 port 8080: Connection refused
demi-ubu01:~/stacks$ curl -I 10.100.4.9:8080
curl: (7) Failed to connect to 10.100.4.9 port 8080: Connection refused
demi-ubu01:~/stacks$ curl -I 10.100.4.10:8080
curl: (7) Failed to connect to 10.100.4.10 port 8080: Connection refused
demi-ubu01:~/stacks$ curl -I 10.100.4.11:8080
curl: (7) Failed to connect to 10.100.4.11 port 8080: Connection refused

demi-ubu01:~/stacks$ docker service ls
ID             NAME       MODE         REPLICAS   IMAGE                     PORTS
elvfm7o4v4zo   test_web   replicated   3/3        nginxdemos/hello:latest   *:8080->80/tcp
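The curl loop above only separates success from failure. It can also help to know *how* each node fails. "Refused" means nothing accepted the connection, which is the symptom here. A timeout means packets are being dropped, typically by a firewall. Below is a minimal sketch of such a probe. The node list is copied from the transcripts, and probe and probe_all are hypothetical helper names.

```python
import socket

# Node addresses copied from the transcripts above; edit to match your cluster.
NODES = ["10.100.4.5", "10.100.4.9", "10.100.4.10", "10.100.4.11"]


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify one TCP connect attempt as 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: nothing is accepting on that port.
        return "refused"
    except socket.timeout:
        # No answer at all: packets are dropped somewhere (e.g. by a firewall).
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. "No route to host".
        return f"error: {exc}"


def probe_all(hosts, port: int) -> dict:
    """Probe the same port on every host and return {host: result}."""
    return {host: probe(host, port) for host in hosts}
```

Given the curl results above, running print(probe_all(NODES, 8080)) from any host on the 10.100.4.x network would presumably report "refused" for every node.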

I also don't see any port bindings on these hosts, so it looks as though no port is being published at all.


demi-ubu01:~/stacks$ docker service inspect test_web
[
    ## https://pastebin.com/WqqyDnVS ##
]

demi-ubu01:~/stacks$ netstat -na | grep LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN     
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN
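The "no port bindings" observation can be checked mechanically with a small parser over the netstat -na text. It lists every TCP port in LISTEN state. Below is a minimal sketch that assumes the standard Linux netstat column layout. listening_ports is a hypothetical helper name. The sample is the output shown above, which indeed has no listener on 8080.

```python
def listening_ports(netstat_output: str) -> set[int]:
    """Return the local TCP ports in LISTEN state found in `netstat -na` output."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "LISTEN":
            # rsplit on the last ":" also handles IPv6 forms such as ":::8080".
            ports.add(int(fields[3].rsplit(":", 1)[1]))
    return ports


# The netstat output from demi-ubu01 shown above.
SAMPLE = """\
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN
"""

print(8080 in listening_ports(SAMPLE))  # False: nothing listens on the published port
```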

demi-ubu01:~/stacks$ docker network ls
NETWORK ID     NAME              DRIVER    SCOPE
6e5f7e7cebc3   bridge            bridge    local
7a1155f87a62   docker_gwbridge   bridge    local
ab32da8ac1ec   host              host      local
46id8wzw4ayf   ingress           overlay   swarm
a24a40ef78f4   none              null      local
d9l7msysdx8m   test_webnet       overlay   swarm
demi-ubu01:~/stacks$ docker network inspect 46id8wzw4ayf
[
    https://pastebin.com/JPA0ZBjE
]

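I also cannot reach the service when I exec into one of its containers. From inside the container I can reach the application's local port, but not the service by name, even though the container does resolve the service name.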

## Testing the app's service from the local container fails:

demi-ubu01:~/stacks$ docker exec -it 05030ef897a1 wget http://test_web:8080
Connecting to test_web:8080 (10.0.4.2:8080)
wget: can't connect to remote host (10.0.4.2): Host is unreachable


## Testing the app's local port from the local container is successful:

demi-ubu01:~/stacks$ docker exec -it 05030ef897a1 wget http://localhost:80
Connecting to localhost:80 (127.0.0.1:80)
index.html    100% |****************************|  7217   0:00:00 ETA
demi-ubu01:~/stacks$ docker --version
Docker version 20.10.1, build 831ebea
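One detail is worth checking in the in-container tests. As far as I know, the name test_web resolves to the service VIP, and that VIP load-balances the service's target port (80 here). The published port 8080 is only opened on the nodes by the ingress routing mesh. So a wget to test_web:8080 from inside the overlay network would not be expected to succeed even on a healthy swarm.

Docker's embedded DNS also answers tasks.<service> with the individual task IPs, which bypasses the VIP. The sketch below tries both names on port 80. It assumes a container attached to test_webnet with Python available (the nginxdemos/hello image may not include it). resolve_ipv4 and check_service are hypothetical helper names.

```python
import socket


def resolve_ipv4(name: str) -> list[str]:
    """Return the de-duplicated IPv4 addresses that `name` resolves to, sorted as strings."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def check_service(service: str, port: int = 80, timeout: float = 2.0) -> dict:
    """Try a TCP connect to the service VIP and to each task IP (tasks.<service>)."""
    results = {}
    for name in (service, f"tasks.{service}"):
        try:
            addrs = resolve_ipv4(name)
        except socket.gaierror as exc:
            results[name] = f"unresolved: {exc}"
            continue
        for addr in addrs:
            try:
                with socket.create_connection((addr, port), timeout=timeout):
                    results[f"{name} -> {addr}"] = "open"
            except OSError as exc:
                results[f"{name} -> {addr}"] = f"failed: {exc}"
    return results
```

Inside such a container, print(check_service("test_web")) would show whether the VIP, the task addresses, or both are unreachable on the target port.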

I also changed the swarm cluster's default address pool from the original 10.0.0.0/8 to:

demi-ubu01:~$ docker info --format '{{json .Swarm.Cluster.DefaultAddrPool}}'
["10.135.0.0/16"]

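I have made sure that none of the networks in use overlap in a way that could cause this, and I have even redeployed the cluster from scratch. I've pretty much exhausted my troubleshooting options. Any ideas?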
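Python's standard ipaddress module makes it easy to verify whether the address pool overlaps the host network. Below is a minimal sketch. The 10.100.4.0/24 host subnet is an assumption, since only the node addresses appear above. overlapping_pairs and in_range are hypothetical helper names. Whatever the netmask, the node addresses themselves fall inside the original 10.0.0.0/8 pool but not inside 10.135.0.0/16.

```python
import ipaddress
from itertools import combinations


def overlapping_pairs(cidrs):
    """Return every pair of CIDR ranges that share at least one address."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]


def in_range(addr: str, cidr: str) -> bool:
    """True if a single address (e.g. a VIP seen in a transcript) lies inside `cidr`."""
    return ipaddress.ip_address(addr) in ipaddress.ip_network(cidr)


# ASSUMPTION: the nodes (10.100.4.5, .9, .10, .11) sit on a /24; the question
# only shows host addresses, not the netmask.
HOST_LAN = "10.100.4.0/24"

print(overlapping_pairs([HOST_LAN, "10.135.0.0/16"]))  # []
print(overlapping_pairs([HOST_LAN, "10.0.0.0/8"]))     # [('10.100.4.0/24', '10.0.0.0/8')]
print(in_range("10.0.4.2", "10.135.0.0/16"))           # False
```

The in_range helper can be pointed at any address from the transcripts, such as the 10.0.4.2 that test_web resolved to. It shows whether that address lies inside the configured pool.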

EDIT / UPDATE: I redeployed with Ubuntu 18.04 as the base image. The exact same setup (deployed with Ansible) seems to work fine on it, so this looks like a problem with the current Docker release on Ubuntu 20.04.
