Communication with containers is not always possible

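I run a number of services in a docker swarm on a single docker host. All services run in the same overlay network. Each service publishes a different port on which its web server is available. The docker-host runs CoreOS (1520.0.0, Alpha channel).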

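Sometimes a request to http://docker-host.local: times out. When I log in to the docker-host and make the request to localhost: it times out as well. A request to the same service made from a shell in a different container, however, succeeds without any problem.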

`docker service ls` shows the correct port mappings.

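Which service is unreachable appears to be random. Sometimes all services work, sometimes one service cannot be reached, and sometimes the problem resolves itself after a while.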

I have checked the docker networks; they do not collide with the host network.
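One way to repeat that subnet check mechanically is sketched below. It is only a sketch under assumptions: the function names are made up for illustration, the host CIDR is passed in by hand, every network is assumed to report IPv4 subnets in `a.b.c.d/len` form, and the `docker -H <host>` convention is borrowed from the test script further down.

```bash
#!/bin/bash
# Sketch: report docker network subnets that overlap a given host CIDR.
# All function names here are illustrative, not part of any docker tooling.

# ip_to_int A.B.C.D: print the address as a 32-bit integer
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo "$(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))"
}

# cidrs_overlap NET1/LEN1 NET2/LEN2: exit status 0 if the two ranges overlap.
# Two CIDR blocks overlap exactly when they agree on the shorter prefix.
cidrs_overlap() {
  local a=${1%/*} alen=${1#*/} b=${2%/*} blen=${2#*/}
  local len=$(( alen < blen ? alen : blen ))
  local mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$a") & mask) == ($(ip_to_int "$b") & mask) ))
}

# docker_subnets HOST: print one subnet per line for every docker network
docker_subnets() {
  docker -H "$1" network inspect \
    --format '{{range .IPAM.Config}}{{.Subnet}}{{"\n"}}{{end}}' \
    $(docker -H "$1" network ls -q)
}

# check_overlaps HOST HOST_CIDR: list docker subnets overlapping HOST_CIDR
check_overlaps() {
  local subnet
  while read -r subnet; do
    [[ -z ${subnet} ]] && continue
    if cidrs_overlap "${subnet}" "$2"; then
      echo "overlap: ${subnet} and $2"
    fi
  done < <(docker_subnets "$1")
}

# Only run against docker when both arguments are given.
if [[ $# -eq 2 ]]; then check_overlaps "$1" "$2"; fi
```

It would be called with the docker host and the host network in CIDR notation, for example `./check-subnets.sh docker-host.local 192.168.1.0/24`, where both the script name and the CIDR are placeholders.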

I can reproduce this by creating a set of nginx services that serve the default web page. File: docker-compose-test.yml

version: '3.1'
services:
  nginx1:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10081:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure

  nginx2:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10082:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure

  nginx3:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10083:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure

  nginx4:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10084:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure

  nginx5:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10085:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure

  nginx6:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10086:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure

  nginx7:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10087:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure

  nginx8:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10088:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure

  nginx9:
    image: nginx:1.11.8-alpine
    networks:
      - test
    ports:
      - "10089:80"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
networks:
  test:

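The following script deploys the stack, tests availability, and takes the stack down again, repeating until an error condition is reached. File: test-docker-swarm.sh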

#!/bin/bash

DOCKER_HOST=$1
fail=0

while [[ ${fail} -eq 0 ]] ; do
  docker -H ${DOCKER_HOST} stack deploy -c docker-compose-test.yml test
  sleep 15

  for i in $(seq 1 9) ; do
    request="http://${DOCKER_HOST}:1008${i}"
    echo "making request: ${request}"
    curl -s -o /dev/null --max-time 2 ${request}
    if [[ $? -ne 0 ]] ; then
        echo request failed: ${request}
        fail=1
    fi
  done

  if [[ ${fail} -eq 0 ]] ; then
    docker -H ${DOCKER_HOST} stack down test

    while [[ $(docker -H ${DOCKER_HOST} network ls --filter 'name=^test_' | wc -l) -ne 1 ]]; do
      echo "waiting for stack to go down"
      sleep 2
    done
  fi
done

Run it with `./test-docker-swarm.sh`, passing the docker host as the first argument.
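Because the script leaves the stack running once a request fails, the failing service can then be probed over both paths: the published port on the host and the overlay network from inside another service's container. The following is a minimal sketch of that comparison, not a definitive diagnostic. The helper names are hypothetical, and it assumes that the task containers carry the usual `test_nginxN` name prefix and that the BusyBox `wget` in the alpine image accepts `-T` as a timeout.

```bash
#!/bin/bash
# Sketch: compare the published-port path with the overlay path for nginxN.
# Helper names are illustrative only.

# published_url HOST N: URL of the port published for service nginxN
published_url() {
  printf 'http://%s:1008%s\n' "$1" "$2"
}

# other_service N: index of a different service to send the overlay request from
other_service() {
  if [[ $1 -eq 1 ]]; then echo 2; else echo 1; fi
}

# compare_paths HOST N: probe service nginxN over both paths
compare_paths() {
  local host=$1 n=$2 from cid
  from=$(other_service "$n")

  echo "published port: $(published_url "$host" "$n")"
  if curl -s -o /dev/null --max-time 2 "$(published_url "$host" "$n")"; then
    echo "  ok"
  else
    echo "  FAILED"
  fi

  # Assumption: task containers are named test_nginx<from>.<slot>.<task id>
  cid=$(docker -H "$host" ps -q --filter "name=test_nginx${from}" | head -n 1)
  if [[ -z ${cid} ]]; then
    echo "no running container found for test_nginx${from}"
    return 1
  fi

  echo "overlay network: http://test_nginx${n} from test_nginx${from}"
  if docker -H "$host" exec "$cid" wget -q -T 2 -O /dev/null "http://test_nginx${n}"; then
    echo "  ok"
  else
    echo "  FAILED"
  fi
}

# Only touch docker when both arguments are given.
if [[ $# -eq 2 ]]; then compare_paths "$1" "$2"; fi
```

For example, if the request to port 10083 failed, the sketch would be run with the docker host and `3` as arguments. A failure on the published port combined with success over the overlay network would match the symptom described above.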

I don't know what steps to take to debug and resolve this. Any pointers are appreciated.

docker version

Client:
 Version:      17.06.1-ce
 API version:  1.30
 Go version:   go1.8.2
 Git commit:   874a737
 Built:        Tue Aug 29 23:50:27 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.1-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.2
 Git commit:   874a737
 Built:        Tue Aug 29 23:50:09 2017
 OS/Arch:      linux/amd64
 Experimental: false

docker info

Containers: 9
 Running: 9
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 17.06.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: x06mlhlwqyo3dg4lmigy18z1q
 Is Manager: true
 ClusterID: qy022nd3bjn1157sxcc6qzr9n
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Root Rotation In Progress: false
 Node Address: 10.255.11.40
 Manager Addresses:
  10.255.11.40:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: v0.13.2 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.13.0-rc7-coreos
Operating System: Container Linux by CoreOS 1520.0.0 (Ladybug)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 5.776GiB
Name: fqfs-development
ID: RCNI:3ZUR:LTDA:ABIB:EYEW:HCIY:H2RC:XDNT:LC77:BMQH:FKXI:T6YZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Answer 1

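There is an open issue on GitHub that matches the symptoms you are seeing. I suggest following it and providing the developers with your own logs, so they can see whether the individual reports have anything in common.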
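As a rough sketch of how the state of the swarm could be captured for such a report, the script below saves the output of a few standard docker commands into a directory. The function names and the output layout are assumptions made for illustration; the service names `test_nginx1` to `test_nginx9` follow from the stack name `test` used in the question.

```bash
#!/bin/bash
# Sketch: collect swarm state into a directory that can be attached to a report.
# Function names and file layout are illustrative only.

# run_and_save OUTDIR NAME CMD...: run CMD, store stdout and stderr in OUTDIR/NAME.txt
run_and_save() {
  local outdir=$1 name=$2
  shift 2
  mkdir -p "${outdir}"
  "$@" > "${outdir}/${name}.txt" 2>&1
}

# collect HOST OUTDIR: gather version, info, service and ingress network state
collect() {
  local host=$1 outdir=$2 i
  run_and_save "${outdir}" version    docker -H "${host}" version
  run_and_save "${outdir}" info       docker -H "${host}" info
  run_and_save "${outdir}" service-ls docker -H "${host}" service ls
  run_and_save "${outdir}" ingress    docker -H "${host}" network inspect ingress
  for i in $(seq 1 9); do
    run_and_save "${outdir}" "ps-nginx${i}" \
      docker -H "${host}" service ps --no-trunc "test_nginx${i}"
  done
}

# Only touch docker when both arguments are given.
if [[ $# -eq 2 ]]; then collect "$1" "$2"; fi
```

It is best run while a service is unreachable, with the docker host and an output directory as arguments. The daemon log is not covered here; on a systemd host it can usually be read on the machine itself with `journalctl -u docker.service`, assuming that is the unit name in use.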
