Ubuntu/Debian server occasionally loses network when wired and running Docker

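I have spent months on this problem and am now at my wits' end. I have a home media server that runs its services in Docker containers, all defined in a single docker-compose file. The box itself is assigned a static IP by the network (an eero, in this case). I ran docker-compose up -d and left it to do its thing.

Anywhere from a day to a week later (it is inconsistent), the machine loses network connectivity. The current network setup is modem --> eero --> network switch --> server. The only way I can reconnect to the server is to reboot it; only then does the network come back. I originally had this problem on Debian (it happened on both 9 and 10), but I changed my OS because a friend of mine runs Ubuntu without issues. I switched to Ubuntu Server (20) and ran into the same problem. I did briefly look at https://github.com/moby/moby/issues/36153 as a possible root cause, but adding the suggested file did not seem to do anything.

My next thought was that this might be a hardware issue, so I switched from the onboard Ethernet to a USB-C Ethernet adapter. That seemed to work for three days, but then I hit the same problem.

At this point, I don't know how to narrow the problem down. I have gone through syslog, but nothing stands out to me. I checked the container logs, and all the containers look fine. On Debian I was using Network Manager, and on Ubuntu I am using systemd-networkd. Both have had this problem.

My Ubuntu version is Ubuntu 20.04 LTS (GNU/Linux 5.4.0-37-generic x86_64)

Below is my hardware info, in case it helps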

H/W path              Device           Class          Description
=================================================================
                                       system         System Product Name (SKU)
/0                                     bus            PRIME X370-PRO
/0/0                                   memory         64KiB BIOS
/0/2c                                  memory         16GiB System Memory
/0/2c/0                                memory         8GiB DIMM DDR4 Synchronous Unbuffered (Unregistered) 2133 MHz (0.5 ns)
/0/2c/1                                memory         [empty]
/0/2c/2                                memory         8GiB DIMM DDR4 Synchronous Unbuffered (Unregistered) 2133 MHz (0.5 ns)
/0/2c/3                                memory         [empty]
/0/2e                                  memory         576KiB L1 cache
/0/2f                                  memory         3MiB L2 cache
/0/30                                  memory         16MiB L3 cache
/0/31                                  processor      AMD Ryzen 5 1600 Six-Core Processor
/0/100                                 bridge         Family 17h (Models 00h-0fh) Root Complex
/0/100/0.2                             generic        Family 17h (Models 00h-0fh) I/O Memory Management Unit
/0/100/1.3                             bridge         Family 17h (Models 00h-0fh) PCIe GPP Bridge
/0/100/1.3/0                           bus            X370 Series Chipset USB 3.1 xHCI Controller
/0/100/1.3/0/0        usb1             bus            xHCI Host Controller
/0/100/1.3/0/0/7                       generic        Belkin USB-C LAN
/0/100/1.3/0/1        usb2             bus            xHCI Host Controller
/0/100/1.3/0.1        scsi0            storage        X370 Series Chipset SATA Controller
/0/100/1.3/0.1/0      /dev/sda         disk           120GB SanDisk SDSSDA12
/0/100/1.3/0.1/0/1    /dev/sda1        volume         511MiB Windows FAT volume
/0/100/1.3/0.1/0/2    /dev/sda2        volume         111GiB EXT4 volume
/0/100/1.3/0.1/1      /dev/sdb         disk           3TB Hitachi HUS72403
/0/100/1.3/0.1/2      /dev/sdc         disk           3TB Hitachi HUS72403
/0/100/1.3/0.1/3      /dev/sdd         disk           3TB Hitachi HUS72403
/0/100/1.3/0.1/4      /dev/sde         disk           3TB Hitachi HUS72403
/0/100/1.3/0.1/5      /dev/sdf         disk           3TB Hitachi HUS72403
/0/100/1.3/0.2                         bridge         X370 Series Chipset PCIe Upstream Port
/0/100/1.3/0.2/0                       bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/2                       bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/3                       bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/4                       bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/4/0                     bus            ASM1142 USB 3.1 Host Controller
/0/100/1.3/0.2/4/0/0  usb3             bus            xHCI Host Controller
/0/100/1.3/0.2/4/0/1  usb4             bus            xHCI Host Controller
/0/100/1.3/0.2/6                       bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/6/0    enp7s0           network        I211 Gigabit Network Connection
/0/100/1.3/0.2/7                       bridge         300 Series Chipset PCIe Port
/0/100/3.2                             bridge         Family 17h (Models 00h-0fh) PCIe GPP Bridge
/0/100/3.2/0                           display        GP107 [GeForce GTX 1050]
/0/100/3.2/0.1                         multimedia     GP107GL High Definition Audio Controller
/0/100/7.1                             bridge         Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B
/0/100/7.1/0                           generic        Zeppelin/Raven/Raven2 PCIe Dummy Function
/0/100/7.1/0.2                         generic        Family 17h (Models 00h-0fh) Platform Security Processor
/0/100/7.1/0.3                         bus            Family 17h (Models 00h-0fh) USB 3.0 Host Controller
/0/100/7.1/0.3/0      usb5             bus            xHCI Host Controller
/0/100/7.1/0.3/1      usb6             bus            xHCI Host Controller
/0/100/8.1                             bridge         Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B
/0/100/8.1/0                           generic        Zeppelin/Renoir PCIe Dummy Function
/0/100/8.1/0.2                         storage        FCH SATA Controller [AHCI mode]
/0/100/8.1/0.3                         multimedia     Family 17h (Models 00h-0fh) HD Audio Controller
/0/100/14                              bus            FCH SMBus Controller
/0/100/14.3                            bridge         FCH LPC Bridge
/0/101                                 bridge         Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
/0/102                                 bridge         Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
/0/103                                 bridge         Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
/0/104                                 bridge         Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
/0/105                                 bridge         Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
/0/106                                 bridge         Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
/0/107                                 bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0
/0/108                                 bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1
/0/109                                 bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2
/0/10a                                 bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3
/0/10b                                 bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4
/0/10c                                 bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5
/0/10d                                 bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6
/0/10e                                 bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7
/0/1                                   system         PnP device PNP0c01
/0/2                                   system         PnP device PNP0b00
/0/3                                   system         PnP device PNP0c02
/0/4                                   communication  PnP device PNP0501
/0/5                                   system         PnP device PNP0c02
/1                    br-10d6cc4b0f64  network        Ethernet interface
/2                    veth80c7cea      network        Ethernet interface
/3                    enx302303052de3  network        Ethernet interface
/4                    vethf4fd33e      network        Ethernet interface
/5                    vethab1d028      network        Ethernet interface
/6                    vethb9ac1e0      network        Ethernet interface
/7                    veth00d454b      network        Ethernet interface
/8                    docker0          network        Ethernet interface

Here is my docker-compose file as well. My current docker version is Docker version 19.03.11, build dd360c7, and my docker-compose version is docker-compose version 1.26.0, build d4451659

version: "3.7"

services:
  plex:
    image: plexinc/pms-docker
    container_name: plex
    volumes:
      - /mnt/plex/config:/config
      - /mnt/plex/Movies:/data/movies
      - /mnt/plex/Shows:/data/tvshows
      - /mnt/plex/transcode:/data/transcode
    ports:
      - 32400:32400/tcp
      - 3005:3005/tcp
      - 8324:8324/tcp
      - 32469:32469/tcp
      - 1900:1900/udp
      - 32410:32410/udp
      - 32412:32412/udp
      - 32413:32413/udp
      - 32414:32414/udp
    restart: unless-stopped
    environment:
      - PUID=1000
      - PGID=1000
      - VERSION=latest
      - TZ=America/Los_Angeles
  homebridge:
    image: oznu/homebridge:latest
    container_name: homebridge
    restart: unless-stopped
    network_mode: host
    environment:
      - TZ=America/Los_Angeles
      - PGID=1000
      - PUID=1000
      - HOMEBRIDGE_CONFIG_UI=1
      - HOMEBRIDGE_CONFIG_UI_PORT=8008
    volumes:
      - /mnt/homebridge:/homebridge
  nzbget:
    image: linuxserver/nzbget:latest
    container_name: nzbget
    volumes:
      - /mnt/nzbget/config:/config
      - /mnt/nzbget/downloads:/downloads
    restart: unless-stopped
    environment:
      - TZ=America/Los_Angeles
      - PUID=1000
      - PGID=1000
    ports:
      - 6789:6789
  sonarr:
    image: linuxserver/sonarr:latest
    container_name: sonarr
    restart: unless-stopped
    depends_on:
      - nzbget
    volumes:
      - /mnt/sonarr/config:/config
      - /mnt/nzbget/downloads:/downloads
      - /mnt/plex/Shows:/tv
    environment:
      - TZ=America/Los_Angeles
      - PUID=1000
      - PGID=1000
    ports:
      - 8989:8989
  radarr:
    image: linuxserver/radarr:latest
    container_name: radarr
    restart: unless-stopped
    depends_on:
      - nzbget
    volumes:
      - /mnt/radarr/config:/config
      - /mnt/nzbget/downloads:/downloads
      - /mnt/plex/Movies:/movies
    environment:
      - TZ=America/Los_Angeles
      - PUID=1000
      - PGID=1000
    ports:
      - 7878:7878
  tautulli:
    image: linuxserver/tautulli:latest
    container_name: tautulli
    depends_on:
      - plex
    restart: unless-stopped
    environment:
      - TZ=America/Los_Angeles
      - PUID=1000
      - PGID=1000
    volumes:
      - /mnt/tautulli/config:/config
      - /mnt/tautulli/logs:/logs:ro
    ports:
      - 8181:8181

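If I have missed anything, please let me know; I am happy to provide more information.

Edit:

I also tried updating the Realtek driver to the latest version last night to see whether that was the cause, because I was seeing this in journalctl: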

Jun 14 01:17:25 phoenix kernel: xhci_hcd 0000:01:00.0: xHCI host not responding to stop endpoint command.
Jun 14 01:17:25 phoenix kernel: xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
Jun 14 01:17:25 phoenix kernel: r8152 1-7:1.0 enx302303052de3: Tx status -108
Jun 14 01:17:25 phoenix kernel: r8152 1-7:1.0 enx302303052de3: Tx status -108
Jun 14 01:17:25 phoenix kernel: r8152 1-7:1.0 enx302303052de3: Tx status -108
Jun 14 01:17:25 phoenix kernel: r8152 1-7:1.0 enx302303052de3: Tx status -108
Jun 14 01:17:25 phoenix kernel: xhci_hcd 0000:01:00.0: HC died; cleaning up
Jun 14 01:17:25 phoenix kernel: r8152 1-7:1.0 enx302303052de3: Tx timeout
Jun 14 01:17:25 phoenix kernel: usb 1-7: USB disconnect, device number 2
Jun 14 01:17:25 phoenix kernel: r8152 1-7:1.0 enx302303052de3: Get ether addr fail
Jun 14 01:17:25 phoenix systemd-networkd[933]: enx302303052de3: Link DOWN

I did so following https://www.pcsuggest.com/install-rtl8153-driver-linux/. However, the connection seems to have dropped again this morning, so I can't be sure whether that helped.
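
The journalctl excerpt above has a recognizable shape: the xHCI host controller stops responding, the kernel declares it dead, and the USB NIC behind it disconnects and its link goes down. As a minimal sketch, assuming the kernel log has been saved to text first, the following Python tallies those message types. The names `PATTERNS`, `classify_lines`, and `controller_died` are illustrative, and the patterns are built only from the messages quoted in this post, so they are not an exhaustive list of kernel errors.

```python
import re
from collections import Counter

# Patterns derived from the log lines quoted in this post (illustrative only).
PATTERNS = {
    "xhci_dead": re.compile(
        r"xhci_hcd [0-9a-f:.]+: "
        r"(?:xHCI host controller not responding, assume dead|HC died)"
    ),
    "usb_disconnect": re.compile(r"usb [\d.-]+: USB disconnect"),
    "nic_tx_error": re.compile(r"r8152 \S+ \S+: Tx (?:status -\d+|timeout)"),
    "link_down": re.compile(r"systemd-networkd\[\d+\]: \S+: Link DOWN"),
}


def classify_lines(lines):
    """Count how many lines match each known failure pattern."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


def controller_died(counts):
    """True when both a dead host controller and a link-down were seen."""
    return counts["xhci_dead"] > 0 and counts["link_down"] > 0
```

If the journal is persistent, the previous boot's kernel messages can usually be dumped with `journalctl -k -b -1` and fed to `classify_lines` line by line. A dead host controller together with a link-down would suggest the USB adapter dropped because of the controller rather than because of Docker.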

Edit 2:

It looks like docker may be failing or restarting because of snap?

Jun 24 05:01:47 phoenix docker.dockerd[998]: failed to start containerd: timeout waiting for containerd to start
Jun 24 05:01:47 phoenix systemd[1]: snap.docker.dockerd.service: Main process exited, code=exited, status=1/FAILURE
Jun 24 05:01:47 phoenix systemd[1]: snap.docker.dockerd.service: Failed with result 'exit-code'.
Jun 24 05:01:47 phoenix systemd[1]: snap.docker.dockerd.service: Scheduled restart job, restart counter is at 1.
Jun 24 05:01:47 phoenix systemd[1]: Stopped Service for snap application docker.dockerd.
Jun 24 05:01:47 phoenix systemd[1]: Started Service for snap application docker.dockerd.

After that I can clearly see an IP reassignment being triggered, which then takes my box offline.
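
To check whether these dockerd restarts line up with the outages, it helps to pull every restart event out of the journal with its timestamp. The following is a minimal Python sketch that assumes the syslog-style line format shown above; `RESTART_RE` and `find_restarts` are hypothetical names, and only the "Scheduled restart job" message quoted in this post is matched.

```python
import re

# Matches lines such as:
# "Jun 24 05:01:47 phoenix systemd[1]: snap.docker.dockerd.service:
#  Scheduled restart job, restart counter is at 1."
RESTART_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ systemd\[1\]: "
    r"(?P<unit>\S+): Scheduled restart job, "
    r"restart counter is at (?P<count>\d+)\."
)


def find_restarts(lines):
    """Return (timestamp, unit, restart_counter) for each restart line."""
    events = []
    for line in lines:
        match = RESTART_RE.match(line)
        if match:
            events.append(
                (match["stamp"], match["unit"], int(match["count"]))
            )
    return events
```

Comparing the returned timestamps with the times the box actually went offline would show whether the restarts precede every outage or only some of them.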

Edit 3:

Here is a snippet of the iplog

[2020-07-05T00:24:28.507613] Deleted dev vetha537571 lladdr 02:42:ac:13:00:02 STALE
[2020-07-05T00:24:29.019491] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed PROBE
[2020-07-05T00:24:29.019674] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed REACHABLE
[2020-07-05T00:24:32.603688] 172.19.0.3 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:03 STALE
[2020-07-05T00:24:59.227481] 172.19.0.6 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:06 STALE
[2020-07-05T00:25:01.275258] 172.19.0.7 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:07 STALE
[2020-07-05T00:25:30.715499] 172.19.0.3 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:03 PROBE
[2020-07-05T00:25:30.715641] 172.19.0.3 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:03 REACHABLE
[2020-07-05T00:25:34.299181] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed STALE
[2020-07-05T00:25:38.139499] 192.168.7.50 dev enp7s0 lladdr 24:f5:a2:94:74:e9 STALE
[2020-07-05T00:25:38.139586] 192.168.7.55 dev enp7s0 lladdr 30:23:03:01:33:c5 STALE
[2020-07-05T00:25:39.931537] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed PROBE
[2020-07-05T00:25:39.931823] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed REACHABLE
[2020-07-05T00:25:47.099314] 192.168.7.50 dev enp7s0 lladdr 24:f5:a2:94:74:e9 PROBE
[2020-07-05T00:25:47.099401] 192.168.7.55 dev enp7s0 lladdr 30:23:03:01:33:c5 PROBE
[2020-07-05T00:25:47.101034] 192.168.7.50 dev enp7s0 lladdr 24:f5:a2:94:74:e9 REACHABLE
[2020-07-05T00:25:47.102485] 192.168.7.55 dev enp7s0 lladdr 30:23:03:01:33:c5 REACHABLE
[2020-07-05T00:25:57.595220] 172.19.0.6 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:06 PROBE
[2020-07-05T00:25:57.595308] 172.19.0.6 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:06 REACHABLE
[2020-07-05T00:25:58.363503] 172.19.0.7 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:07 PROBE
[2020-07-05T00:25:58.363730] 172.19.0.7 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:07 REACHABLE
[2020-07-05T00:26:00.667505] 172.19.0.3 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:03 STALE
[2020-07-05T00:26:12.955465] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed STALE
[2020-07-05T00:26:19.099249] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed PROBE
[2020-07-05T00:26:19.099393] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed REACHABLE
[2020-07-05T00:26:29.339502] 172.19.0.6 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:06 STALE
[2020-07-05T00:26:29.339583] 172.19.0.7 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:07 STALE
[2020-07-05T00:26:37.531222] 192.168.7.50 dev enp7s0 lladdr 24:f5:a2:94:74:e9 STALE
[2020-07-05T00:26:37.531304] 192.168.7.55 dev enp7s0 lladdr 30:23:03:01:33:c5 STALE
[2020-07-05T00:26:47.003597] 192.168.7.55 dev enp7s0 lladdr 30:23:03:01:33:c5 PROBE
[2020-07-05T00:26:47.003678] 192.168.7.50 dev enp7s0 lladdr 24:f5:a2:94:74:e9 PROBE
[2020-07-05T00:26:47.005742] 192.168.7.55 dev enp7s0 lladdr 30:23:03:01:33:c5 REACHABLE
[2020-07-05T00:26:47.007351] 192.168.7.50 dev enp7s0 lladdr 24:f5:a2:94:74:e9 REACHABLE
[2020-07-05T00:27:00.827525] 172.19.0.3 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:03 PROBE
[2020-07-05T00:27:00.827816] 172.19.0.3 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:03 REACHABLE
[2020-07-05T00:27:12.859480] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed STALE
[2020-07-05T00:27:19.003172] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed PROBE

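Answer 1

It took quite a bit more digging to find the answer. Eventually I plugged the computer into a monitor and left it running, and the next time it lost network connectivity I found that the CPU had locked up.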
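
For anyone without a spare monitor, one way to tell a full system hang apart from a pure network drop is a heartbeat file: if the timestamps stop at the moment the network disappears, the whole machine froze. What follows is a minimal Python sketch of that idea, not something I ran at the time; the function names, the file path, and the 10-second interval are all assumptions.

```python
import os
import time
from datetime import datetime, timezone
from pathlib import Path


def write_heartbeat(path, now=None):
    """Append one UTC timestamp line and force it to disk."""
    now = now or datetime.now(timezone.utc)
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(now.isoformat() + "\n")
        handle.flush()
        # fsync so the line survives a hard lockup shortly afterwards.
        os.fsync(handle.fileno())


def last_heartbeat(path):
    """Return the most recent timestamp in the file, or None if empty."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return datetime.fromisoformat(lines[-1]) if lines else None


def run(path, interval_s=10.0):
    """Write a heartbeat forever; stop it with Ctrl-C or a service manager."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_s)
```

After a forced reboot, `last_heartbeat("/var/tmp/heartbeat.log")` (a hypothetical path) gives the approximate time the machine stopped running. If the heartbeats continue well past the network loss, the CPU did not lock up and the fault lies elsewhere.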

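Some quick searching suggests this may be a power state issue with Ryzen CPUs: https://askubuntu.com/a/1259021

Based on that answer, I disabled the C6 power state by following this guide: https://forum.manjaro.org/t/fix-ryzen-lockups-lated-to-low-system-usage/39723

My uptime is now close to 3 days with no issues. The machine is currently on wifi, but I intend to switch it back to wired. I will update in a month with how the uptime has been since then. Hopefully this helps the next person who runs into a similar problem.

Edit 2022: the original link is dead; here is the archive.org link: https://web.archive.org/web/20200417190251/https://forum.manjaro.org/t/fix-ryzen-lockups-lated-to-low-system-usage/39723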
