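I've spent months on this problem and I'm at my wits' end. I have a home media server that runs its services in Docker containers, all defined in a single docker-compose file. The box itself has a static IP assigned by the network (an eero in this case). I run docker-compose up -d and leave it to do its thing.
Somewhere between a day and a week later (it's inconsistent), the machine loses network connectivity. The current network layout is modem --> eero --> network switch --> server. The only way I can reconnect to the server is to reboot it; only then does networking come back. I first hit this on Debian (on both 9 and 10), but I changed OS because a friend of mine runs Ubuntu without problems. I switched to Ubuntu Server (20) and got the same problem. I did briefly look at https://github.com/moby/moby/issues/36153 as a possible root cause, but adding the suggested file didn't seem to do anything.
My next thought was that it might be a hardware problem, so I switched from the onboard Ethernet to a USB-C Ethernet adapter. That seemed to work for three days, but then I hit the same problem.
At this point I don't know how to narrow the problem down. I've looked through syslog, but nothing stands out to me. I checked the container logs, and all the containers look fine. On Debian I was using Network Manager, but on Ubuntu I'm using systemd-networkd. Both setups have had the problem.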
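One way to narrow this down is to tell "only the network died" apart from "the whole machine froze". The sketch below is not from the original post: it appends a timestamped line to a file on local disk at a fixed interval, together with the result of a single ping to the gateway. After the next forced reboot, the last line shows whether the machine kept running while the gateway was unreachable, or stopped writing altogether. The gateway address 192.168.7.1 is taken from the neighbor log later in this post and is assumed to be the eero; the log path, the interval, and all function names are hypothetical.

```python
import os
import subprocess
import time
from datetime import datetime, timezone


def gateway_reachable(gateway):
    """Return True if one ping to the gateway succeeds."""
    # -c 1: send one echo request; -W 2: wait at most 2 seconds (Linux ping).
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", gateway],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def record_heartbeat(log_path, reachable, now=None):
    """Append one timestamped line, force it to disk, and return it."""
    now = now or datetime.now(timezone.utc)
    line = f"{now.isoformat()} gateway={'up' if reachable else 'DOWN'}\n"
    with open(log_path, "a") as fh:
        fh.write(line)
        fh.flush()
        # fsync so the line survives a hard lockup right after the write.
        os.fsync(fh.fileno())
    return line


def run_forever(gateway="192.168.7.1",          # assumed gateway address
                log_path="/var/tmp/heartbeat.log",  # hypothetical path
                interval=30):
    """Write a heartbeat every `interval` seconds until interrupted."""
    while True:
        record_heartbeat(log_path, gateway_reachable(gateway))
        time.sleep(interval)
```

Calling run_forever() from a terminal multiplexer or a small service is enough. If the file keeps growing with gateway=DOWN lines, the problem is on the network side; if the file simply stops, the machine itself hung.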
My Ubuntu version is Ubuntu 20.04 LTS (GNU/Linux 5.4.0-37-generic x86_64).
Here is my hardware information, in case it helps:
H/W path               Device           Class          Description
=================================================================
                                        system         System Product Name (SKU)
/0                                      bus            PRIME X370-PRO
/0/0                                    memory         64KiB BIOS
/0/2c                                   memory         16GiB System Memory
/0/2c/0                                 memory         8GiB DIMM DDR4 Synchronous Unbuffered (Unregistered) 2133 MHz (0.5 ns)
/0/2c/1                                 memory         [empty]
/0/2c/2                                 memory         8GiB DIMM DDR4 Synchronous Unbuffered (Unregistered) 2133 MHz (0.5 ns)
/0/2c/3                                 memory         [empty]
/0/2e                                   memory         576KiB L1 cache
/0/2f                                   memory         3MiB L2 cache
/0/30                                   memory         16MiB L3 cache
/0/31                                   processor      AMD Ryzen 5 1600 Six-Core Processor
/0/100                                  bridge         Family 17h (Models 00h-0fh) Root Complex
/0/100/0.2                              generic        Family 17h (Models 00h-0fh) I/O Memory Management Unit
/0/100/1.3                              bridge         Family 17h (Models 00h-0fh) PCIe GPP Bridge
/0/100/1.3/0                            bus            X370 Series Chipset USB 3.1 xHCI Controller
/0/100/1.3/0/0         usb1             bus            xHCI Host Controller
/0/100/1.3/0/0/7                        generic        Belkin USB-C LAN
/0/100/1.3/0/1         usb2             bus            xHCI Host Controller
/0/100/1.3/0.1         scsi0            storage        X370 Series Chipset SATA Controller
/0/100/1.3/0.1/0       /dev/sda         disk           120GB SanDisk SDSSDA12
/0/100/1.3/0.1/0/1     /dev/sda1        volume         511MiB Windows FAT volume
/0/100/1.3/0.1/0/2     /dev/sda2        volume         111GiB EXT4 volume
/0/100/1.3/0.1/1       /dev/sdb         disk           3TB Hitachi HUS72403
/0/100/1.3/0.1/2       /dev/sdc         disk           3TB Hitachi HUS72403
/0/100/1.3/0.1/3       /dev/sdd         disk           3TB Hitachi HUS72403
/0/100/1.3/0.1/4       /dev/sde         disk           3TB Hitachi HUS72403
/0/100/1.3/0.1/5       /dev/sdf         disk           3TB Hitachi HUS72403
/0/100/1.3/0.2                          bridge         X370 Series Chipset PCIe Upstream Port
/0/100/1.3/0.2/0                        bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/2                        bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/3                        bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/4                        bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/4/0                      bus            ASM1142 USB 3.1 Host Controller
/0/100/1.3/0.2/4/0/0   usb3             bus            xHCI Host Controller
/0/100/1.3/0.2/4/0/1   usb4             bus            xHCI Host Controller
/0/100/1.3/0.2/6                        bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/6/0     enp7s0           network        I211 Gigabit Network Connection
/0/100/1.3/0.2/7                        bridge         300 Series Chipset PCIe Port
/0/100/3.2                              bridge         Family 17h (Models 00h-0fh) PCIe GPP Bridge
/0/100/3.2/0                            display        GP107 [GeForce GTX 1050]
/0/100/3.2/0.1                          multimedia     GP107GL High Definition Audio Controller
/0/100/7.1                              bridge         Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B
/0/100/7.1/0                            generic        Zeppelin/Raven/Raven2 PCIe Dummy Function
/0/100/7.1/0.2                          generic        Family 17h (Models 00h-0fh) Platform Security Processor
/0/100/7.1/0.3                          bus            Family 17h (Models 00h-0fh) USB 3.0 Host Controller
/0/100/7.1/0.3/0       usb5             bus            xHCI Host Controller
/0/100/7.1/0.3/1       usb6             bus            xHCI Host Controller
/0/100/8.1                              bridge         Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B
/0/100/8.1/0                            generic        Zeppelin/Renoir PCIe Dummy Function
/0/100/8.1/0.2                          storage        FCH SATA Controller [AHCI mode]
/0/100/8.1/0.3                          multimedia     Family 17h (Models 00h-0fh) HD Audio Controller
/0/100/14                               bus            FCH SMBus Controller
/0/100/14.3                             bridge         FCH LPC Bridge
/0/101                                  bridge         Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
/0/102                                  bridge         Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
/0/103                                  bridge         Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
/0/104                                  bridge         Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
/0/105                                  bridge         Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
/0/106                                  bridge         Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
/0/107                                  bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0
/0/108                                  bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1
/0/109                                  bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2
/0/10a                                  bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3
/0/10b                                  bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4
/0/10c                                  bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5
/0/10d                                  bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6
/0/10e                                  bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7
/0/1                                    system         PnP device PNP0c01
/0/2                                    system         PnP device PNP0b00
/0/3                                    system         PnP device PNP0c02
/0/4                                    communication  PnP device PNP0501
/0/5                                    system         PnP device PNP0c02
/1                     br-10d6cc4b0f64  network        Ethernet interface
/2                     veth80c7cea      network        Ethernet interface
/3                     enx302303052de3  network        Ethernet interface
/4                     vethf4fd33e      network        Ethernet interface
/5                     vethab1d028      network        Ethernet interface
/6                     vethb9ac1e0      network        Ethernet interface
/7                     veth00d454b      network        Ethernet interface
/8                     docker0          network        Ethernet interface
Here is my docker-compose file as well. My current Docker version is Docker version 19.03.11, build dd360c7, and my docker-compose version is docker-compose version 1.26.0, build d4451659.
version: "3.7"
services:
  plex:
    image: plexinc/pms-docker
    container_name: plex
    volumes:
      - /mnt/plex/config:/config
      - /mnt/plex/Movies:/data/movies
      - /mnt/plex/Shows:/data/tvshows
      - /mnt/plex/transcode:/data/transcode
    ports:
      - 32400:32400/tcp
      - 3005:3005/tcp
      - 8324:8324/tcp
      - 32469:32469/tcp
      - 1900:1900/udp
      - 32410:32410/udp
      - 32412:32412/udp
      - 32413:32413/udp
      - 32414:32414/udp
    restart: unless-stopped
    environment:
      - PUID=1000
      - PGID=1000
      - VERSION=latest
      - TZ=America/Los_Angeles
  homebridge:
    image: oznu/homebridge:latest
    container_name: homebridge
    restart: unless-stopped
    network_mode: host
    environment:
      - TZ=America/Los_Angeles
      - PGID=1000
      - PUID=1000
      - HOMEBRIDGE_CONFIG_UI=1
      - HOMEBRIDGE_CONFIG_UI_PORT=8008
    volumes:
      - /mnt/homebridge:/homebridge
  nzbget:
    image: linuxserver/nzbget:latest
    container_name: nzbget
    volumes:
      - /mnt/nzbget/config:/config
      - /mnt/nzbget/downloads:/downloads
    restart: unless-stopped
    environment:
      - TZ=America/Los_Angeles
      - PUID=1000
      - PGID=1000
    ports:
      - 6789:6789
  sonarr:
    image: linuxserver/sonarr:latest
    container_name: sonarr
    restart: unless-stopped
    depends_on:
      - nzbget
    volumes:
      - /mnt/sonarr/config:/config
      - /mnt/nzbget/downloads:/downloads
      - /mnt/plex/Shows:/tv
    environment:
      - TZ=America/Los_Angeles
      - PUID=1000
      - PGID=1000
    ports:
      - 8989:8989
  radarr:
    image: linuxserver/radarr:latest
    container_name: radarr
    restart: unless-stopped
    depends_on:
      - nzbget
    volumes:
      - /mnt/radarr/config:/config
      - /mnt/nzbget/downloads:/downloads
      - /mnt/plex/Movies:/movies
    environment:
      - TZ=America/Los_Angeles
      - PUID=1000
      - PGID=1000
    ports:
      - 7878:7878
  tautulli:
    image: linuxserver/tautulli:latest
    container_name: tautulli
    depends_on:
      - plex
    restart: unless-stopped
    environment:
      - TZ=America/Los_Angeles
      - PUID=1000
      - GUID=1000
    volumes:
      - /mnt/tautulli/config:/config
      - /mnt/tautulli/logs:/logs:ro
    ports:
      - 8181:8181
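If I've missed anything, let me know and I'm happy to provide more information.
Edit:
Last night I also tried updating the Realtek driver to the latest version to see whether it was the cause, because I saw the following in journalctl: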
Jun 14 01:17:25 phoenix kernel: xhci_hcd 0000:01:00.0: xHCI host not responding to stop endpoint command.
Jun 14 01:17:25 phoenix kernel: xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
Jun 14 01:17:25 phoenix kernel: r8152 1-7:1.0 enx302303052de3: Tx status -108
Jun 14 01:17:25 phoenix kernel: r8152 1-7:1.0 enx302303052de3: Tx status -108
Jun 14 01:17:25 phoenix kernel: r8152 1-7:1.0 enx302303052de3: Tx status -108
Jun 14 01:17:25 phoenix kernel: r8152 1-7:1.0 enx302303052de3: Tx status -108
Jun 14 01:17:25 phoenix kernel: xhci_hcd 0000:01:00.0: HC died; cleaning up
Jun 14 01:17:25 phoenix kernel: r8152 1-7:1.0 enx302303052de3: Tx timeout
Jun 14 01:17:25 phoenix kernel: usb 1-7: USB disconnect, device number 2
Jun 14 01:17:25 phoenix kernel: r8152 1-7:1.0 enx302303052de3: Get ether addr fail
Jun 14 01:17:25 phoenix systemd-networkd[933]: enx302303052de3: Link DOWN
I did this by following https://www.pcsuggest.com/install-rtl8153-driver-linux/. However, the connection seems to have dropped again this morning, so I can't say whether it helped.
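In the journalctl excerpt above, the xHCI host controller at 0000:01:00.0 stops responding first, and the r8152 errors and the Link DOWN follow in the same second. That ordering suggests the USB adapter may have gone down because its host controller did, not necessarily because of the Realtek driver, though the excerpt alone can't prove it. The sketch below is an addition, not part of the original post: it pulls the controller address, USB port, and interface name out of such lines so that several incidents can be compared. The patterns are copied from the excerpt; the function and key names are hypothetical.

```python
import re

# Message fragments taken from the journalctl excerpt in this post.
PATTERNS = {
    "controllers_died": re.compile(r"kernel: xhci_hcd (\S+): HC died"),
    "usb_disconnects": re.compile(r"kernel: usb (\S+): USB disconnect"),
    "links_down": re.compile(r"systemd-networkd\[\d+\]: (\S+): Link DOWN"),
}


def summarize_drop(lines):
    """Collect the PCI address, USB port and interface named in a drop."""
    summary = {key: [] for key in PATTERNS}
    for line in lines:
        for key, pattern in PATTERNS.items():
            match = pattern.search(line)
            # Record each distinct value once, in order of appearance.
            if match and match.group(1) not in summary[key]:
                summary[key].append(match.group(1))
    return summary
```

The input can be any iterable of log lines, for example the output of journalctl -b -1 --no-pager saved to a file (this selects the previous boot and only works if the journal is persistent). If the same controller address shows up in every incident, the controller or its power management is a better suspect than the adapter.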
Edit 2:
It looks like Docker may be failing or restarting because of snap?
Jun 24 05:01:47 phoenix docker.dockerd[998]: failed to start containerd: timeout waiting for containerd to start
Jun 24 05:01:47 phoenix systemd[1]: snap.docker.dockerd.service: Main process exited, code=exited, status=1/FAILURE
Jun 24 05:01:47 phoenix systemd[1]: snap.docker.dockerd.service: Failed with result 'exit-code'.
Jun 24 05:01:47 phoenix systemd[1]: snap.docker.dockerd.service: Scheduled restart job, restart counter is at 1.
Jun 24 05:01:47 phoenix systemd[1]: Stopped Service for snap application docker.dockerd.
Jun 24 05:01:47 phoenix systemd[1]: Started Service for snap application docker.dockerd.
After that I can clearly see IP reassignment being triggered, which then takes my box offline.
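To check whether these dockerd restarts actually line up with the outages, it helps to list every restart with its timestamp and compare them against the times the box went offline. The following is a minimal sketch, not part of the original post, that matches the "Scheduled restart job" message exactly as it appears in the excerpt above; the function name is hypothetical, and the timestamp format is assumed to be the default syslog-style one shown in the log.

```python
import re

# Matches lines such as:
# "Jun 24 05:01:47 phoenix systemd[1]: snap.docker.dockerd.service:
#  Scheduled restart job, restart counter is at 1."
RESTART = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ [\d:]{8}) \S+ systemd\[1\]: (?P<unit>\S+): "
    r"Scheduled restart job, restart counter is at (?P<count>\d+)\."
)


def restart_events(lines):
    """Return (timestamp, unit, counter) for every scheduled restart."""
    events = []
    for line in lines:
        match = RESTART.match(line)
        if match:
            events.append((match["ts"], match["unit"], int(match["count"])))
    return events
```

A restart counter that stays at 1 and appears only once per outage would point to a one-off restart; a steadily climbing counter would mean dockerd is crash-looping.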
Edit 3:
Here is a snippet of the iplog:
[2020-07-05T00:24:28.507613] Deleted dev vetha537571 lladdr 02:42:ac:13:00:02 STALE
[2020-07-05T00:24:29.019491] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed PROBE
[2020-07-05T00:24:29.019674] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed REACHABLE
[2020-07-05T00:24:32.603688] 172.19.0.3 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:03 STALE
[2020-07-05T00:24:59.227481] 172.19.0.6 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:06 STALE
[2020-07-05T00:25:01.275258] 172.19.0.7 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:07 STALE
[2020-07-05T00:25:30.715499] 172.19.0.3 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:03 PROBE
[2020-07-05T00:25:30.715641] 172.19.0.3 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:03 REACHABLE
[2020-07-05T00:25:34.299181] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed STALE
[2020-07-05T00:25:38.139499] 192.168.7.50 dev enp7s0 lladdr 24:f5:a2:94:74:e9 STALE
[2020-07-05T00:25:38.139586] 192.168.7.55 dev enp7s0 lladdr 30:23:03:01:33:c5 STALE
[2020-07-05T00:25:39.931537] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed PROBE
[2020-07-05T00:25:39.931823] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed REACHABLE
[2020-07-05T00:25:47.099314] 192.168.7.50 dev enp7s0 lladdr 24:f5:a2:94:74:e9 PROBE
[2020-07-05T00:25:47.099401] 192.168.7.55 dev enp7s0 lladdr 30:23:03:01:33:c5 PROBE
[2020-07-05T00:25:47.101034] 192.168.7.50 dev enp7s0 lladdr 24:f5:a2:94:74:e9 REACHABLE
[2020-07-05T00:25:47.102485] 192.168.7.55 dev enp7s0 lladdr 30:23:03:01:33:c5 REACHABLE
[2020-07-05T00:25:57.595220] 172.19.0.6 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:06 PROBE
[2020-07-05T00:25:57.595308] 172.19.0.6 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:06 REACHABLE
[2020-07-05T00:25:58.363503] 172.19.0.7 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:07 PROBE
[2020-07-05T00:25:58.363730] 172.19.0.7 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:07 REACHABLE
[2020-07-05T00:26:00.667505] 172.19.0.3 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:03 STALE
[2020-07-05T00:26:12.955465] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed STALE
[2020-07-05T00:26:19.099249] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed PROBE
[2020-07-05T00:26:19.099393] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed REACHABLE
[2020-07-05T00:26:29.339502] 172.19.0.6 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:06 STALE
[2020-07-05T00:26:29.339583] 172.19.0.7 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:07 STALE
[2020-07-05T00:26:37.531222] 192.168.7.50 dev enp7s0 lladdr 24:f5:a2:94:74:e9 STALE
[2020-07-05T00:26:37.531304] 192.168.7.55 dev enp7s0 lladdr 30:23:03:01:33:c5 STALE
[2020-07-05T00:26:47.003597] 192.168.7.55 dev enp7s0 lladdr 30:23:03:01:33:c5 PROBE
[2020-07-05T00:26:47.003678] 192.168.7.50 dev enp7s0 lladdr 24:f5:a2:94:74:e9 PROBE
[2020-07-05T00:26:47.005742] 192.168.7.55 dev enp7s0 lladdr 30:23:03:01:33:c5 REACHABLE
[2020-07-05T00:26:47.007351] 192.168.7.50 dev enp7s0 lladdr 24:f5:a2:94:74:e9 REACHABLE
[2020-07-05T00:27:00.827525] 172.19.0.3 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:03 PROBE
[2020-07-05T00:27:00.827816] 172.19.0.3 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:03 REACHABLE
[2020-07-05T00:27:12.859480] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed STALE
[2020-07-05T00:27:19.003172] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed PROBE
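The snippet above mostly shows neighbors cycling through STALE, PROBE, and REACHABLE, which is normal neighbor-cache behaviour; what matters is the last state each neighbor reached before the log stops. Note that the excerpt ends with the gateway 192.168.7.1 in PROBE, and whether a REACHABLE followed is not shown. The sketch below is an addition, not from the original post: it parses lines in this format and keeps the last entry per (device, MAC) pair. The line format is inferred from the snippet only, and the function name is hypothetical.

```python
import re
from datetime import datetime

# Format inferred from the snippet: "[timestamp] [Deleted ][ip ]dev <if>
# lladdr <mac> <STATE>". The IP is missing on the "Deleted" line shown.
LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<deleted>Deleted )?(?:(?P<ip>\S+) )?"
    r"dev (?P<dev>\S+) lladdr (?P<mac>\S+) (?P<state>\w+)$"
)


def last_states(lines):
    """Map (device, MAC) to the last (time, ip, state) seen in the log."""
    latest = {}
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a neighbor entry
        state = "DELETED" if match["deleted"] else match["state"]
        latest[(match["dev"], match["mac"])] = (
            datetime.fromisoformat(match["ts"]),
            match["ip"],
            state,
        )
    return latest
```

If the gateway's final entry is a PROBE with no later REACHABLE while the container bridge entries keep updating, the host was still alive and only the uplink failed; if every neighbor stops at the same moment, the host itself most likely hung.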
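Answer 1
It took a fair bit more digging to find the answer. In the end I plugged the computer into a monitor and left it running, and the next time it lost network connectivity I found that the CPU had locked up.
Some quick searching suggested this might be a power state issue with Ryzen CPUs: https://askubuntu.com/a/1259021
Based on that answer, I disabled the C6 power state by following this guide: https://forum.manjaro.org/t/fix-ryzen-lockups-lated-to-low-system-usage/39723
I'm at nearly 3 days of uptime with no problems. I'm on wifi for now but plan to switch the machine back to wired. I'll update in a month with how the uptime has been since then. Hopefully this helps the next person who runs into a similar problem.
Edit 2022: the original link is dead; here is an archive.org link: https://web.archive.org/web/20200417190251/https://forum.manjaro.org/t/fix-ryzen-lockups-lated-to-low-system-usage/39723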
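For anyone checking their own machine before or after changing power-state settings, the sketch below lists the idle states the kernel exposes for one CPU. It is an addition, not the method from the linked guide, and it only reads values. It assumes the standard Linux cpuidle sysfs layout (stateN directories containing name and disable files); whether the Ryzen C6 state appears there under that name depends on the idle driver in use, so an empty or short list does not prove C6 is off. The function name is hypothetical.

```python
from pathlib import Path


def list_idle_states(cpu_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return (directory, state name, disabled?) for each idle state."""
    root = Path(cpu_dir)
    if not root.is_dir():
        return []  # cpuidle not exposed on this system
    states = []
    # Sort numerically so state10 comes after state2.
    for state_dir in sorted(root.glob("state[0-9]*"),
                            key=lambda p: int(p.name[len("state"):])):
        name = (state_dir / "name").read_text().strip()
        disable_file = state_dir / "disable"
        # A non-zero value in "disable" means the state is disabled.
        disabled = (disable_file.exists()
                    and disable_file.read_text().strip() != "0")
        states.append((state_dir.name, name, disabled))
    return states
```

Printing the result of list_idle_states() before and after applying a fix gives a quick record of what changed from the kernel's point of view.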