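I have Ubuntu Server 22.04 installed on an HP ProLiant MicroServer. The server acts as a file server as well as a DNS and DHCP server. The problem I am facing is poor network performance during file transfers. After a reboot I get transfer speeds of around 50-60Mib/s, but this slowly drops and settles at ~2Mib/s. The drop in speed appears to be related to time rather than to the amount of data transferred: the speed drops half an hour after a reboot, not after transferring 2GB.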
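One way to check whether the slowdown really tracks uptime rather than bytes transferred is to log the interface throughput at fixed intervals during a long transfer. The following is a minimal sketch, not a definitive tool: `rate_mib_s` and `sample_bytes` are hypothetical helper names, the 10-second interval is arbitrary, and it assumes the standard kernel byte counters under `/sys/class/net/<iface>/statistics`.

```shell
#!/bin/sh
# rate_mib_s BYTES_BEFORE BYTES_AFTER SECONDS
# Prints the average rate between two counter samples in MiB/s (two decimals).
rate_mib_s() {
    awk -v a="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", (b - a) / s / 1048576 }'
}

# sample_bytes IFACE
# Prints rx_bytes + tx_bytes for the interface, read from the kernel counters.
sample_bytes() {
    rx=$(cat "/sys/class/net/$1/statistics/rx_bytes")
    tx=$(cat "/sys/class/net/$1/statistics/tx_bytes")
    echo $((rx + tx))
}

# Example loop (left commented out so that sourcing this file has no side effects).
# bond0 is the interface name used in this post; adjust as needed.
#   while true; do
#       before=$(sample_bytes bond0); sleep 10; after=$(sample_bytes bond0)
#       echo "$(date +%T) $(rate_mib_s "$before" "$after" 10) MiB/s"
#   done
```

Logging the timestamp next to each rate makes it possible to line the drop up against time since boot instead of against the amount of data copied.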
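All tests were carried out over a wired connection. I don't believe the network infrastructure is the culprit, because I tested transfers from a Windows 10 machine on the same switch port and saw no degradation in performance.
I also don't believe it is the file server's storage array. The array consists of 4 x 2TB WD Red (WD20EFRX-68E) disks in a RAID 5 array. The system uses a 120GB SSD as the boot drive. I hosted a 2GB file on the SSD and network transfers were still slow.
I use both SMB and NFS for network shares, and performance is equally poor with both.
The server is connected to the network with a bonded NIC in the form of an Intel 82571EB dual gigabit Ethernet PCI adapter. The switch it connects to is a TP-Link T1600G-18TS V1. The bonded links are connected to two ports configured for LACP. I don't think this setup is the cause of the fault either, because if I switch back to the onboard NIC I still get the same performance.
The server's network connection is set up using netplan. The server IP is fixed at 192.168.0.2. See the netplan configuration below: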
chris@paveycloud:~$ cat /etc/netplan/01-netcfg.yaml
# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
  version: 2
  renderer: networkd
  ethernets:
    enp2s0f0:
      dhcp4: no
      dhcp6: no
    enp2s0f1:
      dhcp4: no
      dhcp6: no
  bonds:
    bond0:
      interfaces:
        - enp2s0f0
        - enp2s0f1
      addresses: [192.168.0.2/24]
      routes:
        - to: default
          via: 192.168.0.1
      nameservers:
        addresses:
          - 1.1.1.1
          - 1.0.0.1
      parameters:
        transmit-hash-policy: layer2
        mode: 802.3ad
        lacp-rate: slow
        mii-monitor-interval: 1
I use dnsmasq as my DNS and DHCP server; see the configuration below:
chris@paveycloud:~$ cat /etc/dnsmasq.conf
# Listen address
listen-address=127.0.0.1,192.168.0.2
# Never forward plain names (without a domain)
domain-needed
# Never forward addresses in the non-routable address space (RFC1918)
bogus-priv
# Add domain to host names
expand-hosts
# Domain to be added if expand-hosts is set
domain=paveycloud.com
# Local domain to be served from /etc/hosts file
local=/paveycloud.com/
# local domain translation
address=/paveycloud.com/192.168.0.2
address=/paveycloud.noip.me/192.168.0.2
# dhcp stuff
dhcp-range=192.168.0.11,192.168.0.254,12h
dhcp-lease-max=100
dhcp-option=option:router,192.168.0.1
dhcp-option=option:dns-server,192.168.0.2
dhcp-option=option:netmask,255.255.255.0
dhcp-boot=lpxelinux.0
dhcp-authoritative
# static addresses 192.168.0.1 > 192.168.0.10
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.1,NETGEAR_VDSL_DM200_GATEWAY
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.3,NETGEAR_EX7000_HOUSE_AP
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.4,NETGEAR_EX7000_GARAGE_AP
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.5,TP-LINK_T1600G-18TS_HOUSE_SW
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.6,TP-LINK_T1600G-18TS_GARAGE_SW
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.7,BEDROOM_NVIDIA_SHIELD_ETH
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.8,LOUNGE_NVIDIA_SHIELD_ETH
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.9,LOUNGE_NVIDIA_SHIELD_WIFI
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.10,VBOX_TV_GATEWAY_ETH
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.70,NANOSTATION_LOCO_M5_AP
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.71,NANOSTATION_LOCO_M5_STATION
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.90,AXIS_VIDEO_SERVER
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.100,ZONEMINDER_SERVER
chris@paveycloud:~$ cat /etc/resolv.conf
nameserver 1.1.1.1
nameserver 1.0.0.1
If I cat /proc/net/bonding/bond0 and inspect bond0 with ethtool, everything seems normal:
cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v5.15.0-56-generic
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 1
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0
802.3ad info
LACP active: on
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Slave Interface: enp2s0f1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:26:55:e3:bb:e3
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
Slave Interface: enp2s0f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:26:55:e3:bb:e2
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
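The bonding status above can be condensed into one line per slave, which makes it easier to confirm that both links are up, running at the same speed, and sharing one aggregator ID (as they do here, which indicates that LACP placed them in a single aggregate). A minimal sketch, assuming the `/proc/net/bonding` layout shown above; `bond_slave_summary` is a hypothetical helper name.

```shell
#!/bin/sh
# bond_slave_summary [FILE]
# Prints "<slave> <mii-status> <speed> agg=<aggregator-id>" for each slave.
# FILE defaults to /proc/net/bonding/bond0 (the bond name used in this post).
bond_slave_summary() {
    awk -F': ' '
        /^Slave Interface:/           { name = $2; in_slave = 1 }
        in_slave && /^MII Status:/    { status = $2 }
        in_slave && /^Speed:/         { speed = $2 }
        in_slave && /^Aggregator ID:/ { print name, status, speed, "agg=" $2 }
    ' "${1:-/proc/net/bonding/bond0}"
}
```

Calling `bond_slave_summary` with no argument reads the live status file. Slaves reporting different aggregator IDs would suggest that the switch ports are not negotiating as a single LACP group.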
chris@paveycloud:~$ sudo ethtool bond0
Settings for bond0:
Supported ports: [ ]
Supported link modes: Not reported
Supported pause frame use: No
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: 2000Mb/s
Duplex: Full
Auto-negotiation: off
Port: Other
PHYAD: 0
Transceiver: internal
Link detected: yes
Any help with this issue would be greatly appreciated.
Chris
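Answer 1
UPDATE: I found the solution. The problem was fail2ban, or more specifically how I had configured it. I had set the ban time to -1 to permanently ban IP addresses that attempted to log in to my server without a valid SSH key. It had been configured this way for about two years and had banned roughly 60k IP addresses. This was evidently the cause of the problem I was seeing. I don't know whether it was due to the bloated iptables rule set or simply the processing overhead on network traffic, but changing the ban time to 48 hours appears to have fixed it.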
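A rough way to see how large the fail2ban rule set has grown is to count the per-IP ban rules in `iptables-save` output. The sketch below assumes fail2ban's iptables-based ban actions, which place bans in chains named `f2b-<jail>`. `count_f2b_rules` is a hypothetical helper name, and setups that use nftables or ipset actions would need a different check.

```shell
#!/bin/sh
# count_f2b_rules
# Reads iptables-save output on stdin and prints the number of rules that
# match a source address inside an f2b-* chain (one rule per banned IP).
count_f2b_rules() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the exit status clean.
    grep -c -E '^-A f2b-[^ ]+ -s ' || true
}

# Example (requires root, so it is left commented out):
#   sudo iptables-save | count_f2b_rules
```

A count in the tens of thousands would be consistent with the roughly 60k bans described above, since each incoming packet to the protected port may have to be compared against that chain rule by rule.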
Many thanks to @Matias N Goldberg for the help.
Chris