IPv4 goes offline several times an hour on a headless remote server, IPv6 is unaffected

On my Ubuntu 14.04.2 server, IPv4 goes offline a few times per hour (I have seen between one and four times, but at no particular minute of the hour).

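My hoster insists that the problem is on the server's side, and the fact that the Debian-based rescue system does not show the same symptoms makes me think they are right. However, the rescue system does not configure global IPv6 addresses on any interface the way the installed Ubuntu system does.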

Typically, one to four times per hour, (IPv4-based) SSH connections get dropped because too many packets time out.

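When I monitor the server from another remote server, ICMPv4 pings either time out or a router responds that the destination host is unreachable (I have seen both frequently!). At the same time, ICMPv6 pings are completely unaffected.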
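This comparison can be made reproducible with a small probe loop run from the monitoring host. The following is only a minimal sketch, not the setup actually used here: `probe` and `monitor` are hypothetical helper names, the two addresses passed to `monitor` are placeholders, and the `-n -c 1 -W 2` flags assume the iputils `ping`/`ping6` that ships with Ubuntu 14.04.

```shell
#!/bin/sh
# probe LABEL COMMAND...: run COMMAND once and print one timestamped line,
# "<UTC timestamp> <LABEL> up" on success or "... down" on failure.
probe() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        state=up
    else
        state=down
    fi
    printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$label" "$state"
}

# monitor V4_ADDR V6_ADDR: probe both address families every 5 seconds.
# Both arguments are placeholders for the monitored server's addresses.
monitor() {
    while :; do
        probe v4 ping  -n -c 1 -W 2 "$1"
        probe v6 ping6 -n -c 1 -W 2 "$2"
        sleep 5
    done
}
```

Running something like `monitor 192.0.2.10 2001:db8::10 >> probe.log` (documentation addresses, to be replaced) yields one line per probe, so IPv4 and IPv6 reachability can be lined up by timestamp.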

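Also, when I connect via SSH over IPv6 from another remote host, that connection does not stall, and the system shows no freezes or anything similar (which is what I initially suspected).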

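The system and kernel logs show no problems either, and it makes no difference whether I flush all firewall rules or leave the firewall enabled. I also let it run with logging enabled for all dropped packets, to see whether I could correlate anything with the outages.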

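No cron jobs are running at the times of the outages, and the outages do not happen at roughly the same minute each time, which would have pointed to some regular cron job.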
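One way to back up the observation that outages do not cluster around a particular minute is to tally the minute-of-hour of each logged failure. This is a sketch under assumptions: it expects a hypothetical outage log on standard input with one line per failed probe, each line starting with an ISO-8601 timestamp, and `minute_histogram` is an illustrative name.

```shell
#!/bin/sh
# minute_histogram: read lines whose first field is an ISO-8601 timestamp
# (e.g. 2015-03-20T10:17:05Z) and print "<minute> <count>" per minute-of-hour.
minute_histogram() {
    awk '{
        split($1, t, ":")      # t[2] is the minute part of HH:MM:SS
        count[t[2]]++
    }
    END {
        for (m in count) print m, count[m]
    }' | sort -n
}
```

A cron-driven cause would be expected to show up as one or two minutes with a clearly higher count than the rest; a flat distribution supports ruling cron out.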

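I also narrowed down another aspect. When I ping (ICMPv4) on the host that shows the symptoms, the loopback interface is unaffected while eth0 is affected. To me this suggests that the problem is not with IPv4 in general, but specific to the interface that corresponds to one of the NICs in the system.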

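How do I continue troubleshooting from here? Given what I have done so far, what would be the next steps? Is there a known bug that matches the symptoms I am seeing?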

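Note: I have already spent more than a month diagnosing this issue, so asking here is a last resort for me. Please ask for more details as needed and I will add them.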


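What I have done so far:

  • ping and ping6
  • mtr from and to the server; my hoster does not consider the few lost packets to be anything unusual
  • SSH connections over IPv4 and IPv6 respectively
  • tail-ed /var/log/kern.log, /var/log/syslog and /var/log/auth.log to see whether anything shows up during the offline periods
  • flushed all firewall rules for IPv4 and IPv6 respectively
    • also simply enabled logging of dropped packets
  • removed several packages that I suspected as potential culprits

Here is the list of manually installed packages: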

# echo $(apt-mark showmanual)
acl adduser aggregate apparmor apparmor-profiles apparmor-utils apt apt-cacher-ng apt-file apt-rdepends apt-utils base-files base-passwd bash bash-completion bash-static bridge-utils bsdutils btrfs-tools busybox-initramfs busybox-static bzip2 bzr ca-certificates cgmanager cgroup-bin cifs-utils colordiff coreutils cpio crda cron cron-apt cryptmount cryptsetup dash debconf debianutils debootstrap debsums dh-python dialog diffutils dnsutils dpkg dpkg-dev duplicity e2fslibs e2fsprogs ed etckeeper fakechroot fakeroot file findutils gcc-4.8-base gcc-4.9-base gdisk-noicu git git-svn gnupg gnutls-bin gpgv grep gzip haveged heirloom-mailx hostname htop ifupdown init-system-helpers initramfs-tools initramfs-tools-bin initscripts insserv iproute2 ipset iptables iputils-ping klibc-utils kmod kpartx less libacl1 libapt-inst1.5 libapt-pkg4.12 libattr1 libaudit-common libaudit1 libblkid1 libbz2-1.0 libc-bin libc6 libcap2 libcgmanager0 libck-connector0 libcomerr2 libdb5.3 libdbus-1-3 libdebconfclient0 libdrm2 libedit2 libevent-2.0-5 libexpat1 libffi6 libgcc1 libgdbm3 libgssapi-krb5-2 libjson-c2 libjson0 libk5crypto3 libkeyutils1 libklibc libkmod2 libkrb5-3 libkrb5support0 liblzma5 libmount1 libmpdec2 libncurses5 libncursesw5 libnih-dbus1 libnih1 libnl-3-200 libnl-genl-3-200 libpam-modules libpam-modules-bin libpam-mount libpam-runtime libpam-systemd libpam0g libpci3 libpcre3 libplymouth2 libpng12-0 libprocps3 libpython-stdlib libpython2.7-minimal libpython2.7-stdlib libpython3-stdlib libpython3.4-minimal libpython3.4-stdlib libreadline6 libselinux1 libsemanage-common libsemanage1 libsepol1 libslang2 libsqlite3-0 libss2 libssl1.0.0 libstdc++6 libtinfo5 libudev1 libui-dialog-perl libusb-0.1-4 libusb-1.0-0 libustr-1.0-1 libuuid1 libwrap0 linux-firmware linux-image-3.13.0-24-generic linux-image-extra-3.13.0-24-generic linux-image-generic localepurge locales logcheck logcheck-database login logrotate lsb-base lsb-release lshw lsof lxc lxc-templates make makedev man-db manpages 
manpages-dev mawk mc md5deep mdadm mercurial mime-support mlocate module-init-tools molly-guard mount mountall mtr-tiny multiarch-support ncurses-base ncurses-bin ndisc6 net-tools netcat-openbsd netsniff-ng nmap openntpd openssh-client openssh-server openssh-sftp-server p7zip-full p7zip-rar passwd pax pciutils perl perl-base perl-modules plymouth postfix procps psmisc pv python python-apt-common python-mako python-mechanize python-minimal python2.7 python2.7-minimal python3 python3-apt python3-minimal python3.4 python3.4-minimal readline-common reprepro resolvconf rsyslog sed sensible-utils sharutils smartmontools subversion sudo sysv-rc sysvinit-utils tar tcpdump tcptraceroute tmux traceroute tree tzdata ubuntu-keyring ucf udev uidmap unattended-upgrades unbound-host unrar unzip upstart usbutils util-linux vim-nox vnstat wget whois wireless-regdb xz-utils zerofree zip zlib1g zsh-doc zsh-static

(Some of these come from the debootstrap process, of course.)


Requested information:

$ uname -a|sed 's/'$(hostname -f)'/foobar/g'
Linux foobar 3.13.0-46-generic #79-Ubuntu SMP Tue Mar 10 20:06:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

I have since updated to a newer kernel (package linux-image-generic-lts-utopic):

$ uname -a|sed 's/'$(hostname -f)'/foobar/g'
Linux foobar 3.16.0-33-generic #44~14.04.1-Ubuntu SMP Fri Mar 13 10:33:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

The output of sysctl -a has been anonymized and placed here.

The command was (minus one sed expression that replaced the interface names with _bridge):

sudo sysctl -a|sed 's/'$(hostname -f)'/foobar/g;s/'$(hostname -s)'/foobar/g'|grep -Ev '^net\.ipv[46]\.(neigh|conf)\._[s]'|grep -v nf_log

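There are further interfaces similar to _bridge, all configured for both IPv4 and IPv6 and differing only in their IP addresses. However, they are currently unused. Each one is intended for a single LXC guest.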

# lspci -s 06:00.0 -vv
06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
        Subsystem: Micro-Star International Co., Ltd. [MSI] X58 Pro-E
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 256 bytes
        Interrupt: pin A routed to IRQ 42
        Region 0: I/O ports at e800 [size=256]
        Region 2: Memory at fbeff000 (64-bit, non-prefetchable) [size=4K]
        Region 4: Memory at f6ff0000 (64-bit, prefetchable) [size=64K]
        [virtual] Expansion ROM at fbe00000 [disabled] [size=128K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee00000  Data: 40c1
        Capabilities: [70] Express (v1) Endpoint, MSI 01
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [b0] MSI-X: Enable- Count=2 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00000800
        Capabilities: [d0] Vital Product Data
                Unknown small resource type 05, will not decode more.
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
        Kernel driver in use: r8169
# modinfo r8169
filename:       /lib/modules/3.16.0-33-generic/kernel/drivers/net/ethernet/realtek/r8169.ko
firmware:       rtl_nic/rtl8168g-3.fw
firmware:       rtl_nic/rtl8168g-2.fw
firmware:       rtl_nic/rtl8106e-2.fw
firmware:       rtl_nic/rtl8106e-1.fw
firmware:       rtl_nic/rtl8411-2.fw
firmware:       rtl_nic/rtl8411-1.fw
firmware:       rtl_nic/rtl8402-1.fw
firmware:       rtl_nic/rtl8168f-2.fw
firmware:       rtl_nic/rtl8168f-1.fw
firmware:       rtl_nic/rtl8105e-1.fw
firmware:       rtl_nic/rtl8168e-3.fw
firmware:       rtl_nic/rtl8168e-2.fw
firmware:       rtl_nic/rtl8168e-1.fw
firmware:       rtl_nic/rtl8168d-2.fw
firmware:       rtl_nic/rtl8168d-1.fw
version:        2.3LK-NAPI
license:        GPL
description:    RealTek RTL-8169 Gigabit Ethernet driver
author:         Realtek and the Linux r8169 crew <[email protected]>
srcversion:     D0E1934D763B6927E0CB4A4
alias:          pci:v00000001d00008168sv*sd00002410bc*sc*i*
alias:          pci:v00001737d00001032sv*sd00000024bc*sc*i*
alias:          pci:v000016ECd00000116sv*sd*bc*sc*i*
alias:          pci:v00001259d0000C107sv*sd*bc*sc*i*
alias:          pci:v00001186d00004302sv*sd*bc*sc*i*
alias:          pci:v00001186d00004300sv*sd*bc*sc*i*
alias:          pci:v00001186d00004300sv00001186sd00004B10bc*sc*i*
alias:          pci:v000010ECd00008169sv*sd*bc*sc*i*
alias:          pci:v000010ECd00008168sv*sd*bc*sc*i*
alias:          pci:v000010ECd00008167sv*sd*bc*sc*i*
alias:          pci:v000010ECd00008136sv*sd*bc*sc*i*
alias:          pci:v000010ECd00008129sv*sd*bc*sc*i*
depends:        mii
intree:         Y
vermagic:       3.16.0-33-generic SMP mod_unload modversions
signer:         Magrathea: Glacier signing key
sig_key:        25:26:EE:FE:32:C9:58:B4:CD:85:CA:5F:BF:EB:ED:A1:75:D1:B2:18
sig_hashalgo:   sha512
parm:           use_dac:Enable PCI DAC. Unsafe on 32 bit PCI slot. (int)
parm:           debug:Debug verbosity level (0=none, ..., 16=all) (int)

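Answer 1

You are definitely running into a network stack problem, so I do recommend a lengthy but proven roadmap for solving it. I have used it myself in similar situations, and this has even happened to me on real hardware. Install Ncurses and the development packages needed for the build, along with the compiler. First, make a git snapshot of the Linux kernel, and in the git snapshot directory do this: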

# The trailing slash on --prefix is a MUST-HAVE!
git checkout-index -a -f --prefix=/usr/src/linux-build/
cd /usr/src/linux-build
cp /boot/config-$(uname -r) .config
make menuconfig

Check all the IP options you need and disable IPv6. After that, build and install your kernel. Second, put this in your /etc/sysctl.conf:

net.ipv6.conf.default.autoconf=0
net.ipv6.conf.default.accept_dad=0
net.ipv6.conf.default.accept_ra=0
net.ipv6.conf.default.accept_ra_defrtr=0
net.ipv6.conf.default.accept_ra_rtr_pref=0
net.ipv6.conf.default.accept_ra_pinfo=0
net.ipv6.conf.default.accept_source_route=0
net.ipv6.conf.default.accept_redirects=0
net.ipv6.conf.default.forwarding=0
net.ipv6.conf.all.autoconf=0
net.ipv6.conf.all.accept_dad=0
net.ipv6.conf.all.accept_ra=0
net.ipv6.conf.all.accept_ra_defrtr=0
net.ipv6.conf.all.accept_ra_rtr_pref=0
net.ipv6.conf.all.accept_ra_pinfo=0
net.ipv6.conf.all.accept_source_route=0
net.ipv6.conf.all.accept_redirects=0
net.ipv6.conf.all.forwarding=0
net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.lo.disable_ipv6=1
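To confirm that these settings are actually live (after `sysctl -p` or a reboot), the file can be compared against the values the kernel exposes under /proc/sys. The following is a minimal sketch, not part of the original answer: `verify_sysctl` is a hypothetical helper, and it assumes simple `key=value` lines whose keys map to paths by replacing dots with slashes (so interface names that themselves contain dots are not handled).

```shell
#!/bin/sh
# verify_sysctl CONF [ROOT]: compare key=value lines in CONF with the live
# values under ROOT (default /proc/sys). Prints one line per problem and
# returns 1 if any key is missing or differs, 0 otherwise.
verify_sysctl() {
    conf=$1
    root=${2:-/proc/sys}
    rc=0
    while IFS='=' read -r key want; do
        # Drop blanks around key and value; skip empty lines and comments.
        key=$(printf '%s' "$key" | tr -d ' \t')
        want=$(printf '%s' "$want" | tr -d ' \t')
        case $key in
            ''|'#'*|';'*) continue ;;
        esac
        # net.ipv6.conf.all.disable_ipv6 -> net/ipv6/conf/all/disable_ipv6
        path=$root/$(printf '%s' "$key" | tr . /)
        if [ ! -r "$path" ]; then
            echo "missing: $key"
            rc=1
            continue
        fi
        have=$(tr -d ' \t\n' < "$path")
        if [ "$have" != "$want" ]; then
            echo "mismatch: $key want=$want have=$have"
            rc=1
        fi
    done < "$conf"
    return $rc
}
```

Calling `verify_sysctl /etc/sysctl.conf` prints nothing and returns 0 when every listed key matches. Note that a kernel built without IPv6 has no net/ipv6 tree at all, in which case every key above would be reported as missing.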

Then reboot, and check this in /etc/ssh/sshd_config:

KeyRegenerationInterval 3600
ServerKeyBits 768
# PAY ATTENTION TO THIS: see below
Compression yes

The Compression directive must be set to either "yes" or "no"; by default it is "delayed", and that is a value that causes problems.
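To see which value a given sshd_config actually sets, something like the following can help. It is a simplified sketch: `sshd_directive` is a hypothetical helper that returns the first uncommented occurrence of a keyword (sshd uses the first value it obtains, and keywords are case-insensitive), and it deliberately ignores Match blocks. The authoritative check remains sshd's own extended test mode, `sshd -T`, run as root.

```shell
#!/bin/sh
# sshd_directive FILE KEYWORD: print the value of the first uncommented
# KEYWORD line in FILE (case-insensitive), or a note if it is not set.
sshd_directive() {
    awk -v kw="$2" '
        /^[ \t]*#/ { next }                       # skip comment lines
        tolower($1) == tolower(kw) {
            print $2
            found = 1
            exit
        }
        END { if (!found) print "(not set; built-in default applies)" }
    ' "$1"
}
```

For example, `sshd_directive /etc/ssh/sshd_config Compression` should print `yes` or `no` once the directive has been set explicitly as described above.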
