First ssh connection takes several minutes

I am running Ubuntu 17.04 with openssh-client 7.4p1-10 and kernel 4.10.0-33-generic.

I am having trouble running commands over ssh, for example:

rsync -t -e ssh -p 22 script.sh [email protected]:/var/lib/script.sh
\_ ssh -p 22 -l root [email protected] rsync --server -te.LsfxC . /var/lib/script.sh

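It takes rsync 6 minutes to sync that 4 kB script. The problem is not limited to rsync: it also affects git push, and sometimes plain ssh.

Interestingly, interrupting the process and running it again makes it work immediately: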

^Crsync error: unexplained error (code 130) at rsync.c(638) [sender=3.1.2]
rsync: [sender] write error: Broken pipe (32)

It does not appear to be a DNS problem; here is /etc/resolv.conf:

nameserver 8.8.8.8
nameserver 8.8.4.4
options single-request-reopen
options attempts:2
options rotate
options timeout:2

I have already disabled GSSAPI:

/etc/ssh/ssh_config

   GSSAPIAuthentication no
   GSSAPIDelegateCredentials no

That had no effect. I also tried forcing an IPv4 connection with -4, without success. Any idea what the problem might be?

Here is an strace of the process:

strace: Process 7610 attached
select(8, [3 5], [], NULL, NULL)        = 1 (in [3])
clock_gettime(CLOCK_BOOTTIME, {42870, 893598449}) = 0
read(3, "\372oyu\331J\20\327\264\325\357\274\vn\233\nG\207\207c\251\230\341NzUk\261\351v\23\353"..., 8192) = 44
clock_gettime(CLOCK_BOOTTIME, {42870, 894108136}) = 0
clock_gettime(CLOCK_BOOTTIME, {42870, 894258960}) = 0
select(8, [3 5], [6], NULL, NULL)       = 1 (out [6])
clock_gettime(CLOCK_BOOTTIME, {42870, 894325845}) = 0
write(6, "\3\0\0\7\0\0\0", 7)           = 7
clock_gettime(CLOCK_BOOTTIME, {42870, 894439661}) = 0
clock_gettime(CLOCK_BOOTTIME, {42870, 894473071}) = 0
select(8, [3 5], [], NULL, NULL)        = 1 (in [5])
clock_gettime(CLOCK_BOOTTIME, {42870, 894558087}) = 0
read(5, "\2\0\0\7\0\0\1\0\0\7\0", 16384) = 11
clock_gettime(CLOCK_BOOTTIME, {42870, 894661575}) = 0
clock_gettime(CLOCK_BOOTTIME, {42870, 894699595}) = 0
select(8, [3 5], [3], NULL, NULL)       = 1 (out [3])
clock_gettime(CLOCK_BOOTTIME, {42870, 894780961}) = 0
write(3, "\f\16\6UF|B\1\315\nYP\355\f|\177|\234v\371\322\236*)\32`\3214\225$u\337"..., 52) = 52
clock_gettime(CLOCK_BOOTTIME, {42870, 894852781}) = 0
clock_gettime(CLOCK_BOOTTIME, {42870, 894874370}) = 0
select(8, [3 5], [], NULL, NULL)        = 1 (in [3])
clock_gettime(CLOCK_BOOTTIME, {42870, 923152465}) = 0
read(3, "\310\3258\332\212)\re\262\322^\f\275\324X{\361\23f\211mk'\213\224\v\0\204\322\n\25\221"..., 8192) = 44
clock_gettime(CLOCK_BOOTTIME, {42870, 923618233}) = 0
clock_gettime(CLOCK_BOOTTIME, {42870, 923845130}) = 0
select(8, [3 5], [6], NULL, NULL)       = 1 (out [6])
clock_gettime(CLOCK_BOOTTIME, {42870, 923946992}) = 0
write(6, "\1\0\0\7\0", 5)               = 5
clock_gettime(CLOCK_BOOTTIME, {42870, 924002335}) = 0
clock_gettime(CLOCK_BOOTTIME, {42870, 924027449}) = 0
select(8, [3 5], [], NULL, NULL)        = 1 (in [3])
clock_gettime(CLOCK_BOOTTIME, {42870, 943180384}) = 0
read(3, "\326U\32\20\246\374\201K\246\177!z\265\302^\252\371\255\215\355\265\356\313\322W\2341`%\215\20P"..., 8192) = 176
close(6)                                = 0
close(5)                                = 0
clock_gettime(CLOCK_BOOTTIME, {42870, 943307191}) = 0
clock_gettime(CLOCK_BOOTTIME, {42870, 943334146}) = 0
close(7)                                = 0
select(8, [3], [3], NULL, NULL)         = 1 (out [3])
clock_gettime(CLOCK_BOOTTIME, {42870, 943414987}) = 0
write(3, "0\236\27\233p\303\324\302\222mD\242Y_\34S\365\366p\214z\320\367.sN\252\337\322S\202("..., 36) = 36
rt_sigaction(SIGWINCH, NULL, {0x5639600b7460, [], SA_RESTORER, 0x7f7046de37f0}, 8) = 0
rt_sigaction(SIGWINCH, {SIG_DFL, [], SA_RESTORER, 0x7f7046de37f0}, NULL, 8) = 0
write(3, "F\226\207\7\243\207\33\316\37\1U$\326Y\314\253\310p\210\354\240\247\322n\32\272A\312\312:\252\324"..., 60) = 60
ioctl(0, TCGETS, 0x7ffc20de6720)        = -1 ENOTTY (Inappropriate ioctl for device)
fcntl(0, F_GETFL)                       = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(0, F_SETFL, O_RDWR)               = 0
ioctl(1, TCGETS, 0x7ffc20de6720)        = -1 ENOTTY (Inappropriate ioctl for device)
fcntl(1, F_GETFL)                       = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(1, F_SETFL, O_RDWR)               = 0
ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0
shutdown(3, SHUT_RDWR)                  = 0
close(3)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

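Another thing I noticed is a relatively high number of retransmissions (within a few minutes of booting the system), while other devices on the same network work fine. Could the network card be faulty?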

$ netstat -s | egrep -i 'loss|retran'
    421 segments retransmitted
    TCPLostRetransmit: 6
    1 timeouts in loss state
    47 fast retransmits
    137 retransmits in slow start
    TCPLossProbes: 7
    TCPRetransFail: 3
    TCPSynRetrans: 12

Edit

Things I have already tried without success:

  • Replacing the network cable (connected directly to the router)
  • Swapping the NIC (onboard Broadcom, Realtek gigabit card)

Answer 1

You can get more debugging information by simply running ssh -vvv against the server and looking at the messages from the client process.
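To see where the time goes, it can help to timestamp each debug line and look for the longest pause. The following is only a rough sketch: the helper names `stamp_ssh_debug` and `largest_gap` are made up for illustration, and it assumes key-based login so that ssh does not stop to ask for a password.

```python
import subprocess
import time


def largest_gap(stamped_lines):
    """Return (gap_seconds, line_before, line_after) for the longest pause."""
    best = (0.0, None, None)
    for (t0, before), (t1, after) in zip(stamped_lines, stamped_lines[1:]):
        if t1 - t0 > best[0]:
            best = (t1 - t0, before, after)
    return best


def stamp_ssh_debug(target, extra_args=()):
    """Run `ssh -vvv <target> true` and timestamp every stderr line."""
    cmd = ["ssh", "-vvv", *extra_args, target, "true"]
    start = time.monotonic()
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stderr=subprocess.PIPE,  # ssh writes its debug output to stderr
        text=True,
        errors="replace",
    )
    stamped = []
    for line in proc.stderr:
        stamped.append((time.monotonic() - start, line.rstrip()))
    proc.wait()
    return stamped
```

Calling `largest_gap(stamp_ssh_debug("user@server"))` (with a placeholder target) returns the two debug messages surrounding the longest stall, which hints at whether the delay sits in the TCP connect, the key exchange, or authentication.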

Also try telnet to the ssh port (22 by default) and see how quickly it responds.
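If telnet is not installed, roughly the same check can be sketched with a plain socket. This is an illustration only; `time_ssh_banner` is a hypothetical helper, and it relies on the SSH server sending its identification string right after the TCP connection is accepted.

```python
import socket
import time


def time_ssh_banner(host, port=22, timeout=10.0):
    """Return (connect_seconds, banner_seconds, banner) for an SSH port."""
    t0 = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        t1 = time.monotonic()
        banner = sock.recv(256)  # the server speaks first, e.g. b"SSH-2.0-..."
        t2 = time.monotonic()
    return t1 - t0, t2 - t1, banner.decode("ascii", "replace").strip()
```

A slow first value points at the network path (SYN retransmissions, firewall), while a slow second value points at the server taking its time before it answers.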

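As others have said, this could be a firewall problem (it looks like rate limiting on incoming connections), but since you have disabled it and that did not help much, it is probably not the cause this time.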

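Another possibility is user/group lookup, which can hold the connection up for a while. For example, when you connect to a machine that uses a remote LDAP server to resolve your uid/gid, and that server is busy or unreachable, the connection is delayed as well. (If possible, try logging in to the root account with an ssh key, since that should not depend on an external server.)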

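Another thing to check is the DNS server on the remote end: the ssh server may try to resolve your IP address to a hostname, and if its DNS server is unreliable, that can also take some time to complete.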
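A quick way to test this is to time a reverse lookup of the client's IP address on the server itself. The snippet below is a sketch under that assumption; `time_reverse_lookup` is a made-up helper name, and the address you pass in should be whatever the server sees as your source IP.

```python
import socket
import time


def time_reverse_lookup(ip):
    """Return (seconds, hostname_or_None) for a reverse (PTR) lookup of ip."""
    t0 = time.monotonic()
    try:
        name = socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        name = None
    return time.monotonic() - t0, name
```

If the lookup takes many seconds, setting `UseDNS no` in the server's sshd_config is one commonly suggested way to skip it, although whether that is appropriate depends on your setup.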

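As for later connections being faster than the first one, that may also point to some caching mechanism (DNS, LDAP, netfilter RELATED/ESTABLISHED state), or simply to your ssh client using control sockets (and keeping them open after the initial connection).

Answer 2

After several failed attempts, I tuned the network-related parameters in /etc/sysctl.conf to the following values: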

net.core.netdev_max_backlog = 5000
# allow testing with buffers up to 64MB
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
# increase Linux autotuning TCP buffer limit to 32MB
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
# recommended default congestion control is htcp
net.ipv4.tcp_congestion_control=htcp
# recommended for hosts with jumbo frames enabled
net.ipv4.tcp_mtu_probing=1
net.core.default_qdisc = fq

Increasing the TCP buffers alone did not help. The network now works fine.
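Settings in /etc/sysctl.conf only take effect after they are loaded (for example with `sysctl -p` or a reboot), so it is worth confirming that the running kernel really uses them. Here is a small sketch for that check; the function names are invented for illustration, and it assumes the usual mapping of a key like `net.ipv4.tcp_rmem` to the file `/proc/sys/net/ipv4/tcp_rmem`.

```python
from pathlib import Path


def parse_sysctl(text):
    """Parse sysctl.conf-style text into {key: value}, skipping comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Collapse runs of whitespace so "4096  87380" equals "4096 87380".
        settings[key.strip()] = " ".join(value.split())
    return settings


def live_value(key, root="/proc/sys"):
    """Read the running kernel's value for a sysctl key, or None if absent."""
    path = Path(root, *key.split("."))
    try:
        return " ".join(path.read_text().split())
    except OSError:
        return None


def mismatches(text):
    """Return {key: (wanted, live)} for every setting that is not active."""
    result = {}
    for key, wanted in parse_sysctl(text).items():
        live = live_value(key)
        if live != wanted:
            result[key] = (wanted, live)
    return result
```

Running `mismatches(Path("/etc/sysctl.conf").read_text())` on the affected machine lists any value that was written to the file but is not in effect.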
