I'm running Ubuntu 17.04 with openssh-client 7.4p1-10 and kernel 4.10.0-33-generic.
I'm having problems with commands that run over ssh, for example:
rsync -t -e ssh -p 22 script.sh [email protected]:/var/lib/script.sh
\_ ssh -p 22 -l root [email protected] rsync --server -te.LsfxC . /var/lib/script.sh
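Syncing that 4 kB script with rsync takes 6 minutes. The problem is not limited to rsync: git push is affected too, and sometimes plain ssh as well.
Interestingly, if I interrupt the process and run it again, it works immediately: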
^Crsync error: unexplained error (code 130) at rsync.c(638) [sender=3.1.2]
rsync: [sender] write error: Broken pipe (32)
It doesn't look like a DNS problem. Here is /etc/resolv.conf:
nameserver 8.8.8.8
nameserver 8.8.4.4
options single-request-reopen
options attempts:2
options rotate
options timeout:2
I have already disabled GSSAPI in /etc/ssh/ssh_config:
GSSAPIAuthentication no
GSSAPIDelegateCredentials no
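That had no effect. I also tried forcing IPv4 connections with -4, but that didn't help either. Any idea what the problem might be?
Here is an strace of the process: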
strace: Process 7610 attached
select(8, [3 5], [], NULL, NULL) = 1 (in [3])
clock_gettime(CLOCK_BOOTTIME, {42870, 893598449}) = 0
read(3, "\372oyu\331J\20\327\264\325\357\274\vn\233\nG\207\207c\251\230\341NzUk\261\351v\23\353"..., 8192) = 44
clock_gettime(CLOCK_BOOTTIME, {42870, 894108136}) = 0
clock_gettime(CLOCK_BOOTTIME, {42870, 894258960}) = 0
select(8, [3 5], [6], NULL, NULL) = 1 (out [6])
clock_gettime(CLOCK_BOOTTIME, {42870, 894325845}) = 0
write(6, "\3\0\0\7\0\0\0", 7) = 7
clock_gettime(CLOCK_BOOTTIME, {42870, 894439661}) = 0
clock_gettime(CLOCK_BOOTTIME, {42870, 894473071}) = 0
select(8, [3 5], [], NULL, NULL) = 1 (in [5])
clock_gettime(CLOCK_BOOTTIME, {42870, 894558087}) = 0
read(5, "\2\0\0\7\0\0\1\0\0\7\0", 16384) = 11
clock_gettime(CLOCK_BOOTTIME, {42870, 894661575}) = 0
clock_gettime(CLOCK_BOOTTIME, {42870, 894699595}) = 0
select(8, [3 5], [3], NULL, NULL) = 1 (out [3])
clock_gettime(CLOCK_BOOTTIME, {42870, 894780961}) = 0
write(3, "\f\16\6UF|B\1\315\nYP\355\f|\177|\234v\371\322\236*)\32`\3214\225$u\337"..., 52) = 52
clock_gettime(CLOCK_BOOTTIME, {42870, 894852781}) = 0
clock_gettime(CLOCK_BOOTTIME, {42870, 894874370}) = 0
select(8, [3 5], [], NULL, NULL) = 1 (in [3])
clock_gettime(CLOCK_BOOTTIME, {42870, 923152465}) = 0
read(3, "\310\3258\332\212)\re\262\322^\f\275\324X{\361\23f\211mk'\213\224\v\0\204\322\n\25\221"..., 8192) = 44
clock_gettime(CLOCK_BOOTTIME, {42870, 923618233}) = 0
clock_gettime(CLOCK_BOOTTIME, {42870, 923845130}) = 0
select(8, [3 5], [6], NULL, NULL) = 1 (out [6])
clock_gettime(CLOCK_BOOTTIME, {42870, 923946992}) = 0
write(6, "\1\0\0\7\0", 5) = 5
clock_gettime(CLOCK_BOOTTIME, {42870, 924002335}) = 0
clock_gettime(CLOCK_BOOTTIME, {42870, 924027449}) = 0
select(8, [3 5], [], NULL, NULL) = 1 (in [3])
clock_gettime(CLOCK_BOOTTIME, {42870, 943180384}) = 0
read(3, "\326U\32\20\246\374\201K\246\177!z\265\302^\252\371\255\215\355\265\356\313\322W\2341`%\215\20P"..., 8192) = 176
close(6) = 0
close(5) = 0
clock_gettime(CLOCK_BOOTTIME, {42870, 943307191}) = 0
clock_gettime(CLOCK_BOOTTIME, {42870, 943334146}) = 0
close(7) = 0
select(8, [3], [3], NULL, NULL) = 1 (out [3])
clock_gettime(CLOCK_BOOTTIME, {42870, 943414987}) = 0
write(3, "0\236\27\233p\303\324\302\222mD\242Y_\34S\365\366p\214z\320\367.sN\252\337\322S\202("..., 36) = 36
rt_sigaction(SIGWINCH, NULL, {0x5639600b7460, [], SA_RESTORER, 0x7f7046de37f0}, 8) = 0
rt_sigaction(SIGWINCH, {SIG_DFL, [], SA_RESTORER, 0x7f7046de37f0}, NULL, 8) = 0
write(3, "F\226\207\7\243\207\33\316\37\1U$\326Y\314\253\310p\210\354\240\247\322n\32\272A\312\312:\252\324"..., 60) = 60
ioctl(0, TCGETS, 0x7ffc20de6720) = -1 ENOTTY (Inappropriate ioctl for device)
fcntl(0, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(0, F_SETFL, O_RDWR) = 0
ioctl(1, TCGETS, 0x7ffc20de6720) = -1 ENOTTY (Inappropriate ioctl for device)
fcntl(1, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(1, F_SETFL, O_RDWR) = 0
ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0
shutdown(3, SHUT_RDWR) = 0
close(3) = 0
exit_group(0) = ?
+++ exited with 0 +++
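One quick way to check whether a stall is visible in a trace like the one above is to pull out the CLOCK_BOOTTIME timestamps and look for the largest gap between consecutive calls. The following is a minimal sketch; the function name largest_gap_seconds is made up for illustration, and the regular expression assumes the `{sec, nsec}` format shown in this trace (other strace versions print the struct differently).

```python
import re

# Matches lines such as: clock_gettime(CLOCK_BOOTTIME, {42870, 893598449}) = 0
# Assumption: timestamps are printed as {seconds, nanoseconds}, as in the trace above.
_STAMP = re.compile(r"clock_gettime\(CLOCK_BOOTTIME, \{(\d+), (\d+)\}\)")


def largest_gap_seconds(strace_text):
    """Return the largest gap (in seconds) between consecutive timestamps."""
    stamps_ns = [
        int(sec) * 1_000_000_000 + int(nsec)
        for sec, nsec in _STAMP.findall(strace_text)
    ]
    if len(stamps_ns) < 2:
        return 0.0
    gaps = (later - earlier for earlier, later in zip(stamps_ns, stamps_ns[1:]))
    return max(gaps) / 1_000_000_000
```

Feeding it the text of a saved trace (for example one written with strace -o) gives a single number; a result of a few milliseconds would suggest the captured part of the session was not where the minutes were lost.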
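Another thing I noticed is a relatively high number of retransmissions (within a few minutes of booting the system), while other devices on the same network work fine. Could the NIC be faulty?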
$ netstat -s | egrep -i 'loss|retran'
421 segments retransmitted
TCPLostRetransmit: 6
1 timeouts in loss state
47 fast retransmits
137 retransmits in slow start
TCPLossProbes: 7
TCPRetransFail: 3
TCPSynRetrans: 12
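The raw counters are easier to judge as a ratio of retransmitted to sent segments. As a rough sketch, the same information can be read from /proc/net/snmp, where the kernel exposes a "Tcp:" header line followed by a "Tcp:" value line. The function name tcp_retransmit_ratio is only illustrative.

```python
def tcp_retransmit_ratio(snmp_text):
    """Return RetransSegs / OutSegs from the text of /proc/net/snmp."""
    tcp_lines = [
        line.split() for line in snmp_text.splitlines() if line.startswith("Tcp:")
    ]
    if len(tcp_lines) < 2:
        raise ValueError("no Tcp: header/value pair found")
    header, values = tcp_lines[0], tcp_lines[1]
    # Pair each column name with its integer value (skipping the "Tcp:" label).
    counters = dict(zip(header[1:], (int(v) for v in values[1:])))
    sent = counters["OutSegs"]
    return counters["RetransSegs"] / sent if sent else 0.0
```

On the affected machine this could be called as tcp_retransmit_ratio(open("/proc/net/snmp").read()) and compared with the value from a healthy host on the same network.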
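Edit:
I have already tried the following, without success:
- replacing the network cable (connected directly to the router)
- swapping the NIC (onboard Broadcom, Realtek gigabit card)
Answer 1
You can get more debugging information by simply running ssh -vvv against the server and reading the messages from the client process.
Also try telnetting to the ssh port (22 by default) and see how quickly it responds.
As others have said, this could be a firewall issue (it looks like throttling of incoming connections). However, since you have already disabled it and that didn't help much, that is probably not the cause this time.
Another possibility is the user/group information lookup, which can hold up the connection for a while. For example, when you connect to a machine that uses a remote LDAP server and that machine is busy or cannot reach LDAP (which it needs to resolve your uid/gid), the connection is delayed as well. (If possible, try logging in to the root account with an ssh key, since that should not involve an external server.)
One more thing to check is the DNS server on the remote side. The ssh server may try to resolve your IP address to a DNS hostname, and if its DNS server is unreliable, that can also take a while to complete.
As for connections after the first one being faster, that could also point to some kind of caching mechanism (DNS, LDAP, netfilter RELATED,ESTABLISHED state), or simply to your ssh client using control sockets (and keeping them open after the initial connection).
Answer 2
After several failed attempts, I set the network-related parameters in /etc/sysctl.conf to the following values: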
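The telnet check can also be timed. Below is a small sketch that connects to the ssh port and measures how long the TCP connect and the arrival of the server's identification line take, separately. The function name time_ssh_banner and its defaults are made up for illustration.

```python
import socket
import time


def time_ssh_banner(host, port=22, timeout=10.0):
    """Return (connect_seconds, banner_seconds, banner) for an SSH port."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.monotonic()
        sock.settimeout(timeout)
        data = b""
        # Read until the first newline; the server sends its version string first.
        while b"\n" not in data and len(data) < 256:
            chunk = sock.recv(256)
            if not chunk:
                break
            data += chunk
        done = time.monotonic()
    banner = data.split(b"\n", 1)[0].decode("ascii", "replace").strip()
    return connected - start, done - connected, banner
```

A slow connect time would point at the network path (SYN retransmissions, for instance), while a fast connect followed by a slow banner would point more towards the server side.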
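To see whether reverse DNS is the slow part, the lookup can be timed on its own. This is a minimal sketch, meant to be run on the server with the client's IP address as the argument, so that it uses the same resolver the ssh server would. The function name time_reverse_lookup is hypothetical.

```python
import socket
import time


def time_reverse_lookup(ip_address):
    """Return (seconds, hostname); hostname is None if the lookup fails."""
    start = time.monotonic()
    try:
        hostname = socket.gethostbyaddr(ip_address)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        hostname = None
    return time.monotonic() - start, hostname
```

A lookup that takes several seconds before failing or succeeding would be consistent with an unreliable DNS server on the remote side.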
net.core.netdev_max_backlog = 5000
# allow testing with buffers up to 64MB
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
# increase Linux autotuning TCP buffer limit to 32MB
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
# recommended default congestion control is htcp
net.ipv4.tcp_congestion_control=htcp
# recommended for hosts with jumbo frames enabled
net.ipv4.tcp_mtu_probing=1
net.core.default_qdisc = fq
Increasing the TCP buffers alone did not help. The network is working properly now.
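Since settings in /etc/sysctl.conf only take effect once they are loaded (for example with sysctl -p or at boot), it can be worth confirming that the running kernel actually has these values. Here is a rough sketch that parses sysctl.conf-style text and reads the live value from /proc/sys. The names parse_sysctl and live_value are made up for illustration, and the key-to-path mapping assumes keys without dots inside a component, which holds for the keys listed above.

```python
from pathlib import Path


def parse_sysctl(text):
    """Parse sysctl.conf-style text into {key: whitespace-normalized value}."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = " ".join(value.split())
    return settings


def live_value(key, root="/proc/sys"):
    """Return the kernel's current value for key, or None if unreadable."""
    path = Path(root, *key.split("."))
    try:
        # /proc/sys separates multi-value fields with tabs; normalize to spaces.
        return " ".join(path.read_text().split())
    except OSError:
        return None
```

Comparing parse_sysctl(open("/etc/sysctl.conf").read()) against live_value(key) for each key shows which settings are active. A value of None for net.ipv4.tcp_congestion_control or a mismatch there could mean the htcp module is not available on that kernel.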