I believe this is caused by an rsync cron job that runs every 15 minutes. This is a RHEL 6 machine running under ESXi. /proc/interrupts shows:
18: 3386804969 IO-APIC-fasteoi eth0
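The system load sometimes spikes above 30.00. This is a single-core system.
The sar command shows that most of the load at those times is "%system". I want to determine why the load gets so high, and whether rsync is really the cause. Any ideas for troubleshooting this? Could it be caused by rsync performing checksums? Does rsync make use of TCP offload to perform checksums?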
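Answer 1
This seems like a fairly easy problem to track down. I would simply run top or htop and see which process is consuming resources around one of the 15-minute boundaries where you see the problem occur.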
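To avoid having to watch the clock, the wait for the next boundary can be scripted. This is only a rough sketch: it assumes the cron job fires on the quarter hour (a `*/15` style schedule, which the question does not confirm), and the function names are made up for illustration.

```shell
# Seconds remaining until the next 15-minute boundary, given an epoch time.
# Returns 900 when called exactly on a boundary.
secs_to_next_quarter() {
    now=$1
    echo $(( 900 - now % 900 ))
}

# Sleep until the next boundary, then print one batch-mode top snapshot.
# (Hypothetical helper; adjust the head count to taste.)
snapshot_at_next_quarter() {
    sleep "$(secs_to_next_quarter "$(date +%s)")"
    top -b -n 1 | head -n 20
}
```

Running `snapshot_at_next_quarter` in a terminal shortly before the spike is expected should capture the top consumers right as the job starts.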
You can also use a tool such as nethogs to determine which process is consuming the most network bandwidth.
Example
Monitoring my wireless card.
$ sudo nethogs wlp3s0
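Since the question quotes the eth0 line from /proc/interrupts, another rough check is to sample that counter twice and see how fast it grows while rsync runs. The sketch below assumes the device name is the last field on its line (as in the question's output; a shared IRQ line would not match) and uses made-up function names.

```shell
# Sum the per-CPU counters on the /proc/interrupts-style line (read from stdin)
# whose last field equals the given device name.
irq_count() {
    awk -v dev="$1" '
        $NF == dev {
            total = 0
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
            printf "%.0f\n", total
        }'
}

# Approximate interrupts per second for a device over an interval (default 5 s).
irq_rate() {
    dev=$1
    secs=${2:-5}
    first=$(irq_count "$dev" < /proc/interrupts)
    sleep "$secs"
    second=$(irq_count "$dev" < /proc/interrupts)
    echo $(( (second - first) / secs ))
}
```

Comparing `irq_rate eth0` during a spike with the value on an idle system gives a hint as to whether interrupt handling accounts for a meaningful part of the %system time.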
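Other ideas for debugging this
- I would also try running rsync interactively rather than as a cron job while debugging. The same performance degradation should show up whether it is run interactively or on a schedule.
- Take a look at the disk I/O. For that you can use the tool iotop.
$ sudo iotop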
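Identifying the bottleneck
Generally speaking, since you are seeing a high load, it means you have a lot of processes that are "ready to run" piling up in the kernel's run queue, waiting for a slice of CPU time.
That would lead me to agree with you that something doing checksum calculations is causing this.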
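One caveat worth checking: on Linux the load average counts tasks in uninterruptible sleep (state D, usually waiting on I/O) as well as runnable ones (state R). A small sketch for telling the two apart during a spike follows; the helper name is made up, and it simply tallies the first letter of each process state it is given.

```shell
# Count runnable (R) and uninterruptible-sleep (D) entries from a list of
# process states, one per line on stdin.
count_states() {
    awk '{ c[substr($1, 1, 1)]++ }
         END { printf "R=%d D=%d\n", c["R"], c["D"] }'
}
```

Feeding it with `ps -eo state= | count_states` while the load is high shows whether the queue is mostly CPU-bound work (R) or tasks blocked on I/O (D), which in turn suggests whether top or iotop is the better place to look.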
If the problem turns out to be more network related, you can throttle rsync with this switch:
--bwlimit=KBPS limit I/O bandwidth; KBytes per second
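As a minimal sketch of how that switch might be wired into the cron job, the wrapper below puts the limit first and passes everything else through to rsync unchanged. The wrapper name, the 500 KB/s figure, and the paths in the usage line are placeholders, not values taken from the question.

```shell
# Run rsync with a bandwidth cap. The first argument is the limit in KBytes
# per second; all remaining arguments go to rsync as given.
throttled_rsync() {
    limit=$1
    shift
    rsync --bwlimit="$limit" "$@"
}
```

For example, `throttled_rsync 500 -a /src/ backuphost:/dst/` would cap the transfer at roughly 500 KB/s. If the load spikes persist at a low limit, the network is probably not the bottleneck.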
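Checksum problem?
You should also confirm that your rsync cron job is actually using rsync's checksum feature. As far as I know it is normally off by default and has to be enabled explicitly (with -c or --checksum), so this may not even be the root cause of the problem.
Excerpt from the rsync man page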
-c, --checksum
This changes the way rsync checks if the files have been changed
and are in need of a transfer. Without this option, rsync uses a
"quick check" that (by default) checks if each file’s size and time
of last modification match between the sender and receiver. This
option changes this to compare a 128-bit checksum for each file
that has a matching size. Generating the checksums means that both
sides will expend a lot of disk I/O reading all the data in the files
in the transfer (and this is prior to any reading that will be done
to transfer changed files), so this can slow things down
significantly.
The sending side generates its checksums while it is doing the
file-system scan that builds the list of the available files. The
receiver generates its checksums when it is scanning for changed
files, and will checksum any file that has the same size as the
corresponding sender’s file: files with either a changed size or a
changed checksum are selected for transfer.
Note that rsync always verifies that each transferred file was
correctly reconstructed on the receiving side by checking a whole-
file checksum that is generated as the file is transferred, but that
automatic after-the-transfer verification has nothing to do with this
option’s before-the-transfer "Does this file need to be updated?"
check.
For protocol 30 and beyond (first supported in 3.0.0), the
checksum used is MD5. For older protocols, the checksum used is MD4.
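To check the cron job against the excerpt above, the rsync arguments can be scanned for -c or --checksum. The following is a heuristic sketch only: the function name is made up, and it treats any single-dash argument containing the letter c as a short-option cluster that enables checksums, which can give a false positive when a short option carries an attached value.

```shell
# Return 0 if the given rsync arguments appear to enable --checksum, else 1.
uses_checksum() {
    for arg in "$@"; do
        case "$arg" in
            --checksum) return 0 ;;   # long form
            --*)        ;;            # any other long option: ignore
            -*c*)       return 0 ;;   # short cluster such as -avc
        esac
    done
    return 1
}
```

Copy the arguments from the output of `crontab -l` and run, for instance, `uses_checksum -avz --delete /src/ host:/dst/ && echo "checksum enabled"`. If nothing is printed, the pre-transfer checksum pass is not in play and the cause of the load lies elsewhere.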