网站杀硬盘I/O，如何预防？

Question 1

如果你的 FreeBSD 版本比较新，则top可以-m选择显示排名靠前的 I/O 使用者（如果你为其提供“ io”参数）：

top -m io

在这种情况下，我还会使用-S选项（显示系统进程，以防其中一个是罪魁祸首）。为了在负载下表现更好，我会使用-q（重新调整优先级以在更高优先级运行）和-u（跳过阅读/etc/passwd，这应该有助于它加载得更快）。

由于运行时间太长top，我会告诉它只显示两次输出（-d 2），然后以批处理模式运行（-b），这样它就会自动退出。

第一次以这种方式运行top，其输出的第一部分将显示一段时间以来（可能是从启动时开始？我对此不太确定）多个进程的累计 I/O 计数。在第一个显示中，您可以看到一段时间内使用量最大的进程是谁。在第二个显示中，您可以看到过去两秒内使用量最大的进程。

因此，将所有内容放在一起并运行，find以便发生一些实际的 I/O：

# top -S -m io -qu -b -d 2 10
last pid: 39560;  load averages:  0.28,  0.19,  0.08  up 6+04:02:29    11:28:28
125 processes: 2 running, 104 sleeping, 19 waiting

Mem: 96M Active, 668M Inact, 122M Wired, 25M Cache, 104M Buf, 17M Free
Swap: 2048M Total, 96K Used, 2048M Free


  PID    UID     VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
   11      0        0 81032823      0      0      0      0   0.00% idle: cpu0
39554    105   129857 556534  74894      0      0  74894  13.62% find
39533    105   443603 614796      0      0      0      0   0.00% sshd
   36      0   1793393      0      0      0      0      0   0.00% irq23: vr0
   24      0   2377710   2680      0      0      0      0   0.00% irq20: atapci0
   50      0   533513 3415672     66 345350      0 345416  62.81% syncer
   13      0   78651569   7230      0      0      0      0   0.00% swi4: clock sio
    5      0   1911601  20905      0      0      0      0   0.00% g_down
    4      0   2368511  12100      0      0      0      0   0.00% g_up
   37      0    53308    313      0      0      0      0   0.00% acpi_thermal

last pid: 39560;  load averages:  0.28,  0.19,  0.08  up 6+04:02:31    11:28:30
125 processes: 2 running, 104 sleeping, 19 waiting
CPU:  1.9% user,  0.0% nice,  6.0% system,  2.2% interrupt, 89.9% idle
Mem: 96M Active, 671M Inact, 123M Wired, 25M Cache, 104M Buf, 14M Free
Swap: 2048M Total, 96K Used, 2048M Free

  PID    UID     VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
   11      0        0   1115      0      0      0      0   0.00% idle: cpu0
39554    105      606    651    501      0      0    501 100.00% find
39533    105      616    695      0      0      0      0   0.00% sshd
   36      0     1251      0      0      0      0      0   0.00% irq23: vr0
   24      0      501     20      0      0      0      0   0.00% irq20: atapci0
   50      0        2      2      0      0      0      0   0.00% syncer
   13      0      313      3      0      0      0      0   0.00% swi4: clock sio
    5      0      501     26      0      0      0      0   0.00% g_down
    4      0      501      8      0      0      0      0   0.00% g_up
   37      0        0      0      0      0      0      0   0.00% acpi_thermal

一旦您缩小了执行所有 I/O 的进程的范围，您就可以使用truss或devel/strace或sysutils/lsof端口来查看占用大量磁盘空间的进程正在做什么。（当然，如果您的系统非常繁忙，您将无法轻松安装端口）：

例如，查看我的ntpd进程正在使用哪些文件和其他资源：

# lsof -p 890
ntpd        890   root  cwd     VDIR       0,93       1024        2 /
ntpd        890   root  rtd     VDIR       0,93       1024        2 /
ntpd        890   root  txt     VREG       0,98     340940   894988 /usr/sbin/ntpd
ntpd        890   root  txt     VREG       0,93     189184    37058 /libexec/ld-elf.so.1
ntpd        890   root  txt     VREG       0,93      92788    25126 /lib/libm.so.5
ntpd        890   root  txt     VREG       0,93      60060    25130 /lib/libmd.so.4
ntpd        890   root  txt     VREG       0,98      16604   730227 /usr/lib/librt.so.1
ntpd        890   root  txt     VREG       0,93    1423460    25098 /lib/libcrypto.so.5
ntpd        890   root  txt     VREG       0,93    1068216    24811 /lib/libc.so.7
ntpd        890   root    0u    VCHR       0,29        0t0       29 /dev/null
ntpd        890   root    1u    VCHR       0,29        0t0       29 /dev/null
ntpd        890   root    2u    VCHR       0,29        0t0       29 /dev/null
ntpd        890   root    3u    unix 0xc46da680        0t0          ->0xc4595820
ntpd        890   root    5u    PIPE 0xc4465244          0          ->0xc446518c
ntpd        890   root   20u    IPv4 0xc4599190        0t0      UDP *:ntp
ntpd        890   root   21u    IPv6 0xc4599180        0t0      UDP *:ntp
ntpd        890   root   22u    IPv4 0xc4599400        0t0      UDP heffalump.prv.tycho.org:ntp
ntpd        890   root   23u    IPv4 0xc4599220        0t0      UDP ns0.prv.tycho.org:ntp
ntpd        890   root   24u    IPv4 0xc45995c0        0t0      UDP imap.prv.tycho.org:ntp
ntpd        890   root   25u    IPv6 0xc4599530        0t0      UDP [fe80:4::1]:ntp
ntpd        890   root   26u    IPv6 0xc45993b0        0t0      UDP localhost:ntp
ntpd        890   root   27u    IPv4 0xc4599160        0t0      UDP localhost:ntp
ntpd        890   root   28u     rte 0xc42939b0        0t0

...以及它正在进行的系统调用（请注意，这可能会耗费大量资源）：

# truss -p 890
SIGNAL 17 (SIGSTOP)
select(29,{20 21 22 23 24 25 26 27 28},0x0,0x0,0x0) ERR#4 'Interrupted system call'
SIGNAL 14 (SIGALRM)
sigreturn(0xbfbfea10,0xe,0x10003,0xbfbfea10,0x0,0x806aed0) ERR#4 'Interrupted system call'
select(29,{20 21 22 23 24 25 26 27 28},0x0,0x0,0x0) ERR#4 'Interrupted system call'
SIGNAL 14 (SIGALRM)
sigreturn(0xbfbfea10,0xe,0x10003,0xbfbfea10,0x0,0x806aed0) ERR#4 'Interrupted system call'
select(29,{20 21 22 23 24 25 26 27 28},0x0,0x0,0x0) ERR#4 'Interrupted system call'
SIGNAL 14 (SIGALRM)
sigreturn(0xbfbfea10,0xe,0x10003,0xbfbfea10,0x0,0x806aed0) ERR#4 'Interrupted system call'
^C

sysutils/strace与类似truss，但您需要/proc挂载文件系统：

# strace -p 890
strace: open("/proc/...", ...): No such file or directory
trouble opening proc file

# grep ^proc /etc/fstab
proc            /proc                           procfs          rw,noauto       0       0

# mount /proc

# mount | grep /proc
procfs on /proc (procfs, local)

...然后它就会起作用：

# strace -p 890
Process 890 attached - interrupt to quit
--- SIGALRM (Alarm clock: 14) ---
--- SIGALRM (Alarm clock: 14) ---
syscall_417(0xbfbfea10)                 = -1 (errno 4)
select(29, [?], NULL, NULL, NULL)       = -1 EINTR (Interrupted system call)
--- SIGALRM (Alarm clock: 14) ---
--- SIGALRM (Alarm clock: 14) ---
syscall_417(0xbfbfea10)                 = -1 (errno 4)
select(29, [?], NULL, NULL, NULL^C <unfinished ...>
Process 890 detached

祝你好运 - 让我们知道你发现了什么！一旦你确定了流程，我可能会提供进一步的帮助。

编辑：请注意，运行lsof、truss和strace本身可能非常耗时。我做了一些小更新，试图减少它们的影响。此外，如果某个进程正在快速产生许多子进程，您可能必须使用参数告诉truss或strace跟踪子进程-f。

Answer