如何调试进程性能低下

如何调试进程性能低下

我有一个测试应用程序,它将一个字符串写入文件 10000 次。我在两台服务器上运行它,得到了完全不同的结果。

[SYSTEM 1]

$ time ./test_depth 

real    0m0.152s
user    0m0.070s
sys     0m0.082s

    $ strace -cT ./test_depth 
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 57.38    0.000420           0     10000           write
 30.19    0.000221           0      9924           fadvise64
 12.43    0.000091           0     10000           gettid
  0.00    0.000000           0        11           read
  0.00    0.000000           0        66        54 open
  0.00    0.000000           0        11           close
  0.00    0.000000           0         7         6 stat
  0.00    0.000000           0        13           fstat
  0.00    0.000000           0         2           lseek
  0.00    0.000000           0        32           mmap
  0.00    0.000000           0        15           mprotect
  0.00    0.000000           0         2           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2           uname
  0.00    0.000000           0         3           fcntl
  0.00    0.000000           0         1         1 unlink
  0.00    0.000000           0         1           symlink
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         3         1 futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.000732                 30105        63 total

[SYSTEM 2]

$ time ./test_depth 

real    1m0.252s
user    0m0.163s
sys     0m0.307s

$ strace -cT ./test_depth 
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.29    0.147977          15     10000           write
  1.56    0.002355           0      9925           fadvise64
  0.14    0.000217           0     10000           gettid
  0.00    0.000006           1        11           close
  0.00    0.000000           0        11           read
  0.00    0.000000           0        57        45 open
  0.00    0.000000           0         3         3 stat
  0.00    0.000000           0        13           fstat
  0.00    0.000000           0         2           lseek
  0.00    0.000000           0        32           mmap
  0.00    0.000000           0        15           mprotect
  0.00    0.000000           0         2           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2           uname
  0.00    0.000000           0         3           fcntl
  0.00    0.000000           0         1           unlink
  0.00    0.000000           0         1           symlink
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         3         1 futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.150555                 30093        50 total

该过程在第二个系统上运行了大约一分钟。查看 strace,每次write调用花费 15 微秒。我该如何继续从这里诊断问题?看起来第二个系统上的磁盘写入时间更长。我如何确定是否真的如此?还有,我还应该查看什么?

两种情况下使用的支架:

1) - /dev/sda1 on / type ext4 (rw)
2) - /dev/mapper/VolGroup-lv_home on /home type ext4 (rw)

顺便说一句,日志是使用gloglib 编写的。如果printf使用 simple,则不会出现此问题。所以这似乎是库的问题。:(

[root@localhost src]# hdparm -I /dev/sda

/dev/sda:
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0d 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ATA device, with non-removable media
Standards:
    Likely used: 1
Configuration:
    Logical     max current
    cylinders   0   0
    heads       0   0
    sectors/track   0   0
    --
    Logical/Physical Sector size:           512 bytes
    device size with M = 1024*1024:           0 MBytes
    device size with M = 1000*1000:           0 MBytes 
    cache/buffer size  = unknown
Capabilities:
    IORDY not likely
    Cannot perform double-word IO
    R/W multiple sector transfer: not supported
    DMA: not supported
    PIO: pio0 
[root@localhost src]# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup-lv_root
                      51606140   6267948  42716752  13% /
tmpfs                  8141564      1016   8140548   1% /dev/shm
/dev/sda3               495844     37728    432516   9% /boot
/dev/mapper/VolGroup-lv_home
                     1090839440   1022652 1034405284   1% /home


[root@localhost src]# ./lshw -class disk
  *-disk                  
       description: SCSI Disk
       product: PERC H310
       vendor: DELL
       physical id: 2.0.0
       bus info: scsi@0:2.0.0
       logical name: /dev/sda
       version: 2.12
       serial: 00677e1515fc98ef19000d93f7a0a38c
       size: 1117GiB (1199GB)
       capabilities: partitioned partitioned:dos
       configuration: ansiversion=5 logicalsectorsize=512 sectorsize=512 signature=31c27a66

操作系统:

[SYSTEM 1]
Linux snapoffice 2.6.32-71.el6.x86_64 #1 SMP Fri May 20 03:51:51 BST 2011 x86_64 x86_64 x86_64 GNU/Linux

[SYSTEM 2]
Linux localhost.localdomain 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

相关内容