在我的一台 ubuntu 服务器上,top
命令启动非常慢,当我top
在终端上运行命令时,它将在 10 秒以上后显示信息。
然后我用它strace -yy top
来分析问题,我发现top
挂在以下位置:
getsockopt(6<UNIX:[366379599]>, SOL_SOCKET, SO_SNDBUF, [212992], [4]) = 0 [0/1911]
setsockopt(6<UNIX:[366379599]>, SOL_SOCKET, SO_SNDBUFFORCE, [8388608], 4) = 0
connect(6<UNIX:[366379599]>, {sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, 29) = 0
getsockopt(6<UNIX:[366379599->366381191]>, SOL_SOCKET, SO_PEERCRED, {pid=1, uid=0, gid=0}, [12]) = 0
getsockopt(6<UNIX:[366379599->366381191]>, SOL_SOCKET, SO_PEERSEC, "unconfined", [64->10]) = 0
getsockopt(6<UNIX:[366379599->366381191]>, SOL_SOCKET, SO_PEERGROUPS, "", [256->0]) = 0
fstat(6<UNIX:[366379599->366381191]>, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
getsockopt(6<UNIX:[366379599->366381191]>, SOL_SOCKET, SO_ACCEPTCONN, [0], [4]) = 0
getsockname(6<UNIX:[366379599->366381191]>, {sa_family=AF_UNIX}, [128->2]) = 0
geteuid() = 0
sendmsg(6<UNIX:[366379599->366381191]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\0AUTH EXTERNAL ", iov_len=15}, {iov_base="30", iov_len
=2}, {iov_base="\r\nNEGOTIATE_UNIX_FD\r\nBEGIN\r\n", iov_len=28}], msg_iovlen=3, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 45
gettid() = 25127
getrandom("\x00\x3c\x86\xd4\x34\xa0\xb2\x91\xee\x2d\x4b\x2c\x1b\x56\xda\xa8", 16, GRND_NONBLOCK) = 16
futex(0x7f00826d9ff0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
recvmsg(6<UNIX:[366379599->366381191]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="OK 141ec8c80f29bf83dc576f5e61641"..., iov_len=256}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 52
sendmsg(6<UNIX:[366379599->366381191]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\1\0\1\0\0\0\0\1\0\0\0m\0\0\0\1\1o\0\25\0\0\0/org/fre"
..., iov_len=128}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 128
recvmsg(6<UNIX:[366379599->366381191]>, {msg_namelen=0}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable
)
ppoll([{fd=6<UNIX:[366379599->366381191]>, events=POLLIN}], 1, {tv_sec=24, tv_nsec=999897000}, NULL, 8) = 1 ([{fd=6, revents=POLLIN}], left {tv_sec=24, tv_nsec=999818992})
recvmsg(6<UNIX:[366379599->366381191]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\2\1\1\17\0\0\0\1\0\0\0E\0\0\0\6\1s\0\n\0\0\0", iov_le
n=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 24
recvmsg(6<UNIX:[366379599->366381191]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base=":1.2843938\0\0\0\0\0\0\5\1u\0\1\0\0\0\10\1g\0\1s\0\0"..
., iov_len=79}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 79
sendmsg(6<UNIX:[366379599->366381191]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\1\0\1\4\0\0\0\2\0\0\0\247\0\0\0\1\1o\0\31\0\0\0/org/f
re"..., iov_len=184}, {iov_base="w\364\0\0", iov_len=4}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 188
recvmsg(6<UNIX:[366379599->366381191]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1\17\0\0\0\2\0\0\0\225\0\0\0\1\1o\0\25\0\0\0", io
v_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 24
recvmsg(6<UNIX:[366379599->366381191]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="/org/freedesktop/DBus\0\0\0\2\1s\0\24\0\0\0"..., iov_le
n=159}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 159
recvmsg(6<UNIX:[366379599->366381191]>, {msg_namelen=0}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable
)
ppoll([{fd=6<UNIX:[366379599->366381191]>, events=POLLIN}], 1, {tv_sec=24, tv_nsec=999853000}, NULL, 8
该命令ls -l /proc/$(pgrep top)/fd/
得到以下输出:
lrwx------ 1 root root 64 1月 25 18:11 0 -> /dev/pts/9
lrwx------ 1 root root 64 1月 25 18:11 1 -> /dev/pts/9
l-wx------ 1 root root 64 1月 25 18:11 2 -> /dev/null
lrwx------ 1 root root 64 1月 25 18:11 3 -> /dev/pts/9
lr-x------ 1 root root 64 1月 25 18:11 4 -> /proc/uptime
lr-x------ 1 root root 64 1月 25 18:11 5 -> /proc
lrwx------ 1 root root 64 1月 25 18:11 6 -> 'socket:[364876425]'
lsof -d 6 -U -a +E -p $(pgrep top)
得到以下输出:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
dbus-daem 874 messagebus 12u unix 0xffff9545f6fee400 0t0 366381191 /var/run/dbus/system_bus_socket type=STREAM ->INO=366379599 25127,top,6u
top 25127 root 6u unix 0xffff9545f6fefc00 0t0 366379599 type=STREAM ->INO=366381191 874,dbus-daem,12u
正常运行后top
,命令ls -l /proc/$(pgrep top)/fd/
得到以下输出:
lrwx------ 1 root root 64 1月 25 18:11 0 -> /dev/pts/9
lrwx------ 1 root root 64 1月 25 18:11 1 -> /dev/pts/9
l-wx------ 1 root root 64 1月 25 18:11 2 -> /dev/null
lrwx------ 1 root root 64 1月 25 18:11 3 -> /dev/pts/9
lr-x------ 1 root root 64 1月 25 18:11 4 -> /proc/uptime
lr-x------ 1 root root 64 1月 25 18:11 5 -> /proc/meminfo
lr-x------ 1 root root 64 1月 25 18:11 6 -> /proc/loadavg
lr-x------ 1 root root 64 1月 25 18:12 7 -> /proc/stat
lsof -d 6 -U -a +E -p $(pgrep top)
什么也不输出。
我想知道命令启动这么慢的原因top
,有人可以帮助我吗?
答案1
当我查看自己的 strace 时,我发现只有当 top 尝试与 nscd ( )的套接字文件进行通信时recvmsg
才会发生这种情况。poll
/var/run/nscd/socket
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 10
connect(10<UNIX:[355032774]>, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = 0
sendto(10<UNIX:[355032774->355027846]>, "\2\0\0\0\v\0\0\0\7\0\0\0passwd\0", 19, MSG_NOSIGNAL, NULL, 0) = 19
poll([{fd=10<UNIX:[355032774]>, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=10, revents=POLLIN|POLLHUP}])
recvmsg(10<UNIX:[355032774]>, {msg_name(0)=NULL, msg_iov(2)=[{"passwd\0", 7}, {"\264K\5\0\0\0\0\0", 8}], msg_controllen=20, [{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, [11</run/nscd/passwd>]}], msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 15
如果您也看到这样的情况,则 nscd 可能有问题。尝试从 nscd 缓存top
获取信息,该进程可能会挂起。passwd
请尝试停止 nscd ( service nscd stop
) 并查看是否可以解决问题。如果确实解决了问题top
,请尝试再次重新启动 nscd ( service nscd start
) 并检查重新启动后 nscd 是否正常工作。