I am using the tips from this SO thread.
ps -awux|grep df
root 15826 0.0 0.0 0 0 ? I< May22 0:00 [cifs-dfscache]
myuser 3086246 0.0 0.0 216860 3212 ? Ss 16:06 0:02 bash -c while [ -d /proc/$PPID ]; do sleep 1;head -v -n 8 /proc/meminfo; head -v -n 2 /proc/stat /proc/version /proc/uptime /proc/loadavg /proc/sys/fs/file-nr /proc/sys/kernel/hostname; tail -v -n 16 /proc/net/dev;echo '==> /proc/df <==';df;echo '==> /proc/who <==';who;echo '==> /proc/end <==';echo '##Moba##'; done
myuser 3137650 0.0 0.0 215348 616 ? D 16:27 0:00 df
So I used cat to look at the stack of process 3137650:
cat /proc/3137650/stack
[<0>] autofs_wait+0x25b/0x723
[<0>] autofs_mount_wait+0x49/0xf0
[<0>] autofs_d_automount+0xdb/0x200
[<0>] follow_managed+0x110/0x2c0
[<0>] walk_component+0x1e9/0x2f0
[<0>] path_lookupat+0x70/0x120
[<0>] filename_lookup+0x97/0x180
[<0>] user_statfs+0x33/0xa0
[<0>] __do_sys_statfs+0x10/0x30
[<0>] do_syscall_64+0x5b/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
And for the other process, PID 3086246:
cat /proc/3086246/stack
[<0>] do_wait+0x1b3/0x220
[<0>] kernel_wait4+0x96/0x120
[<0>] do_syscall_64+0x5b/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
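The autofs_wait and autofs_d_automount frames in the df stack above suggest that statfs is waiting on an automount point. A minimal sketch for listing the autofs mount points that df would walk, assuming the usual /proc/mounts column layout (device, mount point, filesystem type, ...). The function name list_autofs_mounts and its optional file argument are hypothetical, chosen only for illustration:

```shell
# List mount points whose filesystem type is autofs.
# Input is a mounts table in /proc/mounts format:
#   device  mount-point  fstype  options  dump  pass
# With no argument the function reads /proc/mounts.
list_autofs_mounts() {
    local mounts_file="${1:-/proc/mounts}"
    # Field 3 is the filesystem type, field 2 the mount point.
    awk '$3 == "autofs" { print $2 }' "$mounts_file"
}
```

Calling list_autofs_mounts with no argument prints the automount points on the running system. It only narrows down candidates; it does not show which one df is blocked on.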
Then:
xargs -0 -n 1 echo < /proc/3137650/environ
SHELL=/bin/bash
MATHEMATICA_HOME=/usr/local/Wolfram/Mathematica/11.3
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
XDG_CONFIG_HOME=/home/myuser/.config
SPARK_LOCAL_HOSTNAME=localhost
LMOD_DIR=/usr/share/lmod/lmod/libexec
PWD=/home/myuser
LOGNAME=myuser
XDG_SESSION_TYPE=tty
MODULESHOME=/usr/share/lmod/lmod
MANPATH=/usr/share/lmod/lmod/share/man:
CUDA_INCLUDE_DIRS=/usr/include/cuda
SPARK_MASTER_IP=127.0.0.1
HOME=/home/myuser
SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
LANG=en_US.UTF-8
XDG_CONFIG_DIR=/home/myuser/.config
LMOD_SETTARG_FULL_SUPPORT=no
CUDA_INC_PATH=/usr/include/cuda
LMOD_VERSION=8.2.10
SSH_CONNECTION=x.x.x.x 51696 x.x.x.x 22
MODULEPATH_ROOT=/usr/share/modulefiles
XDG_SESSION_CLASS=user
LMOD_PKG=/usr/share/lmod/lmod
HADOOP_HOME=/usr/local/bin/hadoop-2.9.0
GUROBI_HOME=/home/student/gurobi811/linux64/
LESSOPEN=||/usr/bin/lesspipe.sh %s
USER=kudyba
LMOD_ROOT=/usr/share/lmod
SHLVL=1
BASH_ENV=/usr/share/lmod/lmod/init/bash
LMOD_sys=Linux
SPARK_HOME=/usr/local/bin/spark
SPARK_LOCAL_IP=127.0.0.1
XDG_SESSION_ID=7778
LD_LIBRARY_PATH=:/home/student/gurobi811/linux64//lib
XDG_RUNTIME_DIR=/run/user/6105
SSH_CLIENT=x.x.x.x 51696 22
PIG_INSTALL=/usr/local/bin/pig-0.17.0
SPARK_EXAMPLES_JAR=/usr/local/bin/spark-2.1.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.1.jar
KDEDIRS=/usr
XDG_DATA_DIRS=/home/myuser/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share
PATH=/usr/local/bin/anaconda3/bin:/home/users/mzilversmit/ncbi-blast-2.7.1+/bin:/usr/lib64/ccache:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/users/students/mchen177/gurobi811/linux64//bin:/usr/local/bin/spark/bin:/usr/local/bin/pig-0.17.0/bin:/opt/dell/srvadmin/bin:/usr/local/bin/spark/bin
MODULEPATH=/etc/modulefiles:/usr/share/modulefiles:/usr/share/modulefiles/Linux:/usr/share/modulefiles/Core:/usr/share/lmod/lmod/modulefiles/Core
SPARK_CLASSPATH=/usr/share/java/mysql-connector-java.jar
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/6105/bus
LMOD_CMD=/usr/share/lmod/lmod/libexec/lmod
BASH_FUNC_ml%%=() { eval $($LMOD_DIR/ml_cmd "$@")
}
BASH_FUNC_module%%=() { eval $($LMOD_CMD bash "$@") && eval $(${LMOD_SETTARG_CMD:-:} -s sh)
}
_=/usr/bin/df
And for the other process:
xargs -0 -n 1 echo < /proc/3086246/environ
USER=myuser
LOGNAME=myuser
HOME=/home/myuser
PATH=/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
SHELL=/bin/bash
XDG_SESSION_ID=7778
XDG_RUNTIME_DIR=/run/user/xxxx
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/xxxx/bus
XDG_SESSION_TYPE=tty
XDG_SESSION_CLASS=user
SSH_CLIENT=edited 51696 22
SSH_CONNECTION=edited 51696 edited 22
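As an aside on the two environment dumps above: xargs -0 -n 1 echo passes each entry to echo as an argument, so an entry that happens to start with something like -n or -e could be treated as an option. A small sketch of an alternative that only swaps the NUL separators for newlines; print_environ is a hypothetical helper name, and the argument is assumed to be a path such as /proc/PID/environ:

```shell
# Print a NUL-separated environment block, one entry per line.
# $1 is the file to read, e.g. /proc/3137650/environ.
# Values that themselves contain newlines (such as exported bash
# functions) will still span several lines.
print_environ() {
    tr '\0' '\n' < "$1"
}
```

For example, print_environ /proc/3137650/environ should give the same listing as the xargs form for ordinary variables.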
Just to confirm the parent process ID:
ps --ppid 3086246
PID TTY TIME CMD
3137650 ? 00:00:00 df
Not sure whether the echo '##Moba##' is a clue, since I am using MobaXterm.
When I try to strace the parent process, it just waits:
strace -p 3086246
strace: Process 3086246 attached
wait4(-1,
The child process, however, returns:
strace -p 3137650
strace: attach: ptrace(PTRACE_SEIZE, 3137650): Operation not permitted
For good measure:
pstree -pl 3137650
df(3137650)
And:
pstree -pl 3086246
bash(3086246)───df(3137650)
And:
ps -fp 3086246
UID PID PPID C STIME TTY TIME CMD
myuser 3086246 1 0 16:06 ? 00:00:02 bash -c while [ -d /proc/$PPID ]; do sleep 1;head -v -n 8 /proc/meminfo; head
So the TTY is '?', and I could not find anything relevant in the cron jobs.
The two processes are in 2 different states, Ss for 3086246 and D for 3137650. Here is a nice table of the state codes:
D uninterruptible sleep (usually IO)
S interruptible sleep (waiting for an event to complete)
s is a session leader
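To check for other processes stuck in the D state from the table above, a minimal sketch of a filter over ps output. It assumes input lines of the form "PID STAT COMMAND...", for example from ps -eo pid=,stat=,args= (the trailing = signs suppress the header line). The name filter_uninterruptible is hypothetical:

```shell
# Keep only lines whose second field (the STAT column) starts with D,
# i.e. processes in uninterruptible sleep. Reads stdin, writes stdout.
# Expected input: "PID STAT COMMAND...", one process per line.
filter_uninterruptible() {
    awk '$2 ~ /^D/'
}
```

A possible invocation is ps -eo pid=,stat=,args= | filter_uninterruptible, which on the system described here would be expected to show the df process.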
I also tried gdb on PID 3086246:
Reading symbols from /usr/bin/bash...
Reading symbols from /usr/lib/debug/usr/bin/bash-5.0.17-1.fc32.x86_64.debug...
Reading symbols from /lib64/libtinfo.so.6...
Reading symbols from /usr/lib/debug/usr/lib64/libtinfo.so.6.1-6.1-15.20191109.fc32.x86_64.debug...
Reading symbols from /lib64/libdl.so.2...
Reading symbols from /usr/lib/debug/usr/lib64/libdl-2.31.so.debug...
Reading symbols from /lib64/libc.so.6...
Reading symbols from /usr/lib/debug/usr/lib64/libc-2.31.so.debug...
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /usr/lib/debug/usr/lib64/ld-2.31.so.debug...
0x00007f8e059ccf3a in __GI___wait4 (pid=pid@entry=-1, stat_loc=stat_loc@entry=0x7fffa494ff10, options=options@entry=0,
usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
27 return SYSCALL_CANCEL (wait4, pid, stat_loc, options, usage);
Any ideas, or other debugging commands to try?
EDIT: Added the systemctl status of the PIDs, as suggested by @michael-hampton:
systemctl status 3086246 -l --no-pager
● session-7778.scope - Session 7778 of user myuser
Loaded: loaded (/run/systemd/transient/session-7778.scope; transient)
Transient: yes
Active: active (abandoned) since Mon 2020-06-29 13:03:32 EDT; 11h ago
Tasks: 2
Memory: 316.9M
CPU: 11min 46.751s
CGroup: /user.slice/user-6105.slice/session-7778.scope
├─3086246 bash -c while [ -d /proc/$PPID ]; do sleep 1;head -v -n 8 /proc/meminfo; head -v -n 2 /proc/stat /proc/version /proc/uptime /proc/loadavg /proc/sys/fs/file-nr /proc/sys/kernel/hostname; tail -v -n 16 /proc/net/dev;echo '==> /proc/df <==';df;echo '==> /proc/who <==';who;echo '==> /proc/end <==';echo '##Moba##'; done
└─3137650 df
Jun 29 16:26:21 ourserver dracut[3126800]: lrwxrwxrwx 1 root root 20 May 29 14:35 usr/share/unimaps -> /usr/lib/kbd/unimaps
Jun 29 16:26:21 ourserver dracut[3126800]: drwxr-xr-x 3 root root 0 May 29 14:35 var
Jun 29 16:26:21 ourserver dracut[3126800]: lrwxrwxrwx 1 root root 11 May 29 14:35 var/lock -> ../run/lock
Jun 29 16:26:21 ourserver dracut[3126800]: lrwxrwxrwx 1 root root 6 May 29 14:35 var/run -> ../run
Jun 29 16:26:21 ourserver dracut[3126800]: drwxr-xr-x 2 root root 0 May 29 14:35 var/tmp
Jun 29 16:26:21 ourserver dracut[3126800]: ========================================================================
Jun 29 16:26:21 ourserver dracut[3126800]: *** Creating initramfs image file '/boot/initramfs-5.6.19-300.fc32.x86_64.tmp' done ***
Jun 29 16:27:21 ourserver systemd-tmpfiles[3137785]: /usr/lib/tmpfiles.d/lxdm.conf:1: Line references path below legacy directory /var/run/, updating /var/run/lxdm → /run/lxdm; please update the tmpfiles.d/ drop-in file accordingly.
Jun 29 16:48:05 ourserver su[2983955]: pam_unix(su:session): session closed for user root
Jun 29 16:48:07 ourserver sshd[2983731]: pam_unix(sshd:session): session closed for user myuser
And:
systemctl status 3137650 -l --no-pager
● session-7778.scope - Session 7778 of user myuser
Loaded: loaded (/run/systemd/transient/session-7778.scope; transient)
Transient: yes
Active: active (abandoned) since Mon 2020-06-29 13:03:32 EDT; 11h ago
Tasks: 2
Memory: 316.9M
CPU: 11min 46.751s
CGroup: /user.slice/user-6105.slice/session-7778.scope
├─3086246 bash -c while [ -d /proc/$PPID ]; do sleep 1;head -v -n 8 /proc/meminfo; head -v -n 2 /proc/stat /proc/version /proc/uptime /proc/loadavg /proc/sys/fs/file-nr /proc/sys/kernel/hostname; tail -v -n 16 /proc/net/dev;echo '==> /proc/df <==';df;echo '==> /proc/who <==';who;echo '==> /proc/end <==';echo '##Moba##'; done
└─3137650 df
Jun 29 16:26:21 ourserver dracut[3126800]: lrwxrwxrwx 1 root root 20 May 29 14:35 usr/share/unimaps -> /usr/lib/kbd/unimaps
Jun 29 16:26:21 ourserver dracut[3126800]: drwxr-xr-x 3 root root 0 May 29 14:35 var
Jun 29 16:26:21 ourserver dracut[3126800]: lrwxrwxrwx 1 root root 11 May 29 14:35 var/lock -> ../run/lock
Jun 29 16:26:21 ourserver dracut[3126800]: lrwxrwxrwx 1 root root 6 May 29 14:35 var/run -> ../run
Jun 29 16:26:21 ourserver dracut[3126800]: drwxr-xr-x 2 root root 0 May 29 14:35 var/tmp
Jun 29 16:26:21 ourserver dracut[3126800]: ========================================================================
Jun 29 16:26:21 ourserver dracut[3126800]: *** Creating initramfs image file '/boot/initramfs-5.6.19-300.fc32.x86_64.tmp' done ***
Jun 29 16:27:21 ourserver systemd-tmpfiles[3137785]: /usr/lib/tmpfiles.d/lxdm.conf:1: Line references path below legacy directory /var/run/, updating /var/run/lxdm → /run/lxdm; please update the tmpfiles.d/ drop-in file accordingly.
Jun 29 16:48:05 ourserver su[2983955]: pam_unix(su:session): session closed for user root
Jun 29 16:48:07 ourserver sshd[2983731]: pam_unix(sshd:session): session closed for user myuser
I am starting to think this may be this bug, except that ssh is not slow. Running:
systemctl | grep "abandoned" | grep -e "-[[:digit:]]"
returned several abandoned ssh sessions. Running:
systemctl | grep "abandoned" | grep -e "-[[:digit:]]" | sed "s/.scope.*/.scope/" | xargs systemctl stop
removed all the abandoned sessions, and the df command disappeared from the ps output.
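The grep and sed pipeline above matches on the text "abandoned" anywhere in the line. A slightly stricter sketch, assuming the column layout of systemctl list-units --type=scope --plain --no-legend (UNIT LOAD ACTIVE SUB DESCRIPTION), selects only session scopes whose SUB state is exactly abandoned. The function name abandoned_session_scopes is hypothetical:

```shell
# From "systemctl list-units --type=scope --plain --no-legend" output,
# print the unit names of session-N.scope units in sub-state "abandoned".
# Columns assumed: UNIT LOAD ACTIVE SUB DESCRIPTION...
abandoned_session_scopes() {
    awk '$4 == "abandoned" && $1 ~ /^session-[0-9]+\.scope$/ { print $1 }'
}
```

One way to use it is to run systemctl list-units --type=scope --plain --no-legend | abandoned_session_scopes first and review the list, and only then pipe the result to xargs systemctl stop as in the command above.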