Tracing the bash process that runs the df command

I am following the tips from this SO thread.

ps -awux|grep df
root       15826  0.0  0.0      0     0 ?        I<   May22   0:00 [cifs-dfscache]
myuser 3086246  0.0  0.0 216860  3212 ?        Ss   16:06   0:02 bash -c while [ -d /proc/$PPID ]; do sleep 1;head -v -n 8 /proc/meminfo; head -v -n 2 /proc/stat /proc/version /proc/uptime /proc/loadavg /proc/sys/fs/file-nr /proc/sys/kernel/hostname; tail -v -n 16 /proc/net/dev;echo '==> /proc/df <==';df;echo '==> /proc/who <==';who;echo '==> /proc/end <==';echo '##Moba##'; done
myuser 3137650  0.0  0.0 215348   616 ?        D    16:27   0:00 df

So I used cat to look at the kernel stack of process 3137650:

cat /proc/3137650/stack
[<0>] autofs_wait+0x25b/0x723
[<0>] autofs_mount_wait+0x49/0xf0
[<0>] autofs_d_automount+0xdb/0x200
[<0>] follow_managed+0x110/0x2c0
[<0>] walk_component+0x1e9/0x2f0
[<0>] path_lookupat+0x70/0x120
[<0>] filename_lookup+0x97/0x180
[<0>] user_statfs+0x33/0xa0
[<0>] __do_sys_statfs+0x10/0x30
[<0>] do_syscall_64+0x5b/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
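
The trace above shows df blocked inside statfs() while autofs waits for a mount to complete. One way to narrow down which automount point is responsible is to probe each one with a time limit. This is only a sketch, assuming GNU coreutils (timeout, stat -f) and awk are available; the function names and the 5-second limit are illustrative choices, not something from the original post.

```shell
#!/usr/bin/env bash
# Print the mount points whose filesystem type is autofs.
# Reads a table in /proc/mounts format (default: /proc/mounts).
list_autofs_mounts() {
    awk '$3 == "autofs" { print $2 }' "${1:-/proc/mounts}"
}

# Run statfs (via "stat -f") on each autofs mount point with a time
# limit, so a hanging automount is reported instead of blocking the shell.
# SIGKILL is used because a task waiting in the kernel may ignore SIGTERM.
probe_autofs_mounts() {
    list_autofs_mounts "$@" | while IFS= read -r mp; do
        if timeout -s KILL 5 stat -f -- "$mp" >/dev/null 2>&1; then
            echo "ok      $mp"
        else
            echo "FAILED  $mp"
        fi
    done
}
```

Calling probe_autofs_mounts with no arguments checks the live mount table. Whether a probe stuck in the kernel can actually be killed depends on the kernel, so it may be safer to run this from a throwaway session.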

And the stack of the other process, PID 3086246:

cat /proc/3086246/stack
[<0>] do_wait+0x1b3/0x220
[<0>] kernel_wait4+0x96/0x120
[<0>] do_syscall_64+0x5b/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Then the environment of the df process:

xargs -0 -n 1 echo < /proc/3137650/environ
SHELL=/bin/bash
MATHEMATICA_HOME=/usr/local/Wolfram/Mathematica/11.3
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
XDG_CONFIG_HOME=/home/myuser/.config
SPARK_LOCAL_HOSTNAME=localhost
LMOD_DIR=/usr/share/lmod/lmod/libexec
PWD=/home/myuser
LOGNAME=myuser
XDG_SESSION_TYPE=tty
MODULESHOME=/usr/share/lmod/lmod
MANPATH=/usr/share/lmod/lmod/share/man:
CUDA_INCLUDE_DIRS=/usr/include/cuda
SPARK_MASTER_IP=127.0.0.1
HOME=/home/myuser
SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
LANG=en_US.UTF-8
XDG_CONFIG_DIR=/home/myuser/.config
LMOD_SETTARG_FULL_SUPPORT=no
CUDA_INC_PATH=/usr/include/cuda
LMOD_VERSION=8.2.10
SSH_CONNECTION=x.x.x.x 51696 x.x.x.x 22
MODULEPATH_ROOT=/usr/share/modulefiles
XDG_SESSION_CLASS=user
LMOD_PKG=/usr/share/lmod/lmod
HADOOP_HOME=/usr/local/bin/hadoop-2.9.0
GUROBI_HOME=/home/student/gurobi811/linux64/
LESSOPEN=||/usr/bin/lesspipe.sh %s
USER=kudyba
LMOD_ROOT=/usr/share/lmod
SHLVL=1
BASH_ENV=/usr/share/lmod/lmod/init/bash
LMOD_sys=Linux
SPARK_HOME=/usr/local/bin/spark
SPARK_LOCAL_IP=127.0.0.1
XDG_SESSION_ID=7778
LD_LIBRARY_PATH=:/home/student/gurobi811/linux64//lib
XDG_RUNTIME_DIR=/run/user/6105
SSH_CLIENT=x.x.x.x 51696 22
PIG_INSTALL=/usr/local/bin/pig-0.17.0
SPARK_EXAMPLES_JAR=/usr/local/bin/spark-2.1.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.1.jar
KDEDIRS=/usr
XDG_DATA_DIRS=/home/myuser/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share
PATH=/usr/local/bin/anaconda3/bin:/home/users/mzilversmit/ncbi-blast-2.7.1+/bin:/usr/lib64/ccache:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/users/students/mchen177/gurobi811/linux64//bin:/usr/local/bin/spark/bin:/usr/local/bin/pig-0.17.0/bin:/opt/dell/srvadmin/bin:/usr/local/bin/spark/bin
MODULEPATH=/etc/modulefiles:/usr/share/modulefiles:/usr/share/modulefiles/Linux:/usr/share/modulefiles/Core:/usr/share/lmod/lmod/modulefiles/Core
SPARK_CLASSPATH=/usr/share/java/mysql-connector-java.jar
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/6105/bus
LMOD_CMD=/usr/share/lmod/lmod/libexec/lmod
BASH_FUNC_ml%%=() {  eval $($LMOD_DIR/ml_cmd "$@")
}
BASH_FUNC_module%%=() {  eval $($LMOD_CMD bash "$@") && eval $(${LMOD_SETTARG_CMD:-:} -s sh)
}
_=/usr/bin/df

And the environment of the other process:

xargs -0 -n 1 echo < /proc/3086246/environ
USER=myuser
LOGNAME=myuser
HOME=/home/myuser
PATH=/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
SHELL=/bin/bash
XDG_SESSION_ID=7778
XDG_RUNTIME_DIR=/run/user/xxxx
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/xxxx/bus
XDG_SESSION_TYPE=tty
XDG_SESSION_CLASS=user
SSH_CLIENT=edited 51696 22
SSH_CONNECTION=edited 51696 edited 22

Just to confirm the parent process ID:

ps --ppid  3086246
    PID TTY          TIME CMD
3137650 ?        00:00:00 df
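
The same parent/child relationship can be cross-checked from /proc without ps. A minimal sketch, assuming a Linux /proc layout; ppid_of is a hypothetical helper name, and the optional second argument exists only so the function can be pointed at a different directory.

```shell
# Print the parent PID recorded in <procdir>/<pid>/status.
# $1 = PID, $2 = proc directory (defaults to /proc).
ppid_of() {
    awk '$1 == "PPid:" { print $2 }' "${2:-/proc}/$1/status"
}
```

For the processes in this question, ppid_of 3137650 would be expected to print 3086246.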

I am not sure whether the echo '##Moba##' is a clue, since I am using MobaXterm.

When I try to strace the parent process, it just waits:

strace -p  3086246
strace: Process 3086246 attached
wait4(-1,

The child process, however, returns:

strace -p 3137650
strace: attach: ptrace(PTRACE_SEIZE, 3137650): Operation not permitted

For good measure:

pstree -pl 3137650
df(3137650)

And:

pstree -pl 3086246
bash(3086246)───df(3137650)

And:

ps -fp 3086246
UID          PID    PPID  C STIME TTY          TIME CMD
myuser   3086246       1  0 16:06 ?        00:00:02 bash -c while [ -d /proc/$PPID ]; do sleep 1;head -v -n 8 /proc/meminfo; head

So the TTY is '?', and I could not find anything relevant in the cron jobs.

The two processes are in different states: Ss for 3086246 and D for 3137650. There is a nice table of the state codes here:

D    uninterruptible sleep (usually IO)
S    interruptible sleep (waiting for an event to complete)
s    is a session leader
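
To spot every process in the D state along with the kernel function it is sleeping in, the ps output can be filtered on the STAT column. A rough sketch, assuming procps ps (the pid, stat, wchan and args format specifiers) and awk; the helper names are made up for illustration.

```shell
# Keep the header row plus every row whose STAT column starts with D.
# Expects "PID STAT WCHAN COMMAND" rows on stdin.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List all processes currently in uninterruptible sleep, with their
# kernel wait channel (wchan) and full command line.
list_d_state() {
    ps -eo pid,stat,wchan:32,args | filter_d_state
}
```

On the system described here, list_d_state would be expected to show the df process with a wait channel related to autofs, matching the stack trace.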

I also tried gdb on PID 3086246:

Reading symbols from /usr/bin/bash...
Reading symbols from /usr/lib/debug/usr/bin/bash-5.0.17-1.fc32.x86_64.debug...
Reading symbols from /lib64/libtinfo.so.6...
Reading symbols from /usr/lib/debug/usr/lib64/libtinfo.so.6.1-6.1-15.20191109.fc32.x86_64.debug...
Reading symbols from /lib64/libdl.so.2...
Reading symbols from /usr/lib/debug/usr/lib64/libdl-2.31.so.debug...
Reading symbols from /lib64/libc.so.6...
Reading symbols from /usr/lib/debug/usr/lib64/libc-2.31.so.debug...
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /usr/lib/debug/usr/lib64/ld-2.31.so.debug...
0x00007f8e059ccf3a in __GI___wait4 (pid=pid@entry=-1, stat_loc=stat_loc@entry=0x7fffa494ff10, options=options@entry=0,
    usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
27        return SYSCALL_CANCEL (wait4, pid, stat_loc, options, usage);

Any ideas, or other debugging commands to try?

EDIT: Added the systemctl status of both PIDs, as suggested by @michael-hampton:

systemctl status 3086246 -l --no-pager
● session-7778.scope - Session 7778 of user myuser
     Loaded: loaded (/run/systemd/transient/session-7778.scope; transient)
  Transient: yes
     Active: active (abandoned) since Mon 2020-06-29 13:03:32 EDT; 11h ago
      Tasks: 2
     Memory: 316.9M
        CPU: 11min 46.751s
     CGroup: /user.slice/user-6105.slice/session-7778.scope
             ├─3086246 bash -c while [ -d /proc/$PPID ]; do sleep 1;head -v -n 8 /proc/meminfo; head -v -n 2 /proc/stat /proc/version /proc/uptime /proc/loadavg /proc/sys/fs/file-nr /proc/sys/kernel/hostname; tail -v -n 16 /proc/net/dev;echo '==> /proc/df <==';df;echo '==> /proc/who <==';who;echo '==> /proc/end <==';echo '##Moba##'; done
             └─3137650 df

Jun 29 16:26:21 ourserver  dracut[3126800]: lrwxrwxrwx   1 root     root           20 May 29 14:35 usr/share/unimaps -> /usr/lib/kbd/unimaps
Jun 29 16:26:21 ourserver  dracut[3126800]: drwxr-xr-x   3 root     root            0 May 29 14:35 var
Jun 29 16:26:21 ourserver  dracut[3126800]: lrwxrwxrwx   1 root     root           11 May 29 14:35 var/lock -> ../run/lock
Jun 29 16:26:21 ourserver  dracut[3126800]: lrwxrwxrwx   1 root     root            6 May 29 14:35 var/run -> ../run
Jun 29 16:26:21 ourserver  dracut[3126800]: drwxr-xr-x   2 root     root            0 May 29 14:35 var/tmp
Jun 29 16:26:21 ourserver  dracut[3126800]: ========================================================================
Jun 29 16:26:21 ourserver  dracut[3126800]: *** Creating initramfs image file '/boot/initramfs-5.6.19-300.fc32.x86_64.tmp' done ***
Jun 29 16:27:21 ourserver  systemd-tmpfiles[3137785]: /usr/lib/tmpfiles.d/lxdm.conf:1: Line references path below legacy directory /var/run/, updating /var/run/lxdm → /run/lxdm; please update the tmpfiles.d/ drop-in file accordingly.
Jun 29 16:48:05 ourserver  su[2983955]: pam_unix(su:session): session closed for user root
Jun 29 16:48:07 ourserver  sshd[2983731]: pam_unix(sshd:session): session closed for user myuser

And:

systemctl status 3137650 -l --no-pager
● session-7778.scope - Session 7778 of user myuser
     Loaded: loaded (/run/systemd/transient/session-7778.scope; transient)
  Transient: yes
     Active: active (abandoned) since Mon 2020-06-29 13:03:32 EDT; 11h ago
      Tasks: 2
     Memory: 316.9M
        CPU: 11min 46.751s
     CGroup: /user.slice/user-6105.slice/session-7778.scope
             ├─3086246 bash -c while [ -d /proc/$PPID ]; do sleep 1;head -v -n 8 /proc/meminfo; head -v -n 2 /proc/stat /proc/version /proc/uptime /proc/loadavg /proc/sys/fs/file-nr /proc/sys/kernel/hostname; tail -v -n 16 /proc/net/dev;echo '==> /proc/df <==';df;echo '==> /proc/who <==';who;echo '==> /proc/end <==';echo '##Moba##'; done
             └─3137650 df

Jun 29 16:26:21 ourserver  dracut[3126800]: lrwxrwxrwx   1 root     root           20 May 29 14:35 usr/share/unimaps -> /usr/lib/kbd/unimaps
Jun 29 16:26:21 ourserver  dracut[3126800]: drwxr-xr-x   3 root     root            0 May 29 14:35 var
Jun 29 16:26:21 ourserver  dracut[3126800]: lrwxrwxrwx   1 root     root           11 May 29 14:35 var/lock -> ../run/lock
Jun 29 16:26:21 ourserver  dracut[3126800]: lrwxrwxrwx   1 root     root            6 May 29 14:35 var/run -> ../run
Jun 29 16:26:21 ourserver  dracut[3126800]: drwxr-xr-x   2 root     root            0 May 29 14:35 var/tmp
Jun 29 16:26:21 ourserver  dracut[3126800]: ========================================================================
Jun 29 16:26:21 ourserver  dracut[3126800]: *** Creating initramfs image file '/boot/initramfs-5.6.19-300.fc32.x86_64.tmp' done ***
Jun 29 16:27:21 ourserver  systemd-tmpfiles[3137785]: /usr/lib/tmpfiles.d/lxdm.conf:1: Line references path below legacy directory /var/run/, updating /var/run/lxdm → /run/lxdm; please update the tmpfiles.d/ drop-in file accordingly.
Jun 29 16:48:05 ourserver  su[2983955]: pam_unix(su:session): session closed for user root
Jun 29 16:48:07 ourserver  sshd[2983731]: pam_unix(sshd:session): session closed for user myuser

I am starting to think this could be this bug, except that ssh is not slow. Running:

systemctl | grep "abandoned" | grep -e "-[[:digit:]]"

returned several abandoned ssh sessions. Running:

systemctl | grep "abandoned" | grep -e "-[[:digit:]]" | sed "s/.scope.*/.scope/" | xargs systemctl stop

removed all the abandoned sessions, and the df command disappeared from the ps output.
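
Since stopping scopes kills every process in them, it can help to review the list before acting on it. The following is a cautious variant of the pipeline above, not the exact command from the question: it is a sketch assuming systemd's systemctl (with --type, --no-legend, --no-pager) and GNU xargs (for -r), and the function names and the --apply switch are hypothetical.

```shell
# Read a systemctl unit listing on stdin and print the names of
# session scopes that appear on a line marked "abandoned".
abandoned_scopes() {
    awk '/abandoned/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^session-[0-9]+\.scope$/) print $i
    }'
}

# Dry run by default: only print the abandoned session scopes.
# With --apply, pass them to "systemctl stop" (typically needs root).
stop_abandoned_scopes() {
    systemctl --type=scope --no-legend --no-pager |
        abandoned_scopes |
        if [ "${1:-}" = "--apply" ]; then
            xargs -r systemctl stop
        else
            cat
        fi
}
```

Running stop_abandoned_scopes first and stop_abandoned_scopes --apply only after checking the output avoids stopping a session that is still wanted.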

Answer 1

Mystery solved: it was indeed MobaXterm. There is a Remote monitoring option in the bottom-left corner of its window.
