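We use a command like the following to run a simple deployment script remotely:
ssh [email protected] sudo /root/run-chef-client.sh
It started hanging today because sshd on 10.170.4.11 keeps waiting for a long time even though sudo has already finished. We started sshd in debug mode and got two different kinds of logs. Here is the normal log, from a session that does not hang: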
debug1: Received SIGCHLD.
debug1: session_by_pid: pid 23187
debug1: session_exit_message: session 0 channel 0 pid 23187
debug1: session_exit_message: release channel 0
Received disconnect from 10.170.4.6: 11: disconnected by user
When it hangs, we get the following instead:
debug1: Received SIGCHLD.
debug1: session_by_pid: pid 24209
debug1: session_exit_message: session 0 channel 0 pid 24209
debug1: session_exit_message: release channel 0
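Our understanding is that the server process is waiting for some communication from the client that never arrives. It is hard to tell whether this is a client problem or a server problem. We tried running sshd under strace, but without success, because the SUID bit on sudo is ignored in that case. So what else should we try in order to debug or prevent this?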
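To make the difference between the two logs explicit, here is a minimal sketch that normalizes the PIDs and lists the lines present in the normal log but absent from the hanging one. The log text is copied from the excerpts above; the helper names normalize and missing_lines are hypothetical and exist only for this illustration.

```python
import re

# Log excerpts copied from the question above.
NORMAL_LOG = """\
debug1: Received SIGCHLD.
debug1: session_by_pid: pid 23187
debug1: session_exit_message: session 0 channel 0 pid 23187
debug1: session_exit_message: release channel 0
Received disconnect from 10.170.4.6: 11: disconnected by user
"""

HANGING_LOG = """\
debug1: Received SIGCHLD.
debug1: session_by_pid: pid 24209
debug1: session_exit_message: session 0 channel 0 pid 24209
debug1: session_exit_message: release channel 0
"""


def normalize(log: str) -> list[str]:
    """Replace concrete PIDs with a placeholder so two runs can be compared."""
    return [re.sub(r"pid \d+", "pid N", line) for line in log.splitlines()]


def missing_lines(reference: str, other: str) -> list[str]:
    """Return the normalized lines of `reference` that never appear in `other`."""
    seen = set(normalize(other))
    return [line for line in normalize(reference) if line not in seen]


if __name__ == "__main__":
    for line in missing_lines(NORMAL_LOG, HANGING_LOG):
        print(line)
```

The only line missing from the hanging session is the client's disconnect message, which is consistent with the guess that sshd has released the channel and is still waiting on the client.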
Answer 1
Using ssh -t on the client side (which forces PTY allocation) solved the problem:
debug1: Received SIGCHLD.
debug1: session_by_pid: pid 31701
debug1: session_exit_message: session 0 channel 0 pid 31701
debug1: session_exit_message: release channel 0
debug1: session_pty_cleanup: session 0 release /dev/pts/1
Received disconnect from 127.0.0.1: 11: disconnected by user
debug1: do_cleanup
debug1: PAM: cleanup
debug1: PAM: closing session
debug1: PAM: deleting credentials
sshd is now controlled by the pseudo-TTY rather than by the client.
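If the deployment is launched from a script, the same fix might be wired in as in the sketch below. It is an illustration under assumptions, not the poster's actual setup: the target deploy@example-host, the helper names, and the 600-second timeout are hypothetical, and the timeout is an extra safety net that the answer does not mention. Only the -t flag and the remote command come from the thread.

```python
import subprocess


def build_ssh_command(target: str, remote_command: list[str],
                      force_tty: bool = True) -> list[str]:
    """Build the ssh argument list, optionally forcing PTY allocation with -t."""
    cmd = ["ssh"]
    if force_tty:
        # -t forces pseudo-terminal allocation, as suggested in the answer.
        # When ssh itself has no local terminal (e.g. under cron), -tt is
        # needed to force allocation.
        cmd.append("-t")
    cmd.append(target)
    cmd.extend(remote_command)
    return cmd


def run_remote(target: str, remote_command: list[str],
               timeout_seconds: int = 600) -> int:
    """Run the remote command and return its exit status.

    The timeout is an assumption added for this sketch: subprocess.run kills
    the local ssh process and raises subprocess.TimeoutExpired if it is hit.
    """
    completed = subprocess.run(build_ssh_command(target, remote_command),
                               timeout=timeout_seconds)
    return completed.returncode


# Example call (hypothetical host):
# run_remote("deploy@example-host", ["sudo", "/root/run-chef-client.sh"])
```

Keeping the command construction in its own function makes it easy to check the argument list without opening a connection.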