"service mysqld stop" times out (and then reports "mysqld dead but subsys locked")

2024-5-28


I installed the mysql client and server via yum on my 64-bit CentOS 5 server. It starts fine, but when I try to stop it, it hangs and I have to Ctrl-C out of it. Then I run "service mysqld status" and it shows:

mysqld dead but subsys locked

I ran ps aux and found no mysql process. Starting mysqld again with "service mysqld start" works fine. Trying to stop it produces the same problem.

Then I realized that /var/lock/subsys/mysqld still exists. While mysqld was running, I checked /var/run/mysqld/mysqld.pid, and it matched the pid of the running service.
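The pid file and lock file checks described above can be combined into one small function. This is a minimal sketch, not the logic of the real init script: the name subsys_state is hypothetical, the classification is simplified, and it relies on the shell's builtin kill, so it should be run as root to avoid permission errors on processes owned by other users.

```shell
# Hypothetical helper: classify what the pid file and lock file say.
# Usage: subsys_state PIDFILE LOCKFILE
subsys_state() {
    local pidfile=$1 lockfile=$2 pid=""
    if [ -r "$pidfile" ]; then
        pid=$(cat "$pidfile")
    fi
    # "kill -0" sends no signal; it only tests whether the PID exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "running"
    elif [ -e "$lockfile" ]; then
        # No live process, but the lock file was left behind.
        echo "dead but subsys locked"
    else
        echo "stopped"
    fi
}

# Example with the paths from this question:
# subsys_state /var/run/mysqld/mysqld.pid /var/lock/subsys/mysqld
```

If this prints "dead but subsys locked" while ps shows no mysqld, the lock file is stale and was simply never removed by the stop function.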

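I tried reinstalling mysql and deleting all of its files and configuration, to no avail.

What should I do?

Edit:

I added some echo statements to the /etc/init.d/mysqld file, specifically in the stop function: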

stop(){
        if [ ! -f "$mypidfile" ]; then
            # not running; per LSB standards this is "ok"
            action $"Stopping $prog: " /bin/true
            return 0
        fi  
        echo "beginning stop sequence"
        MYSQLPID=`cat "$mypidfile"`
        if [ -n "$MYSQLPID" ]; then
            /bin/kill "$MYSQLPID" >/dev/null 2>&1
            echo "killing pid $MYSQLPID"
            ret=$?
            if [ $ret -eq 0 ]; then
                echo "return code $ret after kill attempt"
                TIMEOUT="$STOPTIMEOUT"
                echo "timeout is set to $STOPTIMEOUT"
                while [ $TIMEOUT -gt 0 ]; do
                    /bin/kill -0 "$MYSQLPID" >/dev/null 2>&1 || break
                    sleep 1
                    let TIMEOUT=${TIMEOUT}-1
                    echo "timeout is now $TIMEOUT"
                done
                if [ $TIMEOUT -eq 0 ]; then
                    echo "Timeout error occurred trying to stop MySQL Daemon."
                    ret=1
                    action $"Stopping $prog: " /bin/false
                else
                    echo "attempting to del lockfile: $lockfile"
                    rm -f $lockfile
                    rm -f "$socketfile"
                    action $"Stopping $prog: " /bin/true
                fi
            else
                action $"Stopping $prog: " /bin/false
            fi
        else
            # failed to read pidfile, probably insufficient permissions
            action $"Stopping $prog: " /bin/false
            ret=4
        fi
        return $ret
}

This is what I get when I try to stop the service:

[root@server]# service mysqld stop
beginning stop sequence
killing pid 9145
return code 0 after kill attempt
timeout is set to 60
timeout is now 59
timeout is now 58
timeout is now 57
timeout is now 56
timeout is now 55
timeout is now 54
timeout is now 53
timeout is now 52
timeout is now 51
timeout is now 50
timeout is now 49

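Reading the code, it looks to me like it never breaks out of that while loop, and so it never deletes the lock file. Am I misreading it? I checked the same file on another server and it uses identical code. I'm baffled.

Edit: regarding this part of the while loop: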

 /bin/kill -0 "$MYSQLPID" >/dev/null 2>&1 || break

For some reason it does not act on the return code. Calling service mysqld stop does kill the process, but I'm not sure why the loop never breaks.
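The loop can only exit early when the existence test returns a non-zero status after the process is gone. As a rough sketch of the same wait logic, here is a version that uses the shell's builtin kill instead of /bin/kill. That substitution is an assumption made for illustration, not what the stock init script does, and the function name wait_for_exit is hypothetical.

```shell
# Hypothetical helper: wait up to TIMEOUT seconds for PID to disappear.
# Usage: wait_for_exit PID TIMEOUT
# Returns 0 if the process is gone, 1 if the timeout expired.
wait_for_exit() {
    local pid=$1 timeout=$2
    while [ "$timeout" -gt 0 ]; do
        # Builtin kill (no path): non-zero status once the PID no longer exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    return 1
}

# Example: wait_for_exit "$MYSQLPID" 60 && rm -f /var/lock/subsys/mysqld
```

If the existence test always returns 0, as /bin/kill appears to do on this machine, the loop runs for the full timeout and the lock file is never removed.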

Edit: further testing shows some strange behavior: calling /bin/kill and calling plain kill apparently return different codes. Why?

[root@server]# /bin/kill 25200
kill 25200: No such process
[user@server]# echo ${?}
0
[root@server]# kill 25200
-bash: kill: (25200) - No such process
[root@server]# echo ${?}
1

Edit: I logged in as a non-root user and tried both "kill" and "/bin/kill", with surprising results:

[notroot@server ~]$ kill -0 23232
-bash: kill: (23232) - No such process
[notroot@server ~]$ echo $?
1
[notroot@server ~]$ /bin/kill -0 23232
kill 23232: No such process
(No info could be read for "-p": geteuid()=501 but you should be root.)
[notroot@server ~]$ echo $?
0

On my other servers, the "No info could be read" error does not appear when running kill and /bin/kill as a non-root user.
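The two commands compared above are different programs: in bash, plain kill is a shell builtin, while /bin/kill is a separate binary on disk (type -a kill lists both). A small sketch for printing both exit statuses side by side follows. The function name kill_status is hypothetical, and the path of the external binary is a parameter that defaults to /bin/kill.

```shell
# Hypothetical helper: show the exit status of the builtin kill and of an
# external kill binary for the same PID, using the harmless signal 0.
# Usage: kill_status PID [PATH_TO_EXTERNAL_KILL]
kill_status() {
    local pid=$1 ext=${2:-/bin/kill} rc=0
    kill -0 "$pid" 2>/dev/null || rc=$?
    echo "builtin: $rc"
    if [ -x "$ext" ]; then
        rc=0
        "$ext" -0 "$pid" >/dev/null 2>&1 || rc=$?
        echo "$ext: $rc"
    else
        echo "$ext: not found"
    fi
}

# Example: kill_status 23232
```

On a healthy system both lines would be expected to show a non-zero status for a PID that does not exist. A 0 from the external binary reproduces the discrepancy seen here.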

Edit: added the logging that quanta described, and checked the mysql log:

After a start and a stop, the mysql log shows the following:

110918 00:11:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
110918  0:11:28 [Note] Plugin 'FEDERATED' is disabled.
110918  0:11:28  InnoDB: Initializing buffer pool, size = 16.0M
110918  0:11:28  InnoDB: Completed initialization of buffer pool
110918  0:11:29  InnoDB: Started; log sequence number 0 44233
110918  0:11:29 [Warning] 'user' entry 'root@server' ignored in --skip-name-resolve mode.
110918  0:11:29 [Warning] 'user' entry '@server' ignored in --skip-name-resolve mode.
110918  0:11:29 [Note] Event Scheduler: Loaded 0 events
110918  0:11:29 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.58-ius'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Distributed by The IUS Community Project
110918  0:11:34 [Note] /usr/libexec/mysqld: Normal shutdown

110918  0:11:34 [Note] Event Scheduler: Purging the queue. 0 events
110918  0:11:34  InnoDB: Starting shutdown...
110918  0:11:39  InnoDB: Shutdown completed; log sequence number 0 44233
110918  0:11:39 [Note] /usr/libexec/mysqld: Shutdown complete

110918 00:11:39 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended

Then in tmp/mysql.log:

kill 23080: No such process
kill 23080: No such process
kill 23080: No such process
kill 23080: No such process
kill 23080: No such process
kill 23080: No such process
kill 23080: No such process
kill 23080: No such process
kill 23080: No such process
kill 23080: No such process

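I interrupted the stop partway through so I wouldn't have to wait for the timeout. It looks like the process was killed. I still think the problem is that "kill" and "/bin/kill" return different codes.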

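Answer 1

First things first: that was excellent, systematic, and thorough debugging. Well done.

On my RHEL 5.6 box, trying to kill a nonexistent pid always gives me return code 1. I tried this as root and as an unprivileged user, both with the full path and with just the command name. I also get only the terse kill XXX: No such process, with no verbose error message.

It might be a good idea to run rpm -Vv util-linux to see whether someone has replaced /bin/kill with a new and improved version. Even if rpm verification says the file is pristine, I would try renaming /bin/kill and copying the binary over from a machine that works correctly. If replacing the file helps, and you can't find a legitimate source for the change, then I would assume the machine has been compromised, regardless of what rpm verification says.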
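The checks suggested above might look roughly like this. It is only a sketch: the function names are hypothetical, and the path of the known-good copy in the example is an assumption. The rpm calls are wrapped in a function so that nothing runs until it is invoked explicitly.

```shell
# Package checks from the answer (run as root on the affected host).
verify_kill_package() {
    rpm -qf /bin/kill      # which package owns the file
    rpm -Vv util-linux     # verify every file in that package, verbosely
}

# Hypothetical helper: byte-for-byte comparison of the local binary with a
# copy fetched from a machine where kill behaves correctly.
same_binary() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "DIFFERENT"
    fi
}

# Example (the location of the reference copy is an assumption):
# same_binary /bin/kill /root/kill.known-good
```

A "DIFFERENT" result alongside a clean rpm verification would point to the case the answer warns about.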
