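I'm currently building an infrastructure management tool that provisions bare metal machines, VMs, and so on. We have a worker VM that runs commands on the remote nodes over SSH (via ansible).

One of the steps requires rebooting the nodes to apply some configuration. Once the reboot has completed, the worker process has to run more commands on the nodes (this must happen synchronously).

My question is: how can I check whether the reboot has completed?

I could add a sleep timer (to wait until the reboot finishes), but I feel that is a bad solution for a number of reasons.

Another option is to try to connect to the remote node over SSH from my worker process every 5 seconds or so, and if that fails, keep retrying until the connection succeeds.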
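For reference, the polling option described above could look roughly like the following. This is a minimal sketch, not part of the original tool: the function name wait_for_ssh and its parameters are hypothetical, and it only checks that TCP port 22 accepts connections, not that a login would succeed.

```python
import socket
import time


def wait_for_ssh(host, port=22, delay=0.0, interval=5.0, timeout=360.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: `delay` is an initial pause so the node has time to
    actually go down before polling starts.
    """
    time.sleep(delay)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A completed TCP handshake means something is listening on the port again.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: wait and retry.
            time.sleep(interval)
    return False
```

Without an initial delay the first attempt may succeed before the node has even started shutting down, so a call such as wait_for_ssh("node1", delay=30) is safer than polling immediately (node1 is a placeholder host name).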

Is there another way of doing this?

Answer 1

Since you mention that you are running the commands through ansible, here is what I use in my playbooks to reboot (I am managing Ubuntu 14.04/16.04 machines):

---
# execute like:
# ansible-playbook reboot.yaml --inventory hosts --extra-vars "hosts=all user=admin"
# or
# ansible-playbook reboot.yaml -i hosts -e "hosts=all user=admin"
- hosts: "{{ hosts }}"
  remote_user: "{{ user }}"
  become: yes
  tasks:
    # add this to guard you from yourself ;)
    #- name: "ask for verification"
    #  pause:
    #    prompt: "Are you sure you want to restart all specified hosts?"

    # here comes the juicy part
    - name: "reboot hosts"
      shell: "sleep 2 && shutdown -r now 'Reboot triggered by Ansible'" # sleep 2 is needed, else this task might fail
      async: "1" # run asynchronously
      poll: "0" # don't ask for the status of the command, just fire and forget
      ignore_errors: yes # this command will get cut off by the reboot, so ignore errors
    - name: "wait for hosts to come up again"
      wait_for:
        host: "{{ inventory_hostname }}"
        port: "22" # wait for ssh as this is what is needed for ansible
        state: "started"
        delay: "120" # start checking after this amount of time
        timeout: "360" # give up after this amount of time
      delegate_to: "localhost" # check from the machine executing the playbook
...

Update

Ansible 2.7 now has a reboot module, so you no longer need to build the commands yourself. The playbook above translates to the following:

---
# execute like:
# ansible-playbook reboot.yaml --inventory hosts --extra-vars "hosts=all user=admin"
# or
# ansible-playbook reboot.yaml -i hosts -e "hosts=all user=admin"
- hosts: "{{ hosts }}"
  remote_user: "{{ user }}"
  become: yes
  tasks:
    # add this to guard you from yourself ;)
    #- name: "ask for verification"
    #  pause:
    #    prompt: "Are you sure you want to restart all specified hosts?"

    - name: "reboot hosts"
      reboot:
        msg: "Reboot triggered by Ansible"
        reboot_timeout: 360
...
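Because the reboot module only returns once the host is reachable again, a worker process can treat the exit of ansible-playbook as the signal that the reboot has completed. The sketch below shows one way to do that from Python, under the assumption that ansible-playbook is on the worker's PATH; the helper names build_command and reboot_and_wait are hypothetical, and the file names and variables are the ones used in the playbook's header comments.

```python
import subprocess


def build_command(playbook, inventory, hosts, user):
    """Build the ansible-playbook argument list shown in the playbook's header comments."""
    return [
        "ansible-playbook", playbook,
        "-i", inventory,
        "-e", f"hosts={hosts} user={user}",
    ]


def reboot_and_wait(playbook="reboot.yaml", inventory="hosts", hosts="all", user="admin"):
    """Run the playbook and block until it exits.

    check=True raises subprocess.CalledProcessError if the playbook fails,
    for example when a host does not come back within reboot_timeout.
    """
    subprocess.run(build_command(playbook, inventory, hosts, user), check=True)
```

Any commands the worker issues after reboot_and_wait() returns will then run against nodes that have already come back up.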

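Answer 2

If you want to check the status of hosts, the time of host reboots, and many other parameters, then you should use monitoring software such as Zabbix, Nagios, and so on.

The time of a reboot can be checked through the uptime system parameter, which shows the time elapsed since the last boot. You can get it with the uptime command on Linux/UNIX hosts, or remotely over SNMP when the snmpd service is running on the host: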

snmpget -v2c -c public host_name_or_ip_address sysUpTime.0
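To turn that value into a "has it rebooted yet?" check, one option is to compare the reported uptime with the time elapsed since the reboot was requested. The sketch below assumes the default net-snmp output format (a line containing "Timeticks: (N)"); the function names are hypothetical. Note that sysUpTime is expressed in hundredths of a second and, strictly speaking, counts from when the SNMP agent started, so it is treated here as an approximation of host uptime.

```python
import re


def parse_timeticks(snmpget_line):
    """Extract the uptime in seconds from one line of snmpget output.

    Assumes the net-snmp style 'Timeticks: (123456) 0:20:34.56', where the
    number in parentheses is hundredths of a second.
    """
    match = re.search(r"Timeticks:\s*\((\d+)\)", snmpget_line)
    if match is None:
        raise ValueError(f"no Timeticks value found in: {snmpget_line!r}")
    return int(match.group(1)) / 100.0


def rebooted_since(uptime_seconds, seconds_since_request):
    """True if the uptime is shorter than the time since the reboot was requested."""
    return uptime_seconds < seconds_since_request
```

For example, if the reboot was requested 300 seconds ago and the node now reports an uptime of 45 seconds, rebooted_since(45.0, 300.0) is true; an uptime of several hours would mean the node has not restarted yet.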
