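I am currently building an infrastructure management tool that provisions bare-metal machines, virtual machines, and so on. We have a worker VM that runs commands on remote nodes over SSH (via ansible).
One of the steps requires rebooting the node to apply some configuration. Once the reboot has finished, the worker must run more commands on the node, and this has to happen synchronously.
My question is: how can I check whether the reboot has completed?
I could add a sleep timer and simply wait for the reboot to finish, but I feel this is a poor solution for several reasons.
Another option is to have the worker try to connect to the remote node over SSH every 5 seconds or so and, if the attempt fails, keep retrying until a connection succeeds.
Is there any other way to do this?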
Answer 1
Since you mention that you are running commands through ansible, here is what I use in a playbook for reboots (I am managing Ubuntu 14/16.04 machines):
---
# execute like:
# ansible-playbook reboot.yaml --inventory hosts --extra-vars "hosts=all user=admin"
# or
# ansible-playbook reboot.yaml -i hosts -e "hosts=all user=admin"
- hosts: "{{ hosts }}"
  remote_user: "{{ user }}"
  become: yes
  tasks:
    # add this to guard you from yourself ;)
    #- name: "ask for verification"
    #  pause:
    #    prompt: "Are you sure you want to restart all specified hosts?"

    # here comes the juicy part
    - name: "reboot hosts"
      shell: "sleep 2 && shutdown -r now 'Reboot triggered by Ansible'" # sleep 2 is needed, else this task might fail
      async: "1" # run asynchronously
      poll: "0" # don't ask for the status of the command, just fire and forget
      ignore_errors: yes # this command will get cut off by the reboot, so ignore errors
    - name: "wait for hosts to come up again"
      wait_for:
        host: "{{ inventory_hostname }}"
        port: "22" # wait for ssh as this is what is needed for ansible
        state: "started"
        delay: "120" # start checking after this amount of time
        timeout: "360" # give up after this amount of time
      delegate_to: "localhost" # check from the machine executing the playbook
...
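If the machine running the playbook cannot reach port 22 on the node directly (for example, when the connection goes through a jump host), one possible variant of the wait task is the wait_for_connection module, which retries over Ansible's own connection instead of probing a TCP port. This is only a sketch, assuming Ansible 2.3 or later; the delay, sleep, and timeout values are illustrative and not taken from the answer above.

```yaml
# Sketch: drop-in alternative to the "wait for hosts to come up again" task.
# Runs against the rebooted host itself, so no delegate_to is needed.
- name: "wait for hosts to come up again"
  wait_for_connection:
    delay: 120    # start checking after this many seconds (illustrative value)
    sleep: 5      # seconds between connection attempts (illustrative value)
    timeout: 360  # give up after this many seconds (illustrative value)
```

The task only returns once a connection to the host works again, so any tasks listed after it run against the rebooted node.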
Update
Ansible 2.7 now ships a reboot module, so you no longer have to build the command yourself. The reboot playbook above becomes:
---
# execute like:
# ansible-playbook reboot.yaml --inventory hosts --extra-vars "hosts=all user=admin"
# or
# ansible-playbook reboot.yaml -i hosts -e "hosts=all user=admin"
- hosts: "{{ hosts }}"
  remote_user: "{{ user }}"
  become: yes
  tasks:
    # add this to guard you from yourself ;)
    #- name: "ask for verification"
    #  pause:
    #    prompt: "Are you sure you want to restart all specified hosts?"

    - name: "reboot hosts"
      reboot:
        msg: "Reboot triggered by Ansible"
        reboot_timeout: 360
...
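Because the reboot module blocks until the host responds again, tasks placed after it run synchronously on the rebooted node, which is what the question asks for. The following is a minimal sketch of that pattern, assuming Ansible 2.7 or later; the follow-up tasks, the `kernel_version` variable name, and the `post_reboot_delay` and `test_command` values are illustrative additions, not part of the original playbook.

```yaml
---
- hosts: "{{ hosts }}"
  remote_user: "{{ user }}"
  become: yes
  tasks:
    - name: "reboot hosts"
      reboot:
        msg: "Reboot triggered by Ansible"
        reboot_timeout: 360
        post_reboot_delay: 10   # illustrative: extra settle time after the host is reachable
        test_command: "uptime"  # illustrative: command that must succeed before continuing

    # Hypothetical follow-up step: this only starts once the reboot task has returned.
    - name: "continue configuration after the reboot"
      command: "uname -r"
      register: kernel_version
      changed_when: false

    - name: "show the running kernel"
      debug:
        var: kernel_version.stdout
...
```

If the host does not come back within `reboot_timeout` seconds, the reboot task fails and the remaining tasks are skipped for that host.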