Ansible fact gathering fails on the findmnt command for some hosts

ANSIBLE VERSION

ansible 2.4.6.0
config file = /home/xxxxxx/ansible.cfg
configured module search path = [u'/home/xxxxxx/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/site-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.5 (default, Aug 7 2019, 00:51:29) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]

CONFIGURATION

cat ~/.ansible.cfg

[defaults]
host_key_checking = False
forks = 5
log_path = /home/userid/ansible.log

[ssh_connection]
pipelining = true

grep ^[^#] /etc/ansible/ansible.cfg
[defaults]
roles_path = /etc/ansible/roles:/usr/share/ansible/roles
host_key_checking = False

OS / ENVIRONMENT

Client: CentOS Linux release 7.5.1804 (Core)

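STEPS TO REPRODUCE

All Ansible playbooks work fine except for anything that references fact gathering. The fact-gathering (setup) module and anything that references gathering facts hang.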

Example command: ansible all -i ansible/inventory/inventory -m setup -u userid -k -K -vvv

ACTUAL RESULTS

ansible all -i ansible/inventory/inventory-file -m setup -u userid -k -K --limit="130.100.136.118,130.100.136.114" -vvv
ansible 2.4.6.0
config file = /home/userid/ansible.cfg
configured module search path = [u'/home/userid/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/site-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.5 (default, Aug 7 2019, 00:51:29) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]
Using /home/userid/ansible.cfg as config file
SSH password:
SUDO password[defaults to SSH password]:
Parsed /home/userid/ansible/inventory/dop-poc-ibm inventory source with ini plugin
META: ran handlers
Using module file /usr/lib/python2.7/site-packages/ansible/modules/system/setup.py
<130.100.136.114> ESTABLISH SSH CONNECTION FOR USER: userid
Using module file /usr/lib/python2.7/site-packages/ansible/modules/system/setup.py
<130.100.136.118> ESTABLISH SSH CONNECTION FOR USER: userid
<130.100.136.114> SSH: EXEC sshpass -d14 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o User=userid -o ConnectTimeout=10 -o ControlPath=/home/userid/.ansible/cp/1f9f8629ab 130.100.136.114 '/bin/sh -c '"'"'/usr/bin/python && sleep 0'"'"''
<130.100.136.118> SSH: EXEC sshpass -d15 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o User=userid -o ConnectTimeout=10 -o ControlPath=/home/userid/.ansible/cp/e3a887b653 130.100.136.118 '/bin/sh -c '"'"'/usr/bin/python && sleep 0'"'"''
<130.100.136.114> (1, '\n{"exception": "Traceback (most recent call last):\n File "/tmp/ansible_5w_PfH/ansible_modlib.zip/ansible/module_utils/basic.py", line 2786, in run_command\n cmd = subprocess.Popen(args, **kwargs)\n File "/usr/lib64/python2.7/subprocess.py", line 711, in init\n errread, errwrite)\n File "/usr/lib64/python2.7/subprocess.py", line 1308, in _execute_child\n data = _eintr_retry_call(os.read, errpipe_read, 1048576)\n File "/usr/lib64/python2.7/subprocess.py", line 478, in _eintr_retry_call\n return func(args)\n File "/tmp/ansible_5w_PfH/ansible_modlib.zip/ansible/module_utils/facts/timeout.py", line 37, in _handle_timeout\n raise TimeoutError(msg)\nTimeoutError: Timer expired after 10 seconds\n", "cmd": "/usr/bin/findmnt --list --noheadings --notruncate", "failed": true, "rc": 257, "invocation": {"module_args": {"filter": "", "gather_subset": ["all"], "fact_path": "/etc/ansible/facts.d", "gather_timeout": 10}}, "msg": "Timer expired after 10 seconds"}\n', '')
The full traceback is:
Traceback (most recent call last):
File "/tmp/ansible_5w_PfH/ansible_modlib.zip/ansible/module_utils/basic.py", line 2786, in run_command
cmd = subprocess.Popen(args, **kwargs)
File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/usr/lib64/python2.7/subprocess.py", line 1308, in _execute_child
data = _eintr_retry_call(os.read, errpipe_read, 1048576)
File "/usr/lib64/python2.7/subprocess.py", line 478, in _eintr_retry_call
return func(*args)
File "/tmp/ansible_5w_PfH/ansible_modlib.zip/ansible/module_utils/facts/timeout.py", line 37, in _handle_timeout
raise TimeoutError(msg)
TimeoutError: Timer expired after 10 seconds

130.100.136.114 | FAILED! => {
"changed": false,
"cmd": "/usr/bin/findmnt --list --noheadings --notruncate",
"failed": true,
"invocation": {
"module_args": {
"fact_path": "/etc/ansible/facts.d",
"filter": "*",
"gather_subset": [
"all"
],
"gather_timeout": 10
}
},
"msg": "Timer expired after 10 seconds",
"rc": 257
}
<130.100.136.118> (1, '\n{"exception": "Traceback (most recent call last):\n File "/tmp/ansible_Alx9Sv/ansible_modlib.zip/ansible/module_utils/basic.py", line 2786, in run_command\n cmd = subprocess.Popen(args, **kwargs)\n File "/usr/lib64/python2.7/subprocess.py", line 711, in init\n errread, errwrite)\n File "/usr/lib64/python2.7/subprocess.py", line 1308, in _execute_child\n data = _eintr_retry_call(os.read, errpipe_read, 1048576)\n File "/usr/lib64/python2.7/subprocess.py", line 478, in _eintr_retry_call\n return func(args)\n File "/tmp/ansible_Alx9Sv/ansible_modlib.zip/ansible/module_utils/facts/timeout.py", line 37, in _handle_timeout\n raise TimeoutError(msg)\nTimeoutError: Timer expired after 10 seconds\n", "cmd": "/usr/bin/findmnt --list --noheadings --notruncate", "failed": true, "rc": 257, "invocation": {"module_args": {"filter": "", "gather_subset": ["all"], "fact_path": "/etc/ansible/facts.d", "gather_timeout": 10}}, "msg": "Timer expired after 10 seconds"}\n', '')
The full traceback is:
Traceback (most recent call last):
File "/tmp/ansible_Alx9Sv/ansible_modlib.zip/ansible/module_utils/basic.py", line 2786, in run_command
cmd = subprocess.Popen(args, **kwargs)
File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/usr/lib64/python2.7/subprocess.py", line 1308, in _execute_child
data = _eintr_retry_call(os.read, errpipe_read, 1048576)
File "/usr/lib64/python2.7/subprocess.py", line 478, in _eintr_retry_call
return func(*args)
File "/tmp/ansible_Alx9Sv/ansible_modlib.zip/ansible/module_utils/facts/timeout.py", line 37, in _handle_timeout
raise TimeoutError(msg)
TimeoutError: Timer expired after 10 seconds

130.100.136.118 | FAILED! => {
"changed": false,
"cmd": "/usr/bin/findmnt --list --noheadings --notruncate",
"failed": true,
"invocation": {
"module_args": {
"fact_path": "/etc/ansible/facts.d",
"filter": "*",
"gather_subset": [
"all"
],
"gather_timeout": 10
}
},
"msg": "Timer expired after 10 seconds",
"rc": 257
}

STEPS TRIED

Increased gather_timeout to 20 and then 30 in the ansible.cfg in my home folder. It didn't help.

Tried gather_subset = !all. It didn't help.

Running the same findmnt command manually through the shell module worked, although I noticed it takes a few seconds to return results:
ansible -i ansible/inventory/inventory -u userid@domain --become -m shell -a '/usr/bin/findmnt --list --noheadings --notruncate' linux -k -K
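
Because the manual run succeeds but is slow, it may be worth measuring how long findmnt actually takes on an affected host and comparing that with the gather_timeout of 10 seconds shown in the log. The following is only a minimal Python 3 sketch to run on the managed host. The helper name time_command is made up for illustration and is not part of Ansible.

```python
import subprocess
import time


def time_command(cmd, timeout=10.0):
    """Run cmd and return (elapsed_seconds, return_code).

    return_code is None when the command did not finish within
    `timeout` seconds. time_command is a hypothetical helper, not
    an Ansible API.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        rc = proc.returncode
    except subprocess.TimeoutExpired:
        # The child was killed because it exceeded the timeout.
        rc = None
    return time.monotonic() - start, rc


# Example use on an affected host (command taken from the error output):
#   elapsed, rc = time_command(
#       ["/usr/bin/findmnt", "--list", "--noheadings", "--notruncate"])
#   print("findmnt took %.1fs, rc=%s" % (elapsed, rc))
```

If the elapsed time is regularly close to or above the timeout, the failure is probably a slow findmnt rather than a problem in Ansible itself. Note that a process stuck on an unresponsive filesystem may not die immediately when killed, so this helper can itself block in that case.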

WORKAROUND SO FAR

Commented out the following section in "/usr/lib/python2.7/site-packages/ansible/module_utils/facts/hardware/linux.py":
#    def _run_findmnt(self, findmnt_path):
#        args = ['--list', '--noheadings', '--notruncate']
#        cmd = [findmnt_path] + args
#        rc, out, err = self.module.run_command(cmd, errors='surrogate_then_replace')
#        return rc, out, err
#
#    def _find_bind_mounts(self):
#        bind_mounts = set()
#        findmnt_path = self.module.get_bin_path("findmnt")
#        if not findmnt_path:
#            return bind_mounts
#
#        rc, out, err = self._run_findmnt(findmnt_path)
#        if rc != 0:
#            return bind_mounts
#
#        # find bind mounts, in case /etc/mtab is a symlink to /proc/mounts
#        for line in out.splitlines():
#            fields = line.split()
#            # fields[0] is the TARGET, fields[1] is the SOURCE
#            if len(fields) < 2:
#                continue
#
#            # bind mounts will have a [/directory_name] in the SOURCE column
#            if self.BIND_MOUNT_RE.match(fields[1]):
#                bind_mounts.add(fields[0])
#
#        return bind_mounts
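
To make clear what this workaround gives up, here is a standalone Python sketch of the parsing that the commented-out code performs on findmnt output. It is an illustration, not the Ansible source: the function takes the output text directly instead of running findmnt, and the regular expression is an assumption standing in for BIND_MOUNT_RE, whose definition is not shown in the excerpt.

```python
import re

# Assumption: a SOURCE field containing "[...]" marks a bind mount.
# This pattern stands in for BIND_MOUNT_RE, which is not shown above.
BIND_MOUNT_RE = re.compile(r".*\]")


def find_bind_mounts(findmnt_output):
    """Return the set of TARGET paths that look like bind mounts.

    `findmnt_output` is the text printed by
    `findmnt --list --noheadings --notruncate`.
    """
    bind_mounts = set()
    for line in findmnt_output.splitlines():
        fields = line.split()
        # fields[0] is the TARGET, fields[1] is the SOURCE
        if len(fields) < 2:
            continue
        # bind mounts have a [/directory_name] suffix in the SOURCE column
        if BIND_MOUNT_RE.match(fields[1]):
            bind_mounts.add(fields[0])
    return bind_mounts
```

With this logic disabled, fact gathering no longer calls findmnt, so it can no longer time out there, but bind mounts are also no longer detected through it.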

Answer 1

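I'm not sure this is the problem, but I have run into trouble caused by stale NFS mounts. To rule that out, SSH into one of the failing servers and check whether the df command runs without hanging.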
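
If df does hang, it can help to find out which mount is responsible. The following is a rough Python 3 sketch, assuming a Linux host where /proc/mounts is readable and the stat command is available. The helper names nfs_mount_points and probe are made up for illustration.

```python
import subprocess


def nfs_mount_points(mounts_text):
    """Return the mount points of NFS filesystems.

    `mounts_text` is the content of /proc/mounts, where each line is:
    device mountpoint fstype options dump pass
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.append(fields[1])
    return points


def probe(path, timeout=5.0):
    """Run `stat path` and return its exit code, or None if it hung.

    The child is killed on timeout but not waited for, because a
    process blocked on a stale NFS mount may not exit promptly.
    """
    proc = subprocess.Popen(
        ["stat", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        return None


# Example use on a failing server:
#   with open("/proc/mounts") as f:
#       for point in nfs_mount_points(f.read()):
#           print(point, probe(point))
```

A mount for which probe returns None is a likely candidate for the stale mount that slows down or blocks findmnt and df.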

Answer 2

After updating Ansible to 2.8, the problem no longer occurred.
