FreeIPA installation failing during CA restart

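I'm trying to set up a simple Vagrant box to test FreeIPA. I'm using a CentOS 7 image with minimal extras installed on the box, and I'm starting from a very simple FreeIPA definition. I've tried doing this with plain shell commands and also with ansible-freeipa. In both cases I see the same error, although how often it occurs seems to differ: with plain shell commands it fails only about 50% of the time, but with Ansible it appears to fail 100% of the time.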

The failure gives me an error like the following.

fatal: [ipaserver.test.hadoop.com]: FAILED! => {"changed": false, "module_stderr": "Shared connection to ipaserver.test.hadoop.com closed.\r\n", "module_stdout": "\u001b[?1034hTraceback (most recent call last):\r\n File "/root/.ansible/tmp/ansible-tmp-1583188576.27-186488091977372/AnsiballZ_ipaserver_setup_ca.py", line 102, in \r\n _ansiballz_main()\r\n File "/root/.ansible/tmp/ansible-tmp-1583188576.27-186488091977372/AnsiballZ_ipaserver_setup_ca.py", line 94, in _ansiballz_main\r\n invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\r\n File "/root/.ansible/tmp/ansible-tmp-1583188576.27-186488091977372/AnsiballZ_ipaserver_setup_ca.py", line 40, in invoke_module\r\n runpy.run_module(mod_name='ansible.modules.ipaserver_setup_ca', init_globals=None, run_name='main', alter_sys=True)\r\n File "/usr/lib64/python2.7/runpy.py", line 176, in run_module\r\n fname, loader, pkg_name)\r\n File "/usr/lib64/python2.7/runpy.py", line 82, in _run_module_code\r\n mod_name, mod_fname, mod_loader, pkg_name)\r\n File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code\r\n exec code in run_globals\r\n File "/tmp/ansible_ipaserver_setup_ca_payload_Pc9wnM/ansible_ipaserver_setup_ca_payload.zip/ansible/modules/ipaserver_setup_ca.py", line 354, in \r\n File "/tmp/ansible_ipaserver_setup_ca_payload_Pc9wnM/ansible_ipaserver_setup_ca_payload.zip/ansible/modules/ipaserver_setup_ca.py", line 345, in main\r\n File "/usr/lib/python2.7/site-packages/ipaserver/install/ca.py", line 391, in install_step_1\r\n ca.start('pki-tomcat')\r\n File "/usr/lib/python2.7/site-packages/ipaserver/install/service.py", line 464, in start\r\n self.service.start(instance_name, capture_output=capture_output, wait=wait)\r\n File "/usr/lib/python2.7/site-packages/ipaplatform/redhat/services.py", line 192, in start\r\n self.wait_until_running()\r\n File "/usr/lib/python2.7/site-packages/ipaplatform/redhat/services.py", line 186, in wait_until_running\r\n raise RuntimeError('CA did not start in %ss' % 
timeout)\r\nRuntimeError: CA did not start in 300.0s\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}

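Looking at /var/log/messages, I can see that the error occurred between 23:25 and 23:27 system time. Going by the error message, this happens while the CA is being restarted. It appears to start up fine the first time.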

Mar 2 23:25:42 localhost systemd: Stopped PKI Tomcat Server pki-tomcat.
Mar 2 23:25:43 localhost systemd: Starting PKI Tomcat Server pki-tomcat...
Mar 2 23:26:13 localhost pkidaemon: -----------------------
Mar 2 23:26:13 localhost pkidaemon: Banner is not installed
Mar 2 23:26:13 localhost pkidaemon: -----------------------
Mar 2 23:27:07 localhost pkidaemon: ----------------------
Mar 2 23:27:08 localhost pkidaemon: Enabled all subsystems
Mar 2 23:27:08 localhost pkidaemon: ----------------------
Mar 2 23:27:18 localhost systemd: pki-tomcatd@pki-tomcat.service start-pre operation timed out. Terminating.
Mar 2 23:27:18 localhost systemd: Failed to start PKI Tomcat Server pki-tomcat.
Mar 2 23:27:18 localhost systemd: Unit pki-tomcatd@pki-tomcat.service entered failed state.
Mar 2 23:27:18 localhost systemd: pki-tomcatd@pki-tomcat.service failed.
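To see how long systemd actually waited, the timestamps in the excerpt above can be subtracted. The following is a minimal sketch, not part of the original setup: the helper names `syslog_time` and `seconds_between` are made up for illustration, and the year is supplied by hand because syslog lines do not carry one.

```python
from datetime import datetime

# Two lines copied from the /var/log/messages excerpt above.
START_LINE = "Mar 2 23:25:43 localhost systemd: Starting PKI Tomcat Server pki-tomcat..."
FAIL_LINE = "Mar 2 23:27:18 localhost systemd: Failed to start PKI Tomcat Server pki-tomcat."


def syslog_time(line, year=2020):
    """Parse the leading 'Mon D HH:MM:SS' stamp of a syslog line.

    The year is an assumption passed in by the caller, since syslog omits it.
    """
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def seconds_between(first, second):
    """Return the number of seconds from the first line's stamp to the second's."""
    return (syslog_time(second) - syslog_time(first)).total_seconds()


elapsed = seconds_between(START_LINE, FAIL_LINE)
print(f"systemd gave up after {elapsed:.0f} s")  # systemd gave up after 95 s
```

The gap is about 95 seconds, well short of the 300 seconds reported in the Ansible error. That may hint that the start-pre timeout was enforced by systemd and not by the installer's own wait, though the logs alone do not prove it.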

However, when I look at the PKI logs, there is nothing in that time range. Here are the last few lines of /var/log/pki/pki-ca-spawn.20200302231442.log:

2020-03-02 23:18:32 pkispawn : INFO END spawning subsystem 'CA' of instance 'pki-tomcat'
2020-03-02 23:18:32 pkispawn : INFO ... archiving configuration into '/var/log/pki/pki-tomcat/ca/archive/spawn_deployment.cfg.20200302231442'
2020-03-02 23:18:32 pkispawn : INFO ....... cp -p /etc/sysconfig/pki/tomcat/pki-tomcat/ca/deployment.cfg /var/log/pki/pki-tomcat/ca/archive/spawn_deployment.cfg.20200302231442
2020-03-02 23:18:32 pkispawn : DEBUG ........... chmod 660 /var/log/pki/pki-tomcat/ca/archive/spawn_deployment.cfg.20200302231442
2020-03-02 23:18:32 pkispawn : DEBUG ........... chown 17:17 /var/log/pki/pki-tomcat/ca/archive/spawn_deployment.cfg.20200302231442
2020-03-02 23:18:32 pkispawn : INFO ... archiving manifest into '/var/log/pki/pki-tomcat/ca/archive/spawn_manifest.20200302231442'
2020-03-02 23:18:32 pkispawn : INFO ....... cp -p /etc/sysconfig/pki/tomcat/pki-tomcat/ca/manifest /var/log/pki/pki-tomcat/ca/archive/spawn_manifest.20200302231442
2020-03-02 23:18:32 pkispawn : DEBUG ........... chmod 660 /var/log/pki/pki-tomcat/ca/archive/spawn_manifest.20200302231442
2020-03-02 23:18:32 pkispawn : DEBUG ........... chown 17:17 /var/log/pki/pki-tomcat/ca/archive/spawn_manifest.20200302231442

The same goes for /var/log/pki/pki-tomcat/ca/debug:

[02/Mar/2020:23:25:00][http-bio-8080-exec-14]: getConn: mNumConns now 4
[02/Mar/2020:23:25:00][http-bio-8080-exec-14]: returnConn: mNumConns now 5
[02/Mar/2020:23:25:00][http-bio-8080-exec-14]: In LdapBoundConnFactory::getConn()
[02/Mar/2020:23:25:00][http-bio-8080-exec-14]: masterConn is connected: true
[02/Mar/2020:23:25:00][http-bio-8080-exec-14]: getConn: conn is connected true
[02/Mar/2020:23:25:00][http-bio-8080-exec-14]: getConn: mNumConns now 4
[02/Mar/2020:23:25:00][http-bio-8080-exec-14]: returnConn: mNumConns now 5
[02/Mar/2020:23:25:00][http-bio-8080-exec-14]: CMSServlet.java: renderTemplate
[02/Mar/2020:23:25:00][http-bio-8080-exec-14]: CMSServlet.java: xml parameter detected, returning xml
[02/Mar/2020:23:25:00][http-bio-8080-exec-14]: CMSServlet: curDate=Mon Mar 02 23:25:00 UTC 2020 id=caDisplayCertFromRequest time=144

/var/log/pki/pki-tomcat/ca/system has a few errors, but none after 23:25:

0.localhost-startStop-1 - [02/Mar/2020:23:15:08 UTC] [13] [3] authz instance DirAclAuthz initialization failed and skipped, error=Property internaldb.ldapconn.port missing value
0.http-bio-8443-exec-3 - [02/Mar/2020:23:17:53 UTC] [3] [3] CASigningUnit: Object certificate not found. Error Certificate object not found
0.http-bio-8443-exec-3 - [02/Mar/2020:23:17:54 UTC] [11] [3] UGSubsystem: Get User Error netscape.ldap.LDAPException: error result (32); matchedDN = ou=People,o=ipaca
0.Thread-16 - [02/Mar/2020:23:25:00 UTC] [8] [3] Publishing: Could not publish certificate serial number 0x7. Error Failed to publish using rule: No rules enabled

I don't know what is causing this. Any ideas? The Vagrantfile and the hosts file are both in the following GitHub repository: https://github.com/davidov541/HadoopOnVagrant/tree/AnsibleRetrofit/FreeIPA

Answer 1

The final Ansible error:

'CA did not start in %ss' % timeout)\r\nRuntimeError: CA did not start in 300.0s\r\n"

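This reminds me of some issues that came up on the mailing list, which were solved by setting the startup_timeout variable in the script to a higher value.

The solution is described here: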

https://www.freeipa.org/page/HowTo/FreeIPA_on_banana_pi

It may be worth a try. I can't promise it will solve your problem, but...
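For context, the RuntimeError in the traceback is raised by a wait-until-running loop that gives up once a timeout expires. The following is a minimal sketch of that general pattern, not FreeIPA's actual code: the `is_running` callback, the `poll_interval` parameter, and the function body are assumptions made for illustration. Only the error message format is taken from the traceback.

```python
import time


def wait_until_running(is_running, timeout, poll_interval=1.0):
    """Poll is_running() until it returns True or `timeout` seconds elapse.

    Hypothetical stand-in for the installer's wait loop; raises the same
    kind of error as the one seen in the traceback.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_running():
            return
        if time.monotonic() >= deadline:
            raise RuntimeError("CA did not start in %ss" % timeout)
        time.sleep(poll_interval)


# A stand-in service that reports "running" on its fourth check.
checks = iter([False, False, False, True])
wait_until_running(lambda: next(checks), timeout=5.0, poll_interval=0.01)
print("service came up within the timeout")

# A service that never comes up exhausts the timeout and raises.
try:
    wait_until_running(lambda: False, timeout=0.05, poll_interval=0.01)
except RuntimeError as exc:
    print(exc)  # CA did not start in 0.05s
```

Raising the timeout only helps when the service would eventually start. If it is stuck for another reason, the loop simply fails later.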

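Answer 2

I ended up increasing the size of the Vagrant box to 2 GB to fix a different problem I was seeing, and after recreating the box 10-20 times since yesterday I have not seen this issue again. Based on that, I believe the problem was that Tomcat did not have enough memory to start up as it needed to, which caused the behavior we saw.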
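A quick pre-flight check along these lines can catch an undersized box before the installer runs. This is a rough sketch under stated assumptions: the 1800 MiB threshold is hypothetical (set a little under 2 GB because the kernel reserves some RAM), the function names are made up, and the sample text stands in for the real contents of /proc/meminfo.

```python
# Hypothetical threshold: slightly below 2 GB, since MemTotal excludes
# memory reserved by the kernel.
MIN_MIB = 1800


def mem_total_mib(meminfo_text):
    """Return MemTotal in MiB from text in /proc/meminfo format (values in kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) // 1024
    raise ValueError("MemTotal not found")


def has_enough_memory(meminfo_text, min_mib=MIN_MIB):
    """True if the reported total memory meets the (assumed) minimum."""
    return mem_total_mib(meminfo_text) >= min_mib


# Made-up sample resembling a box with about 1 GB of RAM. On a real machine,
# read the text with open("/proc/meminfo").read() instead.
sample = "MemTotal:        1014760 kB\nMemFree:          123456 kB\n"
print(mem_total_mib(sample))      # 990
print(has_enough_memory(sample))  # False
```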
