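We have an HDP cluster with 152 worker machines, worker1.duplex.com .. worker152.duplex.com, all installed on RHEL 7.9.
We are trying to remove the last host, worker152.duplex.com, from the Ambari server, or more precisely from its PostgreSQL DB, as follows.
First we need to find the host_id: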
select host_id from hosts where host_name='worker152.duplex.com';
The host_id returned is:
host_id
---------
51
(1 row)
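Before deleting by host_id, it can help to know every table that carries such a column, so that none is missed. The following is a minimal sketch, not an Ambari tool: `host_ref_tables_sql` is a hypothetical helper that only prints a standard `information_schema` query; it assumes nothing about the Ambari schema beyond the column name.

```shell
# Print a catalog query that lists every user table containing a given
# column (default: host_id). The helper name is hypothetical.
host_ref_tables_sql() {
  column=${1:-host_id}
  cat <<EOF
SELECT table_schema, table_name
FROM information_schema.columns
WHERE column_name = '${column}'
  AND table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY table_schema, table_name;
EOF
}
```

The printed query can be piped into psql, for example `host_ref_tables_sql | psql -U ambari -d ambari` (the database name matches the `ambari=>` prompt; the user name is an assumption). Running it a second time with `host_name` as the argument would list tables keyed by host name instead.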
Now we delete everything that references this host_id (51):
delete from execution_command where task_id in (select task_id from host_role_command where host_id in (51));
delete from host_version where host_id in (51);
delete from host_role_command where host_id in (51);
delete from serviceconfighosts where host_id in (51);
delete from hoststate where host_id in (51);
delete from kerberos_principal_host WHERE host_id='worker152.duplex.com';
delete from hosts where host_name in ('worker152.duplex.com');
delete from alert_current where history_id in ( select alert_id from alert_history where host_name in ('worker152.duplex.com'));
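The statements above run one at a time, so if any single statement fails (for example on a type mismatch), the earlier deletes are already committed and the database is left half cleaned. A minimal sketch of an all-or-nothing variant follows; `wrap_in_transaction` is a hypothetical helper that simply brackets whatever SQL it reads on stdin with `BEGIN;` and `COMMIT;`.

```shell
# Read SQL statements on stdin and emit them inside one transaction.
# In PostgreSQL, an error inside the transaction aborts it, so the
# final COMMIT cannot persist a partial set of deletes.
wrap_in_transaction() {
  echo 'BEGIN;'
  cat
  echo 'COMMIT;'
}
```

Assuming the delete statements are saved in a file (here the hypothetical `delete_host.sql`), one possible invocation is `wrap_in_transaction < delete_host.sql | psql -v ON_ERROR_STOP=1 -U ambari -d ambari`. With `ON_ERROR_STOP=1`, psql exits at the first failing statement, and closing the session rolls the open transaction back. The user name is an assumption.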
Now we verify that host_id 51, which represents host worker152.duplex.com, no longer exists:
ambari=> select host_name, public_host_name from hosts;
host_name | public_host_name
--------------------------+--------------------------
worker1.duplex.com
.
.
.
worker151.duplex.com
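As we can see above, host worker152.duplex.com is no longer listed, which is good, and the host does indeed seem to have been deleted from the PostgreSQL DB.
Now we restart ambari-server so that the change takes effect (this also restarts the PostgreSQL service):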
ambari-server restart
Using python /usr/bin/python
Restarting ambari-server
Waiting for server stop...
Ambari Server stopped
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Ambari database consistency check started...
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start.........................
Server started listening on 8080
DB configs consistency check: no errors and warnings were found.
After the Ambari server started, we were surprised to find that host_id 51, i.e. host worker152.duplex.com, still exists, as shown below:
ambari=> select host_name, public_host_name from hosts;
host_name | public_host_name
--------------------------+--------------------------
worker1.duplex.com
.
.
.
worker152.duplex.com
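The output above shows only host_name, so it does not tell whether the row that is back is the original host_id 51 or a newly created row. A minimal sketch of a check that would distinguish the two cases; `host_row_sql` is a hypothetical helper that prints a query against the same `hosts` columns used earlier.

```shell
# Print a query returning host_id and host_name for one host.
# Single quotes in the name are doubled so the literal stays valid SQL.
host_row_sql() {
  name=$(printf '%s' "$1" | sed "s/'/''/g")
  printf "SELECT host_id, host_name FROM hosts WHERE host_name = '%s';\n" "$name"
}
```

For example, `host_row_sql worker152.duplex.com | psql -U ambari -d ambari` (user name assumed). If the host_id returned is no longer 51, the row was created again after the restart, which would point at something registering the host anew, such as an agent still running on that machine (an assumption, not something the logs above confirm). If it is still 51, the delete itself did not persist.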
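We do not understand why this host appears again even though we deleted its records.
We also tried to purge the history data as follows, but it did not help: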
ambari-server db-purge-history --cluster-name hadoop7 --from-date 2024-01-01
Using python /usr/bin/python
Purge database history...
Ambari Server configured for Embedded Postgres. Confirm you have made a backup of the Ambari Server database [y/n]yes
ERROR: The database purge historical data cannot proceed while Ambari Server is running. Please shut down Ambari first.
Ambari Server 'db-purge-history' completed successfully.
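Note that the log above contains an ERROR line: the purge did not actually run, because Ambari Server was still up. A minimal sketch of the sequence the error message asks for (stop, purge, start) follows. The `run` and `purge_history_offline` helpers are hypothetical; the `ambari-server` subcommands and flags are the ones already used in this post.

```shell
# run: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Stop Ambari Server, purge history for one cluster, then start it again.
# The server is started again even if the purge fails; the purge's exit
# status is what the function returns.
purge_history_offline() {
  cluster=$1
  from_date=$2
  run ambari-server stop || return 1
  run ambari-server db-purge-history --cluster-name "$cluster" --from-date "$from_date"
  status=$?
  run ambari-server start || return 1
  return "$status"
}
```

Setting `DRY_RUN=1` before calling `purge_history_offline hadoop7 2024-01-01` prints the three commands without executing them, which is a cheap way to review the sequence first. The purge still asks for the backup confirmation shown in the log.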
Why does the host come back after the ambari-server restart? What is wrong with our deletion procedure?
PostgreSQL version:
postgres=# SHOW server_version;
server_version
----------------
9.2.24
(1 row)
Link:
https://www.andruffsolutions.com/removing-old-host-data-from-ambari-server-and-tuning-the-database/