Unable to put the application back into the ELB: Instance has not passed the configured HealthyThreshold number of health checks consecutively


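Putting an EC2 instance back into the ELB used to work about 90% of the time. Unfortunately, recent deployments fail frequently with the following error: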

15:51:59 TASK: [Start the app] ********************************************************* 
15:51:59 changed: [app-01a] => {"changed": true, "enabled": true, "name": "app", "state": "started"}
15:51:59 
15:51:59 TASK: [Wait for the app to be ready] ****************************************** 
15:52:17 ok: [app-01a] => {"changed": false, "elapsed": 17, "path": null, "port": 8080, "search_regex": null, "state": "started"}
15:52:17 
15:52:17 TASK: [Check health check on localhost] *************************************** 
15:52:22 ok: [app-01a] => {"cache_control": "must-revalidate,no-cache,no-..."status": 200,...
15:52:22 
15:52:22 TASK: [Exit if health check fails] ******************************************** 
15:52:22 skipping: [app-01a]
15:52:22 
15:52:22 TASK: [Register restapp instance back into load balancer] ********************* 
15:52:39 failed: [app-01a -> 127.0.0.1] => (item=app-ELB) => {"failed": true, "item": "app-ELB"}
15:52:39 msg: The instance i-b1234567 could not be put in service on LoadBalancer:app-ELB. Reason: Instance has not passed the configured HealthyThreshold number of health checks consecutively.
15:52:39 
15:52:39 FATAL: all hosts have already failed -- aborting

Here is the Ansible code:

- name: Start the app
  service: name={{ app_name }} state=started enabled=yes

- name: Wait for the app to be ready
  wait_for: port={{ app_port }} state=started timeout=120

- name: Check health check on localhost
  action: uri url=http://localhost:8081/healthcheck
  register: webpage

- name: Exit if health check fails
  command: /bin/false
  when: webpage.status != 200

- name: Register restapp instance back into load balancer
  sudo: false
  local_action:
    module: ec2_elb
    instance_id: "{{ appInstanceId }}"
    ec2_elbs: "{{ item }}"
    state: 'present'
    region: "eu-west-1"
  with_items: appLoadBalancer

ELB settings:

http://i.imgur.com/RIM8L5q.png

Answer 1

You probably need to change the ELB ping target from HTTP:8081/pin(clipped) to HTTP:8081/healthcheck, so that it matches the URL checked in the Ansible play.
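
To confirm which target the ELB is really probing, the play could print the load balancer's health check before registering the instance. This is only a sketch: it assumes the AWS CLI is installed and has credentials on the control machine, and it takes the names app-ELB and eu-west-1 from the log and play in the question.

```yaml
# Sketch: read the health check the ELB is actually configured with.
# Assumption: the AWS CLI is available on the machine running Ansible.
- name: Show the ELB health check configuration
  sudo: false
  local_action: command aws elb describe-load-balancers --load-balancer-names app-ELB --region eu-west-1 --query LoadBalancerDescriptions[0].HealthCheck --output json
  register: elb_health
  changed_when: false

# The Target field should match the URL used by the localhost check,
# i.e. HTTP:8081/healthcheck for the play in the question.
- name: Print the health check so Target can be compared with the play
  debug: var=elb_health.stdout
```

If the printed Target points at a different path or port than the localhost check, the instance can pass the play's own check and still fail the ELB's.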

Answer 2

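This is expected AWS behavior. Once you fix the application, the ELB will re-enable traffic to your endpoint. Trying to force an offline host into service defeats the purpose of the health check.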

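You cannot do this in the ELB console either; even removing and re-adding the host requires you to wait for the health checks to pass before traffic is sent to it.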
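
Because the ELB has to see the health checks pass first, one option is to let the registration task itself wait instead of failing on the first OutOfService report. The sketch below reuses the task from the question and adds the ec2_elb module's wait and wait_timeout parameters (in Ansible versions that support wait_timeout). The value 120 is an assumption; it should be longer than the time the ELB needs to run HealthyThreshold consecutive checks.

```yaml
- name: Register restapp instance back into load balancer
  sudo: false
  local_action:
    module: ec2_elb
    instance_id: "{{ appInstanceId }}"
    ec2_elbs: "{{ item }}"
    state: 'present'
    region: "eu-west-1"
    # Wait for the instance to reach InService instead of returning at once.
    wait: yes
    # Assumed value, in seconds: pick something larger than
    # HealthyThreshold x Interval of the ELB health check.
    wait_timeout: 120
  with_items: appLoadBalancer
```

With a non-zero wait_timeout the module keeps polling the instance state until the timeout is reached, so a host that is merely still warming up does not abort the deployment.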

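If you want hosts to return to service faster, change your health check tolerances (the check interval and the healthy threshold).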
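
As a rough illustration, a newly registered instance needs about HealthyThreshold x Interval seconds of passing checks before it goes InService, so an interval of 10 seconds and a threshold of 2 gives roughly 20 seconds. The task below is a sketch of setting such values through the AWS CLI from the play. All numbers are assumed, not taken from the question's ELB settings, and the AWS CLI is assumed to be installed on the control machine. Note that configure-health-check replaces the whole health check, so Target must be whatever the ELB should probe.

```yaml
# Sketch: tighten the ELB health check so instances come back sooner.
# Every value here is an assumption chosen for illustration.
# Timeout must be smaller than Interval.
- name: Set a faster ELB health check (assumed values)
  sudo: false
  local_action: command aws elb configure-health-check --load-balancer-name app-ELB --region eu-west-1 --health-check Target=HTTP:8081/healthcheck,Interval=10,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=2
```

Shorter intervals and lower thresholds bring hosts back faster, but they also make the ELB quicker to pull a host on a brief hiccup, so the values are a trade-off.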
