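Putting the EC2 instance back into the ELB used to work about 90% of the time. Unfortunately, deployments have recently been failing often, with the following error: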
15:51:59 TASK: [Start the app] *********************************************************
15:51:59 changed: [app-01a] => {"changed": true, "enabled": true, "name": "app", "state": "started"}
15:51:59
15:51:59 TASK: [Wait for the app to be ready] ******************************************
15:52:17 ok: [app-01a] => {"changed": false, "elapsed": 17, "path": null, "port": 8080, "search_regex": null, "state": "started"}
15:52:17
15:52:17 TASK: [Check health check on localhost] ***************************************
15:52:22 ok: [app-01a] => {"cache_control": "must-revalidate,no-cache,no-..."status": 200,...
15:52:22
15:52:22 TASK: [Exit if health check fails] ********************************************
15:52:22 skipping: [app-01a]
15:52:22
15:52:22 TASK: [Register restapp instance back into load balancer] *********************
15:52:39 failed: [app-01a -> 127.0.0.1] => (item=app-ELB) => {"failed": true, "item": "app-ELB"}
15:52:39 msg: The instance i-b1234567 could not be put in service on LoadBalancer:app-ELB. Reason: Instance has not passed the configured HealthyThreshold number of health checks consecutively.
15:52:39
15:52:39 FATAL: all hosts have already failed -- aborting
Here is the Ansible code:
- name: Start the app
  service: name={{ app_name }} state=started enabled=yes
- name: Wait for the app to be ready
  wait_for: port={{ app_port }} state=started timeout=120
- name: Check health check on localhost
  action: uri url=http://localhost:8081/healthcheck
  register: webpage
- name: Exit if health check fails
  command: /bin/false
  when: webpage.status != 200
- name: Register restapp instance back into load balancer
  sudo: false
  local_action:
    module: ec2_elb
    instance_id: "{{ appInstanceId }}"
    ec2_elbs: "{{ item }}"
    state: 'present'
    region: "eu-west-1"
  with_items: appLoadBalancer
ELB settings:
Answer 1
You probably need to change the ping target from HTTP:8081/pin(clipped) to HTTP:8081/healthcheck, the same path the Ansible play checks.
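Answer 2
This is expected AWS behavior. Once you fix the application, the ELB will re-enable traffic to your endpoint. Trying to force an offline host into service defeats the purpose of the health check.
You can't do this in the ELB console either; even if you remove and re-add the host, you still have to wait for the health checks to pass before traffic is routed to it.
If you want the host to return to service faster, change your health check tolerances.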
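Since the instance only goes in service after the health checks pass, the registration task may simply need to wait longer. Below is a rough sketch of the task from the question with the ec2_elb module's wait and wait_timeout options added. As far as I understand the module documentation, a non-zero wait_timeout makes the module keep polling until the timeout is reached rather than failing on the first out-of-service report. The value 330 is a placeholder I made up, not something taken from the question; it should be longer than your ELB's HealthyThreshold multiplied by its health check interval.

```yaml
- name: Register restapp instance back into load balancer
  sudo: false
  local_action:
    module: ec2_elb
    instance_id: "{{ appInstanceId }}"
    ec2_elbs: "{{ item }}"
    state: 'present'
    region: "eu-west-1"
    # wait for the instance to reach InService (this is the module default)
    wait: yes
    # Assumption: placeholder value in seconds; pick something larger than
    # HealthyThreshold * health check interval configured on the ELB
    wait_timeout: 330
  with_items: appLoadBalancer
```

This does not make the host healthy any sooner. It only keeps the play from aborting while the ELB is still counting consecutive successful checks, so the ping target still has to point at an endpoint that actually returns 200.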