Why does my instance fail the ELB health check when I add it to the load balancer via Ansible?

2024-5-29

I'm trying to add an EC2 instance to an Elastic Load Balancer using an Ansible playbook with the ec2_elb module. This is the task that should do it:

- name: "Add host to load balancer {{ load_balancer_name }}"
  sudo: false
  local_action:
    module: ec2_elb
    state: present
    wait: true
    region: "{{ region }}"
    ec2_elbs: ['{{ load_balancer_name }}']
    instance_id: "{{ ec2_id }}"

However, it frequently fails with the following output (verbosity increased):

TASK: [Add host to load balancer ApiELB-staging] ****************************** 
<127.0.0.1> REMOTE_MODULE ec2_elb region=us-east-1 state=present instance_id=i-eb7e0cc7
<127.0.0.1> EXEC ['/bin/sh', '-c', 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1409156786.81-113716163813868 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1409156786.81-113716163813868 && echo $HOME/.ansible/tmp/ansible-tmp-1409156786.81-113716163813868']
<127.0.0.1> PUT /var/folders/d4/17fw96k107d5kbck6fb2__vc0000gn/T/tmpki4HPF TO /Users/pkaeding/.ansible/tmp/ansible-tmp-1409156786.81-113716163813868/ec2_elb
<127.0.0.1> EXEC ['/bin/sh', '-c', u'LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /Users/pkaeding/.ansible/tmp/ansible-tmp-1409156786.81-113716163813868/ec2_elb; rm -rf /Users/pkaeding/.ansible/tmp/ansible-tmp-1409156786.81-113716163813868/ >/dev/null 2>&1']
failed: [10.0.115.149 -> 127.0.0.1] => {"failed": true}
msg: The instance i-eb7e0cc7 could not be put in service on LoadBalancer:ApiELB-staging. Reason: Instance has not passed the configured HealthyThreshold number of health checks consecutively.

FATAL: all hosts have already failed -- aborting

My ELB configuration is defined as follows (also via Ansible):

- name: "Ensure load balancer exists: {{ load_balancer_name }}"
  sudo: false
  local_action:
    module: ec2_elb_lb
    name: "{{ load_balancer_name }}"
    state: present
    region: "{{ region }}"
    subnets: "{{ vpc_public_subnet_ids }}"
    listeners:
      - protocol: https
        load_balancer_port: 443
        instance_protocol: http
        instance_port: 8888
        ssl_certificate_id: "{{ ssl_cert }}"
    health_check:
        ping_protocol: http # options are http, https, ssl, tcp
        ping_port: 8888
        ping_path: "/internal/v1/status"
        response_timeout: 5 # seconds
        interval: 30 # seconds
        unhealthy_threshold: 10
        healthy_threshold: 10
  register: apilb

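When I hit the status resource from my laptop or from the server itself (as localhost), I get the expected 200 response. I also added a command task to the Ansible playbook, just before the instance is added to the ELB, to confirm that the application is up and serving requests correctly (and it is):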

- command: /usr/bin/curl -v --fail http://localhost:8888/internal/v1/status

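I don't see any non-200 responses for the status check resource in my application logs (though of course, if the requests never reached my application, they wouldn't be logged).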

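The other odd thing is that the instance does get added to the ELB, and it seems to work fine. So I know that, at least at some point, the load balancer can reach the application correctly (for the status check resource and for other resources). The AWS console shows the instance as healthy, and the CloudWatch graphs don't show any failed health checks.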

Any ideas?

Answer 1

Adapted from my earlier comment:

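According to the Ansible documentation, ec2_elb has a wait_timeout parameter. With an interval of 30 seconds and a healthy_threshold of 10, the instance needs 10 × 30 = 300 seconds of consecutive passing checks before the ELB puts it in service, so you have to set wait_timeout to a value higher than 300 for this to work (330 is safe).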
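As a rough sketch, the task from the question with the timeout raised might look like this. It assumes the ec2_elb version in use supports wait_timeout; 330 is the value suggested in this answer, and everything else is copied from the question.

```yaml
- name: "Add host to load balancer {{ load_balancer_name }}"
  sudo: false
  local_action:
    module: ec2_elb
    state: present
    wait: true
    # 10 checks x 30 s interval = 300 s before the instance can be InService,
    # so wait slightly longer than that (value taken from this answer).
    wait_timeout: 330
    region: "{{ region }}"
    ec2_elbs: ['{{ load_balancer_name }}']
    instance_id: "{{ ec2_id }}"
```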

Alternatively, you can lower your interval, your healthy_threshold, or both, so that the wait is shorter than 300 seconds.
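For example, the health_check block from the question could be tightened along these lines. The specific numbers are illustrative assumptions rather than recommendations; choose values that match how quickly your application starts responding.

```yaml
# Fragment of the ec2_elb_lb task from the question (health_check section only).
health_check:
    ping_protocol: http
    ping_port: 8888
    ping_path: "/internal/v1/status"
    response_timeout: 5 # seconds
    interval: 10 # seconds (was 30; assumed value)
    unhealthy_threshold: 10
    healthy_threshold: 3 # was 10; 3 checks x 10 s = 30 s to become healthy
```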

Your unhealthy_threshold is the same as your healthy_threshold, so once the web server starts returning HTTP 500 responses, it will stay in the pool for 5 minutes before the ELB drops it.

Answer 2

You can use the ec2_elb option wait: no.
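Applied to the task from the question, that would look roughly like the sketch below. Note the trade-off: with wait disabled, the task should return without waiting for the instance to pass its health checks, so the playbook no longer confirms that the instance actually went into service.

```yaml
- name: "Add host to load balancer {{ load_balancer_name }}"
  sudo: false
  local_action:
    module: ec2_elb
    state: present
    wait: no # register the instance without waiting for it to be InService
    region: "{{ region }}"
    ec2_elbs: ['{{ load_balancer_name }}']
    instance_id: "{{ ec2_id }}"
```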
