Question 1

Answer

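Custom instance health checks (bottom of the page) are one option.

You would run a separate piece of code on the machine (or on any machine) that monitors health and makes the API call that sets the instance to unhealthy.

I also have another half-formed idea, but I'm not quite sure how to implement it. I was the architect of an on-premises system where the load balancer called into a separate web server on the instance; in our case it was a small custom Java web server of about 50 lines of code. It returned an HTTP status code: 200 (OK) if everything was running well, or 500 (ERROR) if the instance needed to be terminated. I suspect something similar could be integrated with Auto Scaling, but I haven't done this in a while and I'm not sure how you would integrate it.

Here is the command from the first idea above: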

aws autoscaling set-instance-health --instance-id i-123abc45d --health-status Unhealthy
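
One way the two ideas might be combined is a small script that probes a local health endpoint and, on a 500, issues the command above. This is only a sketch under assumptions: the endpoint URL, the function names, and the choice to ignore every status other than 200 and 500 are mine, not part of the setup described in the answer.

```shell
#!/bin/bash
# Hypothetical health endpoint served on the instance; adjust to your own server.
HEALTH_URL="${HEALTH_URL:-http://localhost:8080/health}"

# Map an HTTP status code to a health verdict.
# 200 -> Healthy, 500 -> Unhealthy, anything else (including 000 when
# curl cannot connect) -> Unknown, for which no action is taken.
status_to_health() {
    case "$1" in
        200) echo "Healthy" ;;
        500) echo "Unhealthy" ;;
        *)   echo "Unknown" ;;
    esac
}

# Probe a URL and print only the HTTP status code.
probe_status() {
    curl -s -o /dev/null -m 5 -w '%{http_code}' "$1"
}

# Probe the endpoint and mark the given instance unhealthy on a 500.
# Usage: report_health INSTANCE_ID
report_health() {
    code=$(probe_status "$HEALTH_URL")
    if [ "$(status_to_health "$code")" = "Unhealthy" ]; then
        aws autoscaling set-instance-health --instance-id "$1" --health-status Unhealthy
    fi
}
```

Calling `report_health i-123abc45d` from cron would then play the role of the load balancer probe. Treating unexpected codes as "Unknown" rather than "Unhealthy" is a deliberately cautious choice, so that a briefly unreachable endpoint does not get the instance replaced.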

Question 2

Answer

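For anyone who comes across this question:

Although I believe AWS should include such a feature in CloudWatch, unfortunately I couldn't find any information indicating that it is available. So I created a bash script that queries the CloudWatch API for resource consumption metrics and then sets the instance health accordingly, as suggested by Tim:

Preparation

If you haven't done so already, install the AWS Command Line Interface. It is also available via yum or apt.
Configure the AWS CLI by running aws configure and filling in your API key and other settings. Important: if you intend to run the script below as root, as I do, you must run this configuration command as root. Otherwise the script will fail.

/root/my-health-check.sh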

#!/bin/bash
# retrieve metrics starting from 20 minutes ago (3 data points)
# Note: Sometimes CloudWatch failed to gather data for a specific period,
# then the number of data points returned could be less than what we expect.
# Also, when the instance just started, there will be no data point.
start_time=$(date -d "-20 minutes" -u +"%Y-%m-%dT%H:%M:%SZ")
# retrieve metrics up to now
end_time=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
# get current instance ID [1]
instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
# get current region [2]
# This is only needed if you have multiple regions to manage, otherwise just
# specify a region via `aws configure`.
region=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone | sed 's/\(.*\)[a-z]/\1/')
# save data retrieved for processing [3]
# Here I used an example of retrieving "NetworkIn" of "AWS/EC2" namespace,
# with metric resolution set to 300 (5 minutes).
# For a list of available metrics, run `aws cloudwatch list-metrics`
datapoints=$(aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 --metric-name NetworkIn \
    --dimensions Name=InstanceId,Value="$instance_id" \
    --statistics Average \
    --start-time "$start_time" --end-time "$end_time" \
    --period 300 --region "$region" --output text | awk '{ print $2 }')
# custom handler
# In this example, the health check will fail if all data points fall below
# my threshold. The health check will not fail if there is no data.
healthy=0
hasdata=0
THRESHOLD=300000
for i in $datapoints; do
    # In this case, the metric(NetworkIn) is not integer.
    if (( $(echo "$i $THRESHOLD" | awk '{print ($1 > $2)}') )); then
        healthy=1
    fi
    hasdata=1
done
if [ $hasdata -eq 1 ]; then
    if [ $healthy -eq 0 ]; then
        aws autoscaling set-instance-health --instance-id $instance_id --health-status Unhealthy --region $region
    fi
fi
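
The two pieces of logic in the script that are easiest to get wrong, deriving the region from the availability zone and deciding health from the data points, can be pulled out into functions and checked without touching AWS. This is a sketch, not a replacement for the script; the function names are my own.

```shell
#!/bin/bash
# Strip the trailing availability-zone letter to get the region,
# e.g. "us-east-1a" -> "us-east-1" (same idea as the sed call in the script).
az_to_region() {
    echo "$1" | sed 's/[a-z]$//'
}

# Print "healthy" if at least one data point exceeds the threshold,
# "unhealthy" if there is data but none exceeds it,
# and "nodata" if no data points were passed.
# Usage: evaluate_datapoints THRESHOLD [VALUE...]
evaluate_datapoints() {
    threshold="$1"
    shift
    if [ "$#" -eq 0 ]; then
        echo "nodata"
        return
    fi
    for value in "$@"; do
        # The metric is not an integer, so let awk do the comparison.
        if awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v > t) }'; then
            echo "healthy"
            return
        fi
    done
    echo "unhealthy"
}
```

With these in place, the end of the script could call `evaluate_datapoints "$THRESHOLD" $datapoints` (left unquoted on purpose so the values split into separate arguments) and run `set-instance-health` only when the result is "unhealthy".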

The rest

Make the script run periodically:

$ chmod +x /root/my-health-check.sh
# run the script at 0, 5, 10, 15 ... 55 of every hour
$ echo "*/5 * * * * root /root/my-health-check.sh 2>&1 | /usr/bin/logger -t ec2_health_check" >> /etc/crontab

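Shut down the instance and create an AMI. Once that is done, create a new Auto Scaling group using that AMI. Now, if the metrics do not meet the health condition, the instance should get itself terminated and a new one launched in its place. Voila!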
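
For reference, the AMI and Auto Scaling group steps might look roughly like this from the CLI. It is a sketch under assumptions: the names, the subnet ID, the group sizes, and the existence of a launch template that already points at the new AMI are all hypothetical, and the `run` helper only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/bash
# DRY_RUN=1 (the default here) prints each command instead of calling AWS.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: create an AMI from the stopped instance.
# Usage: create_ami INSTANCE_ID AMI_NAME
create_ami() {
    run aws ec2 create-image --instance-id "$1" --name "$2"
}

# Step 2: create an Auto Scaling group from a launch template
# that is assumed to exist already and to reference the new AMI.
# Usage: create_asg GROUP_NAME LAUNCH_TEMPLATE_NAME SUBNET_ID
create_asg() {
    run aws autoscaling create-auto-scaling-group \
        --auto-scaling-group-name "$1" \
        --launch-template "LaunchTemplateName=$2,Version=\$Latest" \
        --min-size 1 --max-size 2 \
        --vpc-zone-identifier "$3"
}
```

Running `create_ami i-123abc45d my-health-check-ami` in dry-run mode prints the command so it can be reviewed before anything is created.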

References:

[1]: EC2 Instance Metadata

[2]: Getting the current region in AWS - StackOverflow

[3]: CloudWatch - get-metric-statistics
