Question

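In short, the current Elastic Beanstalk logrotate configuration appears to be broken: it causes service downtime in the form of 504 Gateway Timeout errors. Let's take a look.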

Reproduction

We create the simplest possible Python WSGI application.

application.py

import time


def application(environ, start_response):
    # somewhat realistic response duration
    time.sleep(0.5)

    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [b'Hello world!\n']

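Zip it into application.zip. Then create an Elastic Beanstalk Python application and environment, and upload the archive. Make sure to use a key pair that you own. Leave the other settings at their defaults. Wait for it to finish (a few minutes).

ssh into the underlying EC2 instance (see the instance identifier in the EB logs). Type the following (this is the httpd logrotate postrotate action, see below):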
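The zipping step can also be scripted. The following is a minimal sketch, assuming the platform expects application.py at the top level of the archive; the helper name build_bundle is made up for illustration and is not part of any Elastic Beanstalk tooling.

```python
import zipfile
from pathlib import Path


def build_bundle(source: Path, bundle: Path) -> list[str]:
    """Write `source` into a zip archive at `bundle` and return the archive's file list."""
    with zipfile.ZipFile(bundle, "w", zipfile.ZIP_DEFLATED) as archive:
        # arcname keeps the file at the top level of the archive,
        # regardless of where it lives on disk.
        archive.write(source, arcname=source.name)
    # Re-open the archive to confirm what was actually stored.
    with zipfile.ZipFile(bundle) as archive:
        return archive.namelist()
```

Calling build_bundle(Path("application.py"), Path("application.zip")) from the directory that holds application.py should return a list containing only "application.py".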

sudo /sbin/service httpd reload

Then run on your own machine:

siege -v -b -c 10 -t 10S http://your-test-eb.you-aws-region.elasticbeanstalk.com/

While it is running, repeat the reload command a few times.

You will then see something like the following:

** SIEGE 3.0.8
** Preparing 10 concurrent users for battle.
The server is now under siege...
HTTP/1.1 200   0.63 secs:      13 bytes ==> GET  /
HTTP/1.1 200   0.65 secs:      13 bytes ==> GET  /
HTTP/1.1 200   0.64 secs:      13 bytes ==> GET  /
HTTP/1.1 200   0.60 secs:      13 bytes ==> GET  /
...

This is what happens when you reload:

HTTP/1.1 504   0.06 secs:       0 bytes ==> GET  /
HTTP/1.1 504   0.07 secs:       0 bytes ==> GET  /
HTTP/1.1 504   0.08 secs:       0 bytes ==> GET  /
HTTP/1.1 504   0.10 secs:       0 bytes ==> GET  /
HTTP/1.1 504   0.11 secs:       0 bytes ==> GET  /
HTTP/1.1 504   0.66 secs:       0 bytes ==> GET  /
HTTP/1.1 504   0.19 secs:       0 bytes ==> GET  /
HTTP/1.1 504   0.20 secs:       0 bytes ==> GET  /
HTTP/1.1 504   0.09 secs:       0 bytes ==> GET  /

And then it recovers:

HTTP/1.1 200   1.25 secs:      13 bytes ==> GET  /
HTTP/1.1 200   1.24 secs:      13 bytes ==> GET  /
HTTP/1.1 200   1.26 secs:      13 bytes ==> GET  /
...

Lifting the server siege..      done.

Transactions:                 75 hits
Availability:              81.52 %
Elapsed time:               9.40 secs
Data transferred:           0.00 MB
Response time:              1.21 secs
Transaction rate:           7.98 trans/sec
Throughput:             0.00 MB/sec
Concurrency:                9.68
Successful transactions:      75
Failed transactions:          17
Longest transaction:        4.27
Shortest transaction:       0.06
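The availability figure in the summary is consistent with the share of successful requests among all attempted requests. A small sketch of that arithmetic, assuming this is how the percentage is derived (the function name availability is illustrative):

```python
def availability(successful: int, failed: int) -> float:
    """Percentage of requests that succeeded, out of all requests attempted."""
    total = successful + failed
    if total == 0:
        raise ValueError("no requests were made")
    return 100.0 * successful / total


# Figures taken from the siege summary above: 75 successful, 17 failed.
print(f"{availability(75, 17):.2f} %")  # prints "81.52 %"
```

In other words, roughly one request in five failed during a ten-second run with a handful of reloads.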

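Note that the ELB does not seem to have any effect on the matter. The same problem can be reproduced with two SSH sessions to the underlying EC2 instance, using ab (the Amazon AMI does not ship siege):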

ab -v 4 -c 10 -t 10 http://your-test-eb.you-aws-region.elasticbeanstalk.com/

Cause

/etc/cron.hourly/cron.logrotate.elasticbeanstalk.httpd.conf

#!/bin/sh
test -x /usr/sbin/logrotate || exit 0
/usr/sbin/logrotate /etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.httpd.conf

/etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.httpd.conf

/var/log/httpd/* {
size 10M
missingok
notifempty
rotate 5
sharedscripts
compress
dateext
dateformat -%s
create
postrotate
    /sbin/service httpd reload > /dev/null 2>/dev/null || true
endscript
olddir /var/log/httpd/rotated
}

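Note the postrotate action. /sbin/service is just a System V wrapper for the scripts in /etc/init.d/. Its man page says:

service runs a System V init script in as predictable an environment as possible, removing most environment variables and with the current working directory set to /.

Note that reload is not a standard Apache maintenance command; it is a downstream addition by the distribution. Let's look at the init script, /etc/init.d/httpd. The relevant part is as follows: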

reload() {
        echo -n $"Reloading $prog: "
        check13 || exit 1
        killproc -p ${pidfile} $httpd -HUP
        RETVAL=$?
        echo
}

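As you can see, it sends the HUP signal to Apache, which is interpreted as Restart Now:

Sending the HUP or restart signal to the parent causes it to kill off its children like in TERM, but the parent doesn't exit. It re-reads its configuration files, and re-opens any log files. Then it spawns a new set of children and continues serving hits.

TERM explains the 504 errors nicely. But what should be done instead? A Graceful Restart, because it also re-opens the logs but does not terminate requests that are being served:

The USR1 or graceful signal causes the parent process to advise the children to exit after their current request (or to exit immediately if they're not serving anything). The parent re-reads its configuration files and re-opens its log files. As each child dies off the parent replaces it with a child from the new generation of the configuration, which begins serving new requests immediately.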

...

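The code was written to both minimize the time in which the server is unable to serve new requests (they will be queued up by the operating system, so they're not lost in any event) and to respect your tuning parameters.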
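The difference between the two restart modes quoted above can be illustrated with a toy model. This is not how Apache is implemented; Worker and restart are hypothetical names, and the model only counts what happens to requests that are in flight at the moment the signal arrives.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """A toy stand-in for an Apache child process."""
    busy: bool  # True while the child is in the middle of a request


def restart(workers: list[Worker], graceful: bool) -> dict[str, int]:
    """Count the fate of in-flight requests when the parent restarts.

    graceful=False models HUP: children are terminated right away.
    graceful=True models USR1: each child finishes its current request first.
    """
    outcome = {"completed": 0, "aborted": 0}
    for worker in workers:
        if not worker.busy:
            continue  # idle children have no request to lose in either mode
        if graceful:
            outcome["completed"] += 1
        else:
            outcome["aborted"] += 1
    return outcome


# Three children serving requests, two idle.
pool = [Worker(busy=True) for _ in range(3)] + [Worker(busy=False) for _ in range(2)]
print("HUP :", restart(pool, graceful=False))
print("USR1:", restart(pool, graceful=True))
```

With HUP, all three in-flight requests are aborted, which is what shows up as 504 responses in the siege output; with USR1, all three complete.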

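Workaround

It is possible to use .ebextensions to replace /etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.httpd.conf. In the root directory of the application, create .ebextensions/10_logs.config with the following contents (basically replacing "reload" with "graceful"):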

files:
    "/etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.httpd.conf":
        mode: "000644"
        owner: root
        group: root
        content: |
            /var/log/httpd/* {
                size 10M
                missingok
                notifempty
                rotate 5
                sharedscripts
                compress
                dateext
                dateformat -%s
                create
                postrotate
                    /sbin/service httpd graceful > /dev/null 2>/dev/null || true
                endscript
                olddir /var/log/httpd/rotated
            }

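Then redeploy the Elastic Beanstalk environment. Note that with sub-second back-to-back graceful restarts I was able to (occasionally) produce a 503 Service Unavailable. That is not a concern for log rotation, though, because evenly spaced graceful restarts produce no errors.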
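The substitution at the heart of the workaround is small enough to sketch in code. The snippet below is only an illustration of the "reload" to "graceful" swap applied to a postrotate line; the function name use_graceful is made up, and the actual fix is still delivered through the .ebextensions file shown above.

```python
import re


def use_graceful(logrotate_conf: str) -> str:
    """Replace the httpd `reload` action with `graceful` in a logrotate config."""
    # Only touch `reload` when it is the action passed to `service httpd`.
    pattern = re.compile(r"(/sbin/service\s+httpd\s+)reload\b")
    return pattern.sub(r"\1graceful", logrotate_conf)


original = (
    "postrotate\n"
    "    /sbin/service httpd reload > /dev/null 2>/dev/null || true\n"
    "endscript\n"
)
print(use_graceful(original))
```

Everything else in the logrotate stanza (size, rotate count, compression, olddir) stays exactly as it was.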
