I'm a data scientist / machine-learning developer. Sometimes I have to expose my models via an endpoint. I usually do this with Flask and gunicorn:
exampleproject.py:
import random

from flask import Flask

app = Flask(__name__)
random.seed(0)

@app.route("/")
def hello():
    x = random.randint(1, 100)
    y = random.randint(1, 100)
    return str(x * y)

if __name__ == "__main__":
    app.run(host='0.0.0.0')
wsgi.py:
from exampleproject import app

if __name__ == "__main__":
    app.run()
Started with:
$ gunicorn --bind 0.0.0.0:5000 wsgi:app
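(Side note on my current setup: invoked like this, gunicorn runs a single synchronous worker, so all concurrent connections queue on one process. The gunicorn docs suggest roughly (2 × CPU cores) + 1 workers as a starting point; a minimal config sketch I could load with `gunicorn -c gunicorn_conf.py wsgi:app` — the filename is my own choice, and this is a starting point rather than a tuned configuration:)

```python
# gunicorn_conf.py -- a starting-point sketch, not a tuned configuration
import multiprocessing

bind = "0.0.0.0:5000"
# Rule of thumb from the gunicorn docs: (2 x CPU cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1
```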
When I benchmark this simple script, I get:
$ ab -s 30 -c 200 -n 25000 -v 1 http://localhost:5000/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 2500 requests
Completed 5000 requests
Completed 7500 requests
Completed 10000 requests
Completed 12500 requests
Completed 15000 requests
Completed 17500 requests
Completed 20000 requests
Completed 22500 requests
apr_pollset_poll: The timeout specified has expired (70007)
Total of 24941 requests completed
With a lower total number of requests, it looks fine:
$ ab -l -s 30 -c 200 -n 200 -v 1 http://localhost:5000/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Finished 200 requests
Server Software: gunicorn/19.9.0
Server Hostname: localhost
Server Port: 5000
Document Path: /
Document Length: Variable
Concurrency Level: 200
Time taken for tests: 0.084 seconds
Complete requests: 200
Failed requests: 0
Total transferred: 32513 bytes
HTML transferred: 713 bytes
Requests per second: 2380.19 [#/sec] (mean)
Time per request: 84.027 [ms] (mean)
Time per request: 0.420 [ms] (mean, across all concurrent requests)
Transfer rate: 377.87 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 2 1.2 2 3
Processing: 1 36 16.8 41 52
Waiting: 1 36 16.8 41 52
Total: 4 37 15.8 43 54
Percentage of the requests served within a certain time (ms)
50% 43
66% 51
75% 51
80% 52
90% 52
95% 52
98% 53
99% 53
100% 54 (longest request)
Is there anything I can change in my configuration to improve things for this workload?
When I make a single real model call, I see the answer within 0.5 seconds. I think an execution time of up to 1.0 seconds per call is reasonable. Every call is stateless, meaning each call should be independent of all the others.
When I try to analyze the problem, I see a lot of connections in TIME_WAIT:
$ netstat -nat | awk '{print $6}' | sort | uniq -c | sort -n
1 established)
1 Foreign
2 CLOSE_WAIT
4 LISTEN
10 SYN_SENT
60 SYN_RECV
359 ESTABLISHED
13916 TIME_WAIT
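(To track this over time instead of eyeballing netstat, one option is to count TIME_WAIT sockets directly from the kernel's TCP table. A small sketch; the function name is my own, and `06` is the kernel's hex state code for TIME_WAIT in /proc/net/tcp:)

```python
# Count sockets in TIME_WAIT by reading the kernel's TCP table directly.
# State codes in /proc/net/tcp: 01=ESTABLISHED, 06=TIME_WAIT, 0A=LISTEN, ...
def count_time_wait(path="/proc/net/tcp"):
    count = 0
    with open(path) as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            # The 4th column (index 3) is the connection state in hex.
            if len(fields) > 3 and fields[3] == "06":
                count += 1
    return count

if __name__ == "__main__":
    print(count_time_wait())
```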
How can I confirm or rule out that this is the problem? Does this have anything to do with Flask/gunicorn? And how does nginx relate to gunicorn here?