Getting data from a log file

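I want to pull the memory usage out of the log entries below. It is the number that comes right after the 200 in each line. To start with, I'd like a list of the highest values, say the top 10. I assume grep is the tool for this, right?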

178.0.140.206 - - [05/Nov/2010:16:46:09 -0400] "GET /image/promo/terran-88x31.jpg HTTP/1.1" 200 15227 0 -
79.66.101.95 - - [05/Nov/2010:16:46:09 -0400] "GET /strategy/article/view/?id=608 HTTP/1.1" 200 8456 0 4980736
79.66.101.95 - - [05/Nov/2010:16:46:10 -0400] "GET /lib/php/min/?f=lib/css/yui/2.7.0.css,lib/css/base.css,lib/css/ux/rating.css,lib/css/page/strategy.css,lib/css/page/article.css,lib/css/page/strategy/article.css HTTP/1.1" 200 8118 0 1835008
79.66.101.95 - - [05/Nov/2010:16:46:11 -0400] "GET /image/logo-text.png HTTP/1.1" 200 9444 0 -
79.66.101.95 - - [05/Nov/2010:16:46:11 -0400] "GET /image/s.gif HTTP/1.1" 200 43 0 -
79.66.101.95 - - [05/Nov/2010:16:46:11 -0400] "GET /image/logo.png HTTP/1.1" 200 17722 0 -
79.66.101.95 - - [05/Nov/2010:16:46:13 -0400] "GET /lib/php/min/?f=lib/js/ext/3.0-core.js,lib/js/global.js,lib/js/ext/ux/rating.js,lib/js/page/article.js HTTP/1.1" 200 32919 0 1310720
79.66.101.95 - - [05/Nov/2010:16:46:16 -0400] "GET /lib/css/resource/body-bg.png HTTP/1.1" 200 467 0 -
79.66.101.95 - - [05/Nov/2010:16:46:16 -0400] "GET /lib/css/resource/foot-bg.png HTTP/1.1" 200 119 0 -
79.66.101.95 - - [05/Nov/2010:16:46:16 -0400] "GET /lib/css/resource/search-bg-sprite.png HTTP/1.1" 200 280 0 -
190.213.177.71 - - [05/Nov/2010:16:46:16 -0400] "GET /images/banner/dark-templar_firefox.gif HTTP/1.1" 404 2827 0 1572864

Answer 1

Assuming you also want the requested URL (you can adjust the awk print statement to pull in more fields as needed):

awk '{ print $10,$7 }' PATH_TO_LOG_FILE | sort -k1 -rn | head -n10

To restrict it to a specific HTTP status code (200 in this case):

awk '{ if($9=="200") {print $10,$7} }' PATH_TO_LOG_FILE | sort -k1 -rn | head -n10

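Or use a regular expression to match several status codes. The alternation is grouped in parentheses so that the ^ and $ anchors apply to every code, not just the first and last:

awk '{ if($9 ~ /^(200|403|404)$/) {print $10,$7} }' PATH_TO_LOG_FILE | sort -k1 -rn | head -n10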
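If you end up running this often, the pipeline can be wrapped in a small shell function. This is only a sketch: the name top_by_size and the temporary sample file are made up for illustration, and the field positions ($7 URL, $9 status, $10 size) assume the log layout shown in the question.

```shell
#!/bin/sh
# Hypothetical helper: print the N largest values of field 10 for requests
# whose status code is exactly 200, 403 or 404.
# Assumes the log layout from the question: $7 = URL, $9 = status, $10 = size.
top_by_size() {
    logfile=$1
    count=${2:-10}                      # default to the top 10
    awk '$9 ~ /^(200|403|404)$/ { print $10, $7 }' "$logfile" |
        sort -k1,1 -rn |                # numeric, descending, on the first column only
        head -n "$count"
}

# A few lines copied from the question, written to a temporary file.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
178.0.140.206 - - [05/Nov/2010:16:46:09 -0400] "GET /image/promo/terran-88x31.jpg HTTP/1.1" 200 15227 0 -
79.66.101.95 - - [05/Nov/2010:16:46:09 -0400] "GET /strategy/article/view/?id=608 HTTP/1.1" 200 8456 0 4980736
79.66.101.95 - - [05/Nov/2010:16:46:11 -0400] "GET /image/s.gif HTTP/1.1" 200 43 0 -
190.213.177.71 - - [05/Nov/2010:16:46:16 -0400] "GET /images/banner/dark-templar_firefox.gif HTTP/1.1" 404 2827 0 1572864
EOF

top_by_size "$sample_log" 3
```

Against a real log you would call it as top_by_size PATH_TO_LOG_FILE, and edit the status list inside the parentheses to taste.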

If you plan to run this repeatedly, consider looking into Apache's CustomLog directive.

Answer 2

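As far as I know, those numbers reflect the size of the content returned. In any case, you can get the column you want (the one after the 200) with the following command: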

grep '1\.1" 200 ' logfile | awk '{print $10}' | sort -nr | head -n 10
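If the figure you are after turns out to live in a different column (the sample lines end in a field that is either a number or "-"), the same thing can be done in awk alone, with the column passed in as a parameter. A rough sketch, where top_field and the sample file are hypothetical names and the layout is assumed to match the question:

```shell
#!/bin/sh
# Hypothetical helper: top N numeric values of a chosen column for status-200 requests.
# Rows where that column is not a plain number (e.g. "-") are skipped.
# Assumes the log layout from the question: $7 = URL, $9 = status.
top_field() {
    logfile=$1
    field=$2
    count=${3:-10}                      # default to the top 10
    awk -v f="$field" '$9 == 200 && $f ~ /^[0-9]+$/ { print $f, $7 }' "$logfile" |
        sort -k1,1 -rn |
        head -n "$count"
}

# A few lines copied from the question, written to a temporary file.
field_sample=$(mktemp)
cat > "$field_sample" <<'EOF'
79.66.101.95 - - [05/Nov/2010:16:46:09 -0400] "GET /strategy/article/view/?id=608 HTTP/1.1" 200 8456 0 4980736
79.66.101.95 - - [05/Nov/2010:16:46:11 -0400] "GET /image/logo-text.png HTTP/1.1" 200 9444 0 -
79.66.101.95 - - [05/Nov/2010:16:46:13 -0400] "GET /lib/php/min/?f=lib/js/ext/3.0-core.js,lib/js/global.js,lib/js/ext/ux/rating.js,lib/js/page/article.js HTTP/1.1" 200 32919 0 1310720
190.213.177.71 - - [05/Nov/2010:16:46:16 -0400] "GET /images/banner/dark-templar_firefox.gif HTTP/1.1" 404 2827 0 1572864
EOF

top_field "$field_sample" 12    # last column
top_field "$field_sample" 10    # the column right after the status code
```

Doing the status check inside awk also avoids depending on the exact text around the 200, which the grep pattern does.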
