Parse an HTTP access log to find all requests that got a 429 response within one second

A typical access.log file from nginx:

000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 1157 "data..."
000.00.000.002 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 200 741 "-" "data..."
000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.003 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.003 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.003 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.003 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.003 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.004 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.004 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.004 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."

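How can I get, from the log file, all IP addresses that received response code 429 within a single second, for any second in the log? I have been trying to find a solution with awk, without success so far, so any hints would be appreciated. The output for the given example would be: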

28/Jun/2021:06:37:02:
000.00.000.001
28/Jun/2021:06:37:03:
000.00.000.003
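
The criteria are:
  1. Only IPs that made 5 or more requests
  2. With response status 429
  3. Grouped by second, for any second in which this happens rather than one specific second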

Answer 1

Is this what you're trying to do?

$ awk -F'[[ ]+' '$9==429{print $4, $1}' file | uniq -c | awk '$1>4{print $2 ":\n" $3}'
28/Jun/2021:06:37:02:
000.00.000.001
28/Jun/2021:06:37:03:
000.00.000.003
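
One caveat worth noting: `uniq -c` only merges adjacent identical lines, so the pipeline above relies on each IP's 429 lines within a given second being contiguous once the other lines are filtered out. If requests from different IPs can interleave within the same second, adding a `sort` before `uniq -c` should handle it. A minimal sketch, assuming the same log layout as the sample; the function name `burst_ips` is made up for illustration:

```shell
# Hypothetical helper (the name is illustrative, not from the original answer).
# Counts 429 responses per (second, IP) pair even when lines from
# different IPs are interleaved in the log.
burst_ips() {
    awk -F'[[ ]+' '$9==429{print $4, $1}' "$1" |
        sort |
        uniq -c |
        awk '$1>4{print $2 ":\n" $3}'
}

# Usage: burst_ips access.log
```

The trade-off is that `sort` orders the lines as plain text, so across different days or months the output is grouped correctly but not necessarily chronological.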

If the contents of the first set of quotes (e.g. "POST /abc/cba/ HTTP/1.1") are not always 3 space-separated strings as in the sample input, then just tweak it to:

$ awk -F'[[ ]+' '{sub(/"[^"]*"/,"")} $6==429{print $4, $1}' file | uniq -c | awk '$1>4{print $2 ":\n" $3}'
28/Jun/2021:06:37:02:
000.00.000.001
28/Jun/2021:06:37:03:
000.00.000.003
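
To see why the status moves from `$9` to `$6` in that variant: `sub()` deletes the first quoted string from the record, awk then re-splits the record, and the status code ends up right after the `+0100]` field no matter how many spaces the request contained. A small sketch for checking this on a single line; the helper name `status_after_sub` is hypothetical:

```shell
# Hypothetical helper: prints the field that awk treats as the status code
# once the first quoted string (the request line) has been removed.
status_after_sub() {
    printf '%s\n' "$1" | awk -F'[[ ]+' '{sub(/"[^"]*"/,"")} {print $6}'
}

# Usage: status_after_sub '<one line copied from access.log>'
```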

If you'd prefer an awk-only solution for some reason:

$ awk -F'[[ ]+' '$9==429{cnt[$4":\n"$1]++} END{for (key in cnt) if (cnt[key]>4) print key}' file
28/Jun/2021:06:37:02:
000.00.000.001
28/Jun/2021:06:37:03:
000.00.000.003
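
Note that POSIX awk does not specify the iteration order of `for (key in cnt)`, so with the awk-only version the groups may not come out in log order on every awk. If order matters, one option is to remember the order in which keys were first seen. A minimal sketch under that assumption; the function name `burst_ips_ordered` and the `min` variable are illustrative, not part of the original answer:

```shell
# Hypothetical helper: awk-only variant that prints groups in first-seen order.
# $1 = log file, $2 = minimum number of 429 responses (defaults to 5).
burst_ips_ordered() {
    awk -F'[[ ]+' -v min="${2:-5}" '
        $9==429 {
            key = $4 ":\n" $1
            if (!(key in cnt)) order[++n] = key   # remember first-seen order
            cnt[key]++
        }
        END {
            for (i = 1; i <= n; i++)
                if (cnt[order[i]] >= min) print order[i]
        }
    ' "$1"
}

# Usage: burst_ips_ordered access.log      (threshold 5)
#        burst_ips_ordered access.log 10   (threshold 10)
```

Like the original awk-only script, this counts per (second, IP) pair across the whole file, so it does not depend on the matching lines being adjacent.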

All of the above scripts will work using just mandatory POSIX tools in any shell on every Unix box.
