Why does grep stop matching and fail with "binary file matches"?


Question

Why does my grep command match some lines and then stop with this error: grep: /var/log/apache2/modsec_audit.log: binary file matches

My grep command: grep '^\[' /var/log/apache2/modsec_audit.log

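My guess is that the log file contains some binary content that confuses grep, but that is only a guess, so please explain why this happens. Please also explain what I can do to get around it.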

The log file itself is ASCII text: I can read it with less, and file reports:
/var/log/apache2/modsec_audit.log: HTML document, ASCII text, with very long lines (9938)

Setup

I have modsec (ModSecurity) installed on my web server. The audit log is made up of multi-line records, described here: https://github.com/SpiderLabs/ModSecurity/wiki/ModSecurity-2-Data-Formats#user-content-Parts

For reference, I am logging these parts: SecAuditLogParts ABCEFHJZ

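Part A is the audit log header. It is a single line with the following information: timestamp, unique transaction ID, source IP address (IPv4 or IPv6), source port, destination IP address (IPv4 or IPv6), destination port.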

For example: [05/Jan/2024:00:45:31.734758 +0000] ZZdRKyjPxuLDuK2XVhEfLgAAAAU 198.12.243.17 13914 192.168.2.143 443
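To make the header layout concrete, here is a minimal sketch that splits the sample line above into the listed fields with awk. It assumes the fields are separated by single spaces exactly as in the example; the labels it prints are only illustrative.

```shell
#!/bin/sh
# Sample Part A header, copied from the example above.
line='[05/Jan/2024:00:45:31.734758 +0000] ZZdRKyjPxuLDuK2XVhEfLgAAAAU 198.12.243.17 13914 192.168.2.143 443'

# The bracketed timestamp contains one space, so it spans awk fields $1 and $2;
# substr() drops the leading "[" from $1 and the trailing "]" from $2.
fields=$(printf '%s\n' "$line" | awk '{
  ts = substr($1, 2) " " substr($2, 1, length($2) - 1)
  print "timestamp:   " ts
  print "unique id:   " $3
  print "source:      " $4 " port " $5
  print "destination: " $6 " port " $7
}')
printf '%s\n' "$fields"
```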

What I am trying to do

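I am trying to grep the Part A lines out of all the others and use them as a simple reference for what happened on a given day and when bot activity was at its busiest. There are plenty of better ways to do this, but please stay on track: what is going wrong with grep, and how do I get past it?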

Answer 1

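By default, grep avoids printing binary data (sending binary data to a terminal can garble it, for example), so for a file it considers binary it only reports that the binary file matches. As the manual excerpt below explains, grep decides a file is binary when it finds a NUL byte or bytes that are not validly encoded in the current locale. Once it reads a NUL byte it prints nothing more for the rest of the file, and it withholds any output line containing an invalid byte; either way it finishes with the "binary file matches" notice. That is why you see some matching lines and then only the message.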
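A minimal sketch that reproduces the symptom on a throwaway file instead of the real log, assuming GNU grep. The file contents are invented. How many lines appear before the output stops depends on the size of grep's internal read buffer, so the example only shows that output stops somewhere before the NUL and that -a restores it.

```shell
#!/bin/sh
# Throwaway test file: 100000 lines starting with "[", then a line that
# contains a NUL byte, then one more matching line.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
awk 'BEGIN { for (i = 1; i <= 100000; i++) print "[ok] line " i }' > "$tmp"
printf '[bad] NUL follows:\000 end\n[ok] after the NUL\n' >> "$tmp"

# Default mode (--binary-files=binary): lines are printed until grep reads the
# chunk of the file holding the NUL; after that nothing more is printed and the
# run ends with a "binary file matches" notice (on stderr since GNU grep 3.5).
printed=$(grep '^\[' "$tmp" | wc -l | tr -d ' ')
echo "lines printed without -a: $printed"

# With -a the NUL is just another byte, so all 100002 lines match.
echo "lines printed with -a:    $(grep -a -c '^\[' "$tmp")"
```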

If you want the output anyway, you probably want the -a option.
For details, see the relevant section of the grep manual:
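Here is a sketch of the -a fix on a small, made-up sample shaped like the audit log, followed by the per-hour tally the question is after. It assumes GNU grep and coreutils. It also assumes that only Part A header lines begin with [; a request or response body that happens to start a line with [ would match too. The second header line and its values are invented.

```shell
#!/bin/sh
# Hypothetical sample: two Part A headers with a body line containing a NUL between them.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' '[05/Jan/2024:00:45:31.734758 +0000] ZZdRKyjPxuLDuK2XVhEfLgAAAAU 198.12.243.17 13914 192.168.2.143 443' > "$tmp"
printf 'body=abc\000def\n' >> "$tmp"
printf '%s\n' '[05/Jan/2024:01:12:09.000001 +0000] hypotheticalTxId 203.0.113.7 50123 192.168.2.143 443' >> "$tmp"

# -a (same as --binary-files=text): the NUL is treated as ordinary data,
# so both header lines are printed instead of only the notice.
grep -a '^\[' "$tmp"

# Per-hour counts: characters 2-15 of a header are "DD/Mon/YYYY:HH".
grep -a '^\[' "$tmp" | cut -c2-15 | sort | uniq -c
```

On the real log, the command from the question becomes grep -a '^\[' /var/log/apache2/modsec_audit.log. Note that sort compares the keys as plain text, so the order is chronological only within a single month.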

   -a, --text
          Process a binary file as if it were text; this is equivalent to the --binary-files=text option.

   --binary-files=TYPE
          If  a  file's  data or metadata indicate that the file contains binary data, assume that the file is of
          type TYPE.  Non-text bytes indicate binary data; these are either  output  bytes  that  are  improperly
          encoded for the current locale, or null input bytes when the -z option is not given.

          By  default, TYPE is binary, and grep suppresses output after null input binary data is discovered, and
          suppresses output lines that contain improperly encoded data.  When some  output  is  suppressed,  grep
          follows any output with a message to standard error saying that a binary file matches.

          If  TYPE  is  without-match, when grep discovers null input binary data it assumes that the rest of the
          file does not match; this is equivalent to the -I option.

          If TYPE is text, grep processes a binary file as if it were text; this is equivalent to the -a option.

          When type is binary, grep may treat non-text bytes as line terminators  even  without  the  -z  option.
          This  means choosing binary versus text can affect whether a pattern matches a file.  For example, when
          type is binary the pattern q$ might match q immediately followed by a null byte, even  though  this  is
          not  matched when type is text.  Conversely, when type is binary the pattern . (period) might not match
          a null byte.

          Warning: The -a option might output binary garbage, which can have nasty side effects if the output  is
          a  terminal  and  if  the  terminal  driver interprets some of it as commands.  On the other hand, when
          reading files whose text encodings are unknown, it can be helpful to use -a or to set LC_ALL='C' in the
          environment, in order to find more matches even if the matches are unsafe for direct display.
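The manual's definition of non-text bytes suggests a quick check of which kind the log actually contains. file only inspects the beginning of a file, so a NUL further into a large log can leave its "ASCII text" verdict unchanged. The sketch below assumes GNU coreutils and grep. The built-in sample file is invented; pass the real log path as the first argument to inspect that instead.

```shell
#!/bin/sh
# Inspect the file given as $1, or an invented sample if no argument is given.
f=${1:-}
if [ -z "$f" ]; then
  f=$(mktemp)
  trap 'rm -f "$f"' EXIT
  printf 'plain line\nline with NUL:\000\ncaf\351 (Latin-1 byte)\n' > "$f"
fi

# 1. NUL bytes: tr -dc deletes everything except NUL, wc -c counts what is left.
nuls=$(tr -dc '\000' < "$f" | wc -c | tr -d ' ')
# 2. Non-ASCII bytes (0x80-0xFF): valid in a UTF-8 locale only as part of a
#    multibyte sequence; grep ignores the issue entirely under LC_ALL=C.
high=$(LC_ALL=C tr -dc '\200-\377' < "$f" | wc -c | tr -d ' ')
# 3. First line holding a NUL: map NUL to \001 so the pattern can be passed as
#    an argument (assumes the file has no \001 bytes of its own).
first=$(tr '\000' '\001' < "$f" | grep -n -m1 "$(printf '\001')" | cut -d: -f1)

echo "NUL bytes:             $nuls"
echo "bytes >= 0x80:         $high"
echo "first line with a NUL: ${first:-none}"
```

A non-zero NUL count explains output that stops partway, and -a is the remedy. If only non-ASCII bytes show up, running grep with LC_ALL=C, as the manual suggests, may be enough.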
