AWK: print the lines between two patterns, but only the last occurrence of the match


I want to filter a log file and print the lines between two matches, but print only the last matching block.

Sample file content:

2023-03-08 11:12:44,306 - Code Deploy - INFO - Received signal
2023-03-08 11:12:44,306 - Code Deploy - INFO - Received message signal
2023-03-08 11:12:44,306 - Code Deploy - INFO - Branch is Testing
2023-03-08 11:12:44,307 - Code Deploy - INFO - Deployment started
2023-03-08 11:13:31,782 - Code Deploy - INFO - Old version2_0_5_12
2023-03-08 11:13:31,783 - Code Deploy - INFO - New version2_0_5_13
2023-03-08 11:13:32,553 - Code Deploy - INFO - Permission fixed
2023-03-08 11:13:32,554 - Code Deploy - INFO - Deployment finished
2023-03-08 11:13:34,900 - Code Deploy - ERROR - !!!!!!!!!! EXCEPTION !!!!!!!!!(535, b'5.7.8     Username and Password not accepted. Learn more at\n5.7.8  https://support.google.com/mail/?p=BadCredentials z16-20020a170903019000b0019a97a4324dsm9818181plg.5 - gsmtp')Traceback (most recent call last):
File "/root/code-dployment/server/deploy.py", line 94, in send_email
server.login(gmail_user, gmail_password)
File "/usr/lib/python3.5/smtplib.py", line 729, in login
raise last_exception
File "/usr/lib/python3.5/smtplib.py", line 720, in login
initial_response_ok=initial_response_ok)
File "/usr/lib/python3.5/smtplib.py", line 641, in auth
raise SMTPAuthenticationError(code, resp)
smtplib.SMTPAuthenticationError: (535, b'5.7.8 Username and Password not accepted. Learn more at\n5.7.8  https://support.google.com/mail/?p=BadCredentials z16-20020a170903019000b0019a97a4324dsm9818181plg.5 - gsmtp')

2023-03-09 11:52:57,194 - Code Deploy - INFO - Received signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received message signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Branch is Testing
2023-03-09 11:52:57,195 - Code Deploy - INFO - Deployment started
2023-03-09 11:53:58,246 - Code Deploy - INFO - Old version2_0_5_13
2023-03-09 11:53:58,246 - Code Deploy - INFO - New version2_0_5_14
2023-03-09 11:53:58,498 - Code Deploy - INFO - Permission fixed
2023-03-09 11:53:58,498 - Code Deploy - INFO - Deployment finished
2023-03-09 11:54:00,797 - Code Deploy - ERROR - !!!!!!!!!! EXCEPTION !!!!!!!!!(535, b'5.7.8 Username and Password not accepted. Learn more at\n5.7.8  https://support.google.com/mail/?p=BadCredentials k17-20020aa790d1000000b005907716bf8bsm11097506pfk.60 - gsmtp')Traceback (most recent call last):
File "/root/code-dployment/server/deploy.py", line 94, in send_email
server.login(gmail_user, gmail_password)
File "/usr/lib/python3.5/smtplib.py", line 729, in login
raise last_exception
File "/usr/lib/python3.5/smtplib.py", line 720, in login
initial_response_ok=initial_response_ok)
File "/usr/lib/python3.5/smtplib.py", line 641, in auth
raise SMTPAuthenticationError(code, resp)
smtplib.SMTPAuthenticationError: (535, b'5.7.8 Username and Password not accepted. Learn more at\n5.7.8  https://support.google.com/mail/?p=BadCredentials k17-20020aa790d1000000b005907716bf8bsm11097506pfk.60 - gsmtp')

I need to extract the content between two patterns:

Pattern1 = 'Received signal'

Pattern2 = 'Deployment finished'

Expected result:

2023-03-09 11:52:57,194 - Code Deploy - INFO - Received signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received message signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Branch is Testing
2023-03-09 11:52:57,195 - Code Deploy - INFO - Deployment started
2023-03-09 11:53:58,246 - Code Deploy - INFO - Old version2_0_5_13
2023-03-09 11:53:58,246 - Code Deploy - INFO - New version2_0_5_14
2023-03-09 11:53:58,498 - Code Deploy - INFO - Permission fixed
2023-03-09 11:53:58,498 - Code Deploy - INFO - Deployment finished

I want an AWK command that I can use in a bash script. I found the following command, which filters the content between two patterns:

# awk '/Received signal/,/Deployment finished/' /tmp/result.log

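It prints every matching block in full, but I need to filter it so that only the last occurrence of the matching block is printed.

The output of the above command is: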

2023-03-08 11:12:44,306 - Code Deploy - INFO - Received signal
2023-03-08 11:12:44,306 - Code Deploy - INFO - Received message signal
2023-03-08 11:12:44,306 - Code Deploy - INFO - Branch is Testing
2023-03-08 11:12:44,307 - Code Deploy - INFO - Deployment started
2023-03-08 11:13:31,782 - Code Deploy - INFO - Old version2_0_5_12
2023-03-08 11:13:31,783 - Code Deploy - INFO - New version2_0_5_13
2023-03-08 11:13:32,553 - Code Deploy - INFO - Permission fixed
2023-03-08 11:13:32,554 - Code Deploy - INFO - Deployment finished
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received message signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Branch is Testing
2023-03-09 11:52:57,195 - Code Deploy - INFO - Deployment started
2023-03-09 11:53:58,246 - Code Deploy - INFO - Old version2_0_5_13
2023-03-09 11:53:58,246 - Code Deploy - INFO - New version2_0_5_14
2023-03-09 11:53:58,498 - Code Deploy - INFO - Permission fixed
2023-03-09 11:53:58,498 - Code Deploy - INFO - Deployment finished

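Answer 1

Using any awk, and very similar to one of the scripts in @terdon's answer, but in my opinion slightly more idiomatic because it uses awk's condition { action } body structure: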

$ awk '
    /Received signal/ { f=1; rec="" }
    f { rec = rec $0 ORS }
    /Deployment finished/ { f=0 }
    END { if (f=="0") printf "%s", rec }
' file
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received message signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Branch is Testing
2023-03-09 11:52:57,195 - Code Deploy - INFO - Deployment started
2023-03-09 11:53:58,246 - Code Deploy - INFO - Old version2_0_5_13
2023-03-09 11:53:58,246 - Code Deploy - INFO - New version2_0_5_14
2023-03-09 11:53:58,498 - Code Deploy - INFO - Permission fixed
2023-03-09 11:53:58,498 - Code Deploy - INFO - Deployment finished

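The subtle functional differences between this and @terdon's answer are:

  1. If you ever decide to set ORS to a value different from RS (e.g. you might want to convert RS='\r\n' to ORS='\n'), this one will produce the desired record terminator, whereas @terdon's reproduces the RS value between the lines of most of the output and uses ORS only at its very end.
  2. If the input file contains no "Received signal" line, @terdon's prints an empty line, whereas this one produces no output.
  3. This one only prints text that lies between both delimiters, whereas @terdon's prints whatever follows a "Received signal" line even if no subsequent "Deployment finished" line exists.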
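To see differences 2 and 3 in practice, here is a minimal sketch that wraps the script above in a shell function reading stdin or file arguments. The function name last_block and the tiny sample inputs are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around the Answer 1 awk script.
last_block() {
    awk '
        /Received signal/     { f=1; rec="" }   # start (or restart) a block
        f                     { rec = rec $0 ORS }
        /Deployment finished/ { f=0 }           # block is complete
        END { if (f=="0") printf "%s", rec }    # print only a completed block
    ' "$@"
}

# Two complete blocks: only the second one is printed.
printf '%s\n' 'Received signal 1' 'a' 'Deployment finished' \
              'Received signal 2' 'b' 'Deployment finished' | last_block

# No "Received signal" line at all: no output (difference 2).
printf '%s\n' 'x' 'y' | last_block

# The last block never finishes: no output (difference 3).
printf '%s\n' 'Received signal 1' 'a' | last_block
```

Note that if a complete block is followed by an unfinished one, this script prints nothing at all, because rec is reset when the unfinished block starts.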

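Regarding awk '/Received signal/,/Deployment finished/' /tmp/result.log in your question: don't use range expressions, use a flag instead; see is-a-start-end-range-expression-ever-useful-in-awk. As you can see in every answer posted so far that uses a range expression, doing so requires testing the same condition twice.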
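As a sketch of the flag idea, the range expression from the question can be rewritten with a flag, and simply reordering the same three tests changes which delimiter lines are printed. The function names all_blocks and inner_lines are hypothetical.

```shell
#!/bin/sh
# Flag-based equivalent of: awk '/Received signal/,/Deployment finished/'
# Prints every block, both delimiter lines included.
all_blocks() {
    awk '
        /Received signal/     { f=1 }   # start line: raise the flag
        f                               # print while the flag is raised
        /Deployment finished/ { f=0 }   # end line: lower it after printing
    ' "$@"
}

# Same tests in a different order: the delimiter lines are dropped.
# With a range expression you would have to re-test both regexes
# inside the action to get this result.
inner_lines() {
    awk '
        /Deployment finished/ { f=0 }
        f
        /Received signal/     { f=1 }
    ' "$@"
}
```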

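Answer 2

Reverse the file, delete all lines that are not in the range /Deployment finished/,/Received signal/, and quit as soon as the end of that range is matched. Then reverse the result again.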

tail -r file |
sed -e '/Deployment finished/,/Received signal/!d' -e '/Received signal/q' |
tail -r

Users on GNU systems may want to use tac instead of tail -r above.

If you feel you need to use awk, replace the sed command in the above pipeline with

awk '/Deployment finished/,/Received signal/; /Received signal/ { exit }'
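A minimal sketch of the same pipeline as a reusable shell function. The fallback logic is an assumption added for portability: it uses tac when it is on the PATH and tail -r otherwise. The names reverse_lines and last_block_rev are hypothetical.

```shell
#!/bin/sh
# Reverse line order: tac on GNU systems, tail -r elsewhere (assumed fallback).
reverse_lines() {
    if command -v tac >/dev/null 2>&1; then
        tac "$@"
    else
        tail -r "$@"
    fi
}

# Reverse, keep only the first "Deployment finished" .. "Received signal"
# range (sed quits after printing its last line), then reverse back.
last_block_rev() {
    reverse_lines "$@" |
    sed -e '/Deployment finished/,/Received signal/!d' -e '/Received signal/q' |
    reverse_lines
}

# Example: last_block_rev /tmp/result.log
```

Because the range in the reversed stream starts at "Deployment finished", a trailing block that never finished is skipped and the last complete block is still printed.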

Output, given the data in the question:

2023-03-09 11:52:57,194 - Code Deploy - INFO - Received signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received message signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Branch is Testing
2023-03-09 11:52:57,195 - Code Deploy - INFO - Deployment started
2023-03-09 11:53:58,246 - Code Deploy - INFO - Old version2_0_5_13
2023-03-09 11:53:58,246 - Code Deploy - INFO - New version2_0_5_14
2023-03-09 11:53:58,498 - Code Deploy - INFO - Permission fixed
2023-03-09 11:53:58,498 - Code Deploy - INFO - Deployment finished

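Answer 3

A simple trick is to save each match in a variable, overwriting its previous contents, and then print that variable at the end of the script: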

$ awk '{ 
         if(/Received signal/){k=1; v=$0} 
         else if(k==1){
           v=v RS $0; 
           if(/Deployment finished/){ k=0 }
         }
       } 
       END{ print v }' result.log
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received message signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Branch is Testing
2023-03-09 11:52:57,195 - Code Deploy - INFO - Deployment started
2023-03-09 11:53:58,246 - Code Deploy - INFO - Old version2_0_5_13
2023-03-09 11:53:58,246 - Code Deploy - INFO - New version2_0_5_14
2023-03-09 11:53:58,498 - Code Deploy - INFO - Permission fixed
2023-03-09 11:53:58,498 - Code Deploy - INFO - Deployment finished
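Since the question asks for something usable in a bash script, here is a sketch of the same idea with both patterns passed in from the shell via awk's -v option and matched as dynamic regexes with ~. The helper name last_match and its argument order are hypothetical, and one deviation from the script above is deliberate: the END block skips printing when nothing matched, so no empty line is produced.

```shell
#!/bin/sh
# last_match START_REGEX END_REGEX [FILE...]   (hypothetical helper)
last_match() {
    lm_start=$1; lm_end=$2; shift 2
    awk -v start_re="$lm_start" -v end_re="$lm_end" '
        $0 ~ start_re { k = 1; v = $0; next }    # a new block starts: overwrite v
        k == 1        { v = v RS $0              # inside a block: append the line
                        if ($0 ~ end_re) k = 0 } # the end pattern closes the block
        END           { if (v != "") print v }   # print only the last block seen
    ' "$@"
}

# Example: last_match 'Received signal' 'Deployment finished' result.log
```

Values passed with -v have their backslash escape sequences processed, so a regex that contains backslashes may need them doubled.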

Or you can use tac to reverse the file and then print the first match:

$ tac result.log | 
   awk '/Deployment finished/,/Received signal/{
      print; 
      if(/Received signal/){ exit }
   }' | tac
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received message signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Branch is Testing
2023-03-09 11:52:57,195 - Code Deploy - INFO - Deployment started
2023-03-09 11:53:58,246 - Code Deploy - INFO - Old version2_0_5_13
2023-03-09 11:53:58,246 - Code Deploy - INFO - New version2_0_5_14
2023-03-09 11:53:58,498 - Code Deploy - INFO - Permission fixed
2023-03-09 11:53:58,498 - Code Deploy - INFO - Deployment finished

Answer 4

Using Raku (formerly known as Perl_6)

~$ raku -e 'my $k = 0; my @v; for lines() { 
            if /Received \s signal/ {$k = 1; @v = $_} 
            elsif ($k == 1) { @v.push: $_ }; 
            if /Deployment \s finished/ {$k = 0}
            }; .put for @v;'  file
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received message signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Branch is Testing
2023-03-09 11:52:57,195 - Code Deploy - INFO - Deployment started
2023-03-09 11:53:58,246 - Code Deploy - INFO - Old version2_0_5_13
2023-03-09 11:53:58,246 - Code Deploy - INFO - New version2_0_5_14
2023-03-09 11:53:58,498 - Code Deploy - INFO - Permission fixed
2023-03-09 11:53:58,498 - Code Deploy - INFO - Deployment finished

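The code above shamelessly follows @terdon's excellent awk code, rewritten in Raku. The key (as @terdon points out) is to overwrite the "storage variable" (in this case the array @v) every time the opening regex /Received \s signal/ is found. Here, because lines are loaded one by one into the topic variable $_, the code uses @v = $_.

If, on reaching the end of the file, you don't want anything returned unless the closing /Deployment \s finished/ was seen last (i.e. a "complete record" requirement), change the final put output statement to: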

.put if $k == 0 for @v;

#OR

.put unless $k == 1 for @v;

https://docs.raku.org/language/control.html#Control_flow
https://raku.org
