How do I capture the lines between two strings in a file, but only for the last occurrence?

I have a log file written by a script; the log file is rotated once a day. It will contain the strings

Transfer started at timestamp 

Transfer completed successfully at timestamp

over and over, because the transfer runs once an hour. The timestamps are created beforehand with date.

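  • I want to capture the last instance of these two strings, and everything in between, into a separate file.
  • If the start string is found near the end of the log file with no completion string after it, I want to capture everything up to EOF and output an error message saying that the end string was not found.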

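I guess I need to use sed or awk, but I am really not familiar with either. I want to use the command in a bash script and understand what each part does, so some explanation would be very useful.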

Example chunk of the log file:

ERROR - Second tech sync failed with rsync error code 255 at Fri May 27 13:50:4$
--------------------------------------------------------------------
After_sync script completed successfully with no errors.
Main script finished at Fri May 27 13:50:43 BST 2016 with PID of 18808.
--------------------------------------------------------------------
Transfer started at Fri May 27 13:50:45 BST 2016
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
37 approvals pending.
Transfer completed successfully at Fri May 27 14:05:16 BST 2016
--------------------------------------------------------------------
Local repository verification started at Fri May 27 14:35:02 BST 2016
...

Desired output:

Transfer started at Fri May 27 13:50:45 BST 2016
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
37 approvals pending.
Transfer completed successfully at Fri May 27 14:05:16 BST 2016

However, if the log file looks like this:

ERROR - Second tech sync failed with rsync error code 255 at Fri May 27 13:50:4$
--------------------------------------------------------------------
After_sync script completed successfully with no errors.
Main script finished at Fri May 27 13:50:43 BST 2016 with PID of 18808.
--------------------------------------------------------------------
Transfer started at Fri May 27 13:50:45 BST 2016
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.

I would like the output to be:

Transfer started at Fri May 27 13:50:45 BST 2016
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
ERROR: transfer not complete by end of log file

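Answer 1

When I hear "I want to do something with the last X in a file", I think:

  • reverse the file
  • do that something with the first X in the file
  • reverse the output

In code: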

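tac logfile | awk '
    # reading bottom-up: assume the transfer is incomplete until shown otherwise
    BEGIN {text = "ERROR: transfer not complete by end of log file"}
    # a completion line: drop everything captured so far, including the error
    /^Transfer completed successfully/ {text = ""}
    # save every line (no leading separator while text is empty)
    {text = (text == "" ? $0 : text ORS $0)}
    # the start line closes the block: print it and stop reading
    /^Transfer started at / {print text; exit}
' | tac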

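Since we are reading the log file from the bottom up, I start by assuming the transfer has not completed. If we see the "Transfer completed" message, we can discard whatever has been captured so far. We save every line. When we see the "Transfer started" line, we know we have seen all of the last transfer in the file: print the captured (reversed) text and exit awk.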
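
For readers who find awk hard to follow, the same bottom-up idea can be sketched in Python. This is only an illustration under assumptions: the function name last_transfer_block and the use of an in-memory list of lines are made up here, while the marker strings and the error text come from the question.

```python
def last_transfer_block(lines):
    """Hypothetical helper: return the last transfer block as a list of lines."""
    start = "Transfer started at"
    end = "Transfer completed successfully"
    error = "ERROR: transfer not complete by end of log file"

    collected = []    # lines gathered so far, in bottom-up order
    complete = False  # becomes True once a completion line has been seen
    for line in reversed(lines):
        if line.startswith(end):
            # Everything below a completion line is outside the block.
            collected = []
            complete = True
        collected.append(line)
        if line.startswith(start):
            block = collected[::-1]  # restore top-down order
            if not complete:
                block.append(error)
            return block
    return []  # no start line anywhere in the input


if __name__ == "__main__":
    log = [
        "Transfer started at Fri May 27 13:50:45 BST 2016",
        "Logs transferred successfully.",
        "Transfer completed successfully at Fri May 27 14:05:16 BST 2016",
        "--------------------------------------------------------------------",
    ]
    print("\n".join(last_transfer_block(log)))
```

Scanning stops at the first start line met from the bottom, so earlier transfers in the file are never examined.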

Answer 2

Just use Python. I don't really have time, but I'll start with this:

#!/usr/bin/env python

start = "Transfer started at"
end = "Transfer completed successfully"
buffer = ""   # text of the most recent transfer block
log = False   # True while inside a block that has not completed yet

with open('logfile.log') as logfile:
    for line in logfile:
        if line.startswith(start):
            # a new block begins: discard any earlier block
            buffer = line
            log = True
        elif log:
            buffer += line
            if line.startswith(end):
                log = False

with open('output.log', 'w') as output:
    output.write(buffer)

if log:
    print("End string was not found")

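Answer 3

Python all the way, but let regular expressions do the work for you!

Paste the script below into a file, for example logfilter.py, and make it executable with the command chmod +x logfilter.py

You can then run it like this, assuming it is in the current directory: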

./logfilter.py logfile.txt

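This makes it process the file logfile.txt

However, if you do not pass it any command-line arguments, it waits for data on standard input. This means you can also pipe data into it. The following example processes data from the clipboard (xsel must be installed to access the clipboard):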

xsel -ob | ./logfilter.py

The script:

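#!/usr/bin/env python3
import re
import sys

# [^\n]* keeps each pattern on one line even though re.DOTALL is used below;
# with .*? the start pattern could stretch across lines to a later start line
p_start = r'^Transfer started at [^\n]*$'
p_end   = r'^Transfer completed successfully at [^\n]*$'

error_no_match = 'ERROR: no match found'
error_no_end   = 'ERROR: transfer not complete by end of log file'

# a start line with no later start line, then either everything up to the
# first end line or, if there is none, everything up to the end of the input
pattern = r'{p0}(?!.*{p0})(?:.*?{p1}|.*)'.format(p0=p_start, p1=p_end)

if len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        text = f.read()
else:
    text = sys.stdin.read()

matches = re.findall(pattern, text, re.DOTALL | re.MULTILINE)
if matches:
    last_match = matches[-1]
    print(last_match.rstrip('\n'))
    if not re.search(p_end, last_match, re.MULTILINE):
        print(error_no_end)
else:
    print(error_no_match)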
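
To see how the negative lookahead selects the last block, here is a minimal sketch that runs this kind of pattern against an in-memory string holding two transfers. The sample text and the helper name last_block are made up for illustration. The rest-of-line parts are written as [^\n]* on the assumption that, with re.DOTALL in effect, a start or end pattern should not be allowed to stretch across several lines.

```python
import re

# Rest-of-line is [^\n]* so each pattern stays on one line under re.DOTALL.
p_start = r'^Transfer started at [^\n]*$'
p_end = r'^Transfer completed successfully at [^\n]*$'

# A start line with no later start line, followed by either everything up to
# the first end line or, if there is none, everything up to the end of the text.
pattern = r'{p0}(?!.*{p0})(?:.*?{p1}|.*)'.format(p0=p_start, p1=p_end)


def last_block(text):
    """Hypothetical helper: return the last transfer block, or None."""
    match = re.search(pattern, text, re.DOTALL | re.MULTILINE)
    return match.group(0) if match else None


sample = (
    "Transfer started at 13:50\n"
    "Logs transferred successfully.\n"
    "Transfer completed successfully at 14:05\n"
    "--------\n"
    "Transfer started at 14:50\n"
    "Images transferred successfully.\n"
    "Transfer completed successfully at 15:05\n"
    "--------\n"
)

if __name__ == "__main__":
    print(last_block(sample))
```

The first start line is rejected because the lookahead finds another start line further down; only the final start line passes, so the match begins there.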

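Answer 4

You can use an awk array with a toggle to buffer the latest block, and print the error text if the toggle is still set at the end (I think this is essentially an awk implementation of @anatoly_techtonik's python answer):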

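awk '
  # GNU awk only: make "for (i in a)" visit indices in ascending numeric order
  BEGIN{PROCINFO["sorted_in"]="@ind_num_asc"}

  # a start line opens a new block and discards the previous one
  /Transfer started/ {inblock=1; delete a}
  # a completion line is stored and closes the block
  /Transfer completed/ {a[FNR]=$0; inblock=0}

  # inside a block, store each line under its line number
  inblock == 1 {a[FNR]=$0}

  END {
    for (i in a) print a[i]
    if (inblock)
      print "ERROR: transfer not complete by end of log file"
  }
' logfile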
