Find blocks with an unknown number of lines between two marker lines

I have a text file with output like this:

file_0108.json
2023-02-22T01:15:05.531+0000    connected to: mongodb://[**REDACTED**]@localhost
2023-02-22T01:15:08.531+0000    [######..................] db.coll  64.7MB/255MB (25.4%)
2023-02-22T01:15:11.531+0000    [############............] db.coll  128MB/255MB (50.3%)
2023-02-22T01:15:14.531+0000    [##################......] db.coll  196MB/255MB (76.9%)
2023-02-22T01:15:17.286+0000    [########################] db.coll  255MB/255MB (100.0%)
2023-02-22T01:15:17.286+0000    380757 document(s) imported successfully. 0 document(s) failed to import.

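The file numbers (at the start of each block) run from 0000 to 1000. Not all of the files were imported successfully. How can I find every block of text in the file that starts with a file name and ends with the following: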

xxxxx document(s) imported successfully. 0 document(s) failed to import

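and then delete those blocks, leaving only the errors?
The number of lines between the file name and the end of a block can vary.
Some blocks contain errors, but the errors can differ, so I think it would be easier to delete the blocks that have no errors.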

Example of an error block:

file_0293.json  
2023-02-22T01:52:15.303+0000    connected to: mongodb://[**REDACTED**]@localhost  
2023-02-22T01:52:16.836+0000    Failed: error processing document #46401: invalid character ',' after object key  
2023-02-22T01:52:16.836+0000    46000 document(s) imported successfully. 0 document(s) failed to import.

Answer 1

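If there are no blank lines within each block of text, you can use sed to insert a blank line after every line containing imported successfully, and then process the file in "paragraphs" (blocks of text separated by one or more blank lines). For example: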

sed -e $'/imported successfully/a\\\n' filename |
  perl -00 -n -e 'print if /Failed:/'
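
For shells where the $'...' quoting is not available, or on systems without perl, the same two steps can be sketched with awk alone. This is only an illustration under the same assumption as above (no blank lines inside a block); the function name failed_blocks and the sample data are made up.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original answer.
# Step 1: add a blank line after every "imported successfully" line.
# Step 2: read the result in paragraph mode (RS set to the empty string)
#         and keep only the paragraphs that mention "Failed:".
failed_blocks() {
    awk '{ print } /imported successfully/ { print "" }' "$1" |
        awk -v RS= -v ORS='\n\n' '/Failed:/'
}

# Made-up sample: one clean block followed by one failing block.
sample=$(mktemp)
cat > "$sample" <<'EOF'
file_0001.json
2023-02-22T01:15:17.286+0000    10 document(s) imported successfully. 0 document(s) failed to import.
file_0002.json
2023-02-22T01:52:16.836+0000    Failed: error processing document #5: invalid character
2023-02-22T01:52:16.836+0000    4 document(s) imported successfully. 0 document(s) failed to import.
EOF

failed_blocks "$sample"    # prints only the file_0002.json block
rm -f "$sample"
```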

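Also, you mentioned in a comment that your input file is generated by a bash for loop that runs echo <filename> && mongoimport. I suggest changing it to run echo <filename> && mongoimport ; echo, so that the output of future runs is already separated into paragraphs. sed is then no longer needed to insert the blank lines, so you can simply run: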

perl -00 -n -e 'print if /Failed:/' filename
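
The loop change suggested above might look roughly like the sketch below. The real mongoimport arguments are not shown in the question, so a stand-in function (import_one) takes its place; both function names are hypothetical and only illustrate where the extra echo goes.

```shell
#!/bin/sh
# Hypothetical stand-in for the real import command. The actual mongoimport
# arguments are not shown in the question, so replace this with your own call.
import_one() {
    printf '%s\n' "1 document(s) imported successfully. 0 document(s) failed to import."
}

# Print the file name, run the import, then always emit a blank line
# so that each file ends up in its own paragraph of the log.
run_imports() {
    for f in "$@"; do
        echo "$f" && import_one "$f"
        echo
    done
}

run_imports file_0001.json file_0002.json
```

Because the trailing echo follows a semicolon rather than &&, it runs whether or not the import succeeded, so failed imports are still followed by a blank line.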

Answer 2

I tried it with the following text file containing the output:

file_0108.json
2023-02-22T01:15:05.531+0000    connected to: mongodb://[**REDACTED**]@localhost
2023-02-22T01:15:08.531+0000    [######..................] db.coll  64.7MB/255MB (25.4%)
2023-02-22T01:15:11.531+0000    [############............] db.coll  128MB/255MB (50.3%)
2023-02-22T01:15:14.531+0000    [##################......] db.coll  196MB/255MB (76.9%)
2023-02-22T01:15:17.286+0000    [########################] db.coll  255MB/255MB (100.0%)
2023-02-22T01:15:17.286+0000    380757 document(s) imported successfully. 0 document(s) failed to import.
file_0293.json  
2023-02-22T01:52:15.303+0000    connected to: mongodb://[**REDACTED**]@localhost  
2023-02-22T01:52:16.836+0000    Failed: error processing document #46401: invalid character ',' after object key  
2023-02-22T01:52:16.836+0000    Failed: error processing document #46427: invalid character ',' after object key  
2023-02-22T01:52:16.836+0000    46000 document(s) imported successfully. 0 document(s) failed to import.

The following command line printed output to the terminal that I think is useful.

$ grep -e 'file_.*\.json' -e 'Failed:' file.txt | sed 's/json/json:/'|grep -B1 'Failed:'
file_0293.json:  
2023-02-22T01:52:16.836+0000    Failed: error processing document #46401: invalid character ',' after object key  
2023-02-22T01:52:16.836+0000    Failed: error processing document #46427: invalid character ',' after object key  

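If you like, you can redirect it to a file. Appending ... > errors.txt 2>&1, for example, sends both standard output and standard error to errors.txt (strictly speaking, the 2>&1 here only covers the final grep in the pipeline):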

grep -e 'file_.*\.json' -e 'Failed:' file.txt | sed 's/json/json:/'|grep -B1 'Failed:' > errors.txt 2>&1
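
If only the names of the failing files are needed, a variation could be sketched with awk as follows. This is an assumption-laden illustration, not part of the answer: the function name failed_files is made up, and it assumes every block starts with a line that begins with file_ and contains .json (trailing spaces are tolerated).

```shell
#!/bin/sh
# Hypothetical helper: list each file that had at least one "Failed:" line,
# together with the number of such lines.
failed_files() {
    awk '
        /^file_.*\.json/ { name = $1 }        # remember the current file name
        /Failed:/        { count[name]++ }    # tally failures for that file
        END { for (n in count) print n ": " count[n] }
    ' "$1" | sort
}

# Made-up sample shaped like the log in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
file_0108.json
2023-02-22T01:15:17.286+0000    380757 document(s) imported successfully. 0 document(s) failed to import.
file_0293.json  
2023-02-22T01:52:16.836+0000    Failed: error processing document #46401: invalid character
2023-02-22T01:52:16.836+0000    Failed: error processing document #46427: invalid character
EOF

failed_files "$sample"    # prints: file_0293.json: 2
rm -f "$sample"
```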

Answer 3

Using awk:

awk -v startblock='^file_[0-9][0-9][0-9][0-9]\\.json$' \
    -v endblock='document\\(s\\) failed to import\\.$' '
    $0 ~ startblock {
        error=0
        s=""
    }
    {
        s=(s=="" ? "" : s ORS) $0
    }
    $0 ~ endblock && (error || $0 !~ " 0 " endblock) {
        print s
        next
    }
    tolower($0) ~ /failed|error|invalid/ {
        error=1
    }
' file

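This prints every block that contains a case-insensitive match of failed, error, or invalid between the start and the end of the block, or whose end line reads n document(s) failed to import. with a non-zero n.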
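
To see both conditions at work, here is a small sketch that wraps the same awk program in a hypothetical function (error_blocks) and runs it on made-up data: one clean block, one block with a non-zero failed count, and one block with a Failed: line. It assumes file names have exactly four digits and no trailing whitespace, as the start pattern requires.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk program from this answer.
error_blocks() {
    awk -v startblock='^file_[0-9][0-9][0-9][0-9]\\.json$' \
        -v endblock='document\\(s\\) failed to import\\.$' '
        $0 ~ startblock { error = 0; s = "" }
        { s = (s == "" ? "" : s ORS) $0 }
        $0 ~ endblock && (error || $0 !~ " 0 " endblock) { print s; next }
        tolower($0) ~ /failed|error|invalid/ { error = 1 }
    ' "$1"
}

# Made-up sample data.
sample=$(mktemp)
cat > "$sample" <<'EOF'
file_0001.json
2023-02-22T01:15:17.286+0000    10 document(s) imported successfully. 0 document(s) failed to import.
file_0002.json
2023-02-22T01:20:00.000+0000    8 document(s) imported successfully. 2 document(s) failed to import.
file_0003.json
2023-02-22T01:52:16.836+0000    Failed: error processing document #5: invalid character
2023-02-22T01:52:16.836+0000    4 document(s) imported successfully. 0 document(s) failed to import.
EOF

error_blocks "$sample"    # prints the file_0002.json and file_0003.json blocks
rm -f "$sample"
```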

Answer 4

To do what you asked for, using any awk:

awk '
    /^file_[0-9]+\.json$/ {
        printf "%s", rec
        rec = ""
    }
    { rec = rec $0 ORS }
    /document\(s\) imported successfully. 0 document\(s\) failed to import/ {
        rec = ""
    }
    END { printf "%s", rec }
' file
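
As a quick check, the sketch below wraps the same logic in a hypothetical function (remaining_blocks) and runs it on made-up data. Both closing parentheses are escaped here, on the assumption that some awk implementations reject an unmatched ")". Note that a block shaped like the error example in the question is dropped as well, because it also ends with the success line; only the block that never reached that line survives.

```shell
#!/bin/sh
# Hypothetical wrapper: drop every block that ends with the
# "imported successfully. 0 document(s) failed to import" line.
remaining_blocks() {
    awk '
        /^file_[0-9]+\.json$/ { printf "%s", rec; rec = "" }
        { rec = rec $0 ORS }
        /document\(s\) imported successfully. 0 document\(s\) failed to import/ { rec = "" }
        END { printf "%s", rec }
    ' "$1"
}

# Made-up sample: an error block that still ends with the success line,
# followed by a block that was cut off before any summary line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
file_0293.json
2023-02-22T01:52:16.836+0000    Failed: error processing document #46401: invalid character
2023-02-22T01:52:16.836+0000    46000 document(s) imported successfully. 0 document(s) failed to import.
file_0294.json
2023-02-22T01:53:00.000+0000    connected to: (placeholder)
EOF

remaining_blocks "$sample"    # prints only the file_0294.json block
rm -f "$sample"
```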

However, the sample input you posted doesn't match your stated requirement. I think what you might really want is this (again using any awk):

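awk '
    # A new file name starts a new block: print the previous one if it had a failure.
    /^file_[0-9]+\.json$/ {
        if ( bad ) printf "%s", rec
        rec = bad = ""
    }
    /Failed:/ { bad = 1 }       # remember that this block contains an error
    { rec = rec $0 ORS }        # accumulate the lines of the current block
    END { if ( bad ) printf "%s", rec }
' file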
