I have a text file whose output looks like this:
file_0108.json
2023-02-22T01:15:05.531+0000 connected to: mongodb://[**REDACTED**]@localhost
2023-02-22T01:15:08.531+0000 [######..................] db.coll 64.7MB/255MB (25.4%)
2023-02-22T01:15:11.531+0000 [############............] db.coll 128MB/255MB (50.3%)
2023-02-22T01:15:14.531+0000 [##################......] db.coll 196MB/255MB (76.9%)
2023-02-22T01:15:17.286+0000 [########################] db.coll 255MB/255MB (100.0%)
2023-02-22T01:15:17.286+0000 380757 document(s) imported successfully. 0 document(s) failed to import.
The file numbers (at the start of each block) run from 0000 to 1000. Not all of the files were imported successfully. How can I find every block of text in the file that starts with the filename and ends with:
xxxxx document(s) imported successfully. 0 document(s) failed to import
and then delete them, leaving only the errors?
There can be a varying number of lines between a block's filename and the end of the block.
Some of the blocks have errors, but the errors may differ, so I think it would be easier to delete the blocks that have no errors.
Example of a block with an error:
file_0293.json
2023-02-22T01:52:15.303+0000 connected to: mongodb://[**REDACTED**]@localhost
2023-02-22T01:52:16.836+0000 Failed: error processing document #46401: invalid character ',' after object key
2023-02-22T01:52:16.836+0000 46000 document(s) imported successfully. 0 document(s) failed to import.
Answer 1
If there are no blank lines within each block, then you can use sed to insert a blank line after every line containing imported successfully, and then process the file in "paragraphs" (blocks of text separated by one or more blank lines). For example:
sed -e $'/imported successfully/a\\\n' filename |
perl -00 -n -e 'print if /Failed:/'
Also, you mentioned in a comment that your input file is generated by a bash for loop that runs echo <filename> && mongoimport. I suggest you change it to run echo <filename> && mongoimport ; echo so that future runs already have their output split into paragraphs. sed is then no longer needed to insert the blank lines, so you can just run:
perl -00 -n -e 'print if /Failed:/' filename
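For reference, the loop could then look something like the sketch below; the mongoimport options, database, collection and log file names here are placeholders, not taken from the question:
for f in file_*.json; do
    echo "$f"
    mongoimport --db db --collection coll --file "$f"
    echo    # blank line after each block, so the log is already split into paragraphs
done > import.log 2>&1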
Answer 2
I tried it with a text file containing the following output:
file_0108.json
2023-02-22T01:15:05.531+0000 connected to: mongodb://[**REDACTED**]@localhost
2023-02-22T01:15:08.531+0000 [######..................] db.coll 64.7MB/255MB (25.4%)
2023-02-22T01:15:11.531+0000 [############............] db.coll 128MB/255MB (50.3%)
2023-02-22T01:15:14.531+0000 [##################......] db.coll 196MB/255MB (76.9%)
2023-02-22T01:15:17.286+0000 [########################] db.coll 255MB/255MB (100.0%)
2023-02-22T01:15:17.286+0000 380757 document(s) imported successfully. 0 document(s) failed to import.
file_0293.json
2023-02-22T01:52:15.303+0000 connected to: mongodb://[**REDACTED**]@localhost
2023-02-22T01:52:16.836+0000 Failed: error processing document #46401: invalid character ',' after object key
2023-02-22T01:52:16.836+0000 Failed: error processing document #46427: invalid character ',' after object key
2023-02-22T01:52:16.836+0000 46000 document(s) imported successfully. 0 document(s) failed to import.
The command line below produced output on the terminal that I think is useful:
$ grep -e 'file_.*\.json' -e 'Failed:' file.txt | sed 's/json/json:/'|grep -B1 'Failed:'
file_0293.json:
2023-02-22T01:52:16.836+0000 Failed: error processing document #46401: invalid character ',' after object key
2023-02-22T01:52:16.836+0000 Failed: error processing document #46427: invalid character ',' after object key
If you like, you can redirect that to a file, appending ... > errors.txt 2>&1 so that both standard output and standard error are captured:
grep -e 'file_.*\.json' -e 'Failed:' file.txt | sed 's/json/json:/'|grep -B1 'Failed:' > errors.txt 2>&1
Answer 3
Using awk:
awk -v startblock='^file_[0-9][0-9][0-9][0-9]\\.json$' \
    -v endblock='document\\(s\\) failed to import\\.$' '
  $0 ~ startblock {
    error = 0
    s = ""
  }
  {
    s = (s == "" ? "" : s ORS) $0
  }
  $0 ~ endblock && (error || $0 !~ " 0 " endblock) {
    print s
    next
  }
  tolower($0) ~ /failed|error|invalid/ {
    error = 1
  }
' file
This prints every block that contains a case-insensitive match of failed, error or invalid between the start and the end of the block, or whose closing n document(s) failed to import. line has a non-zero n.
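As a quick sanity check (my addition, assuming the awk output above was redirected to a file such as errors.txt), counting the filename lines shows how many error blocks were kept:
grep -c '^file_[0-9][0-9][0-9][0-9]\.json$' errors.txt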
Answer 4
To do what you asked for, using any awk:
awk '
  /^file_[0-9]+\.json$/ {
    printf "%s", rec
    rec = ""
  }
  { rec = rec $0 ORS }
  /document\(s\) imported successfully\. 0 document\(s\) failed to import/ {
    rec = ""
  }
  END { printf "%s", rec }
' file
But the sample input you posted doesn't match your stated requirements. I think what you might really want is this (again using any awk):
awk '
  /^file_[0-9]+\.json$/ {            # start of a new block: print the previous one if it had an error
    if ( bad ) printf "%s", rec
    rec = bad = ""
  }
  /Failed:/ { bad = 1 }              # mark the current block as containing an error
  { rec = rec $0 ORS }
  END { if ( bad ) printf "%s", rec }
' file
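If the goal is to replace the original file so that only the error blocks remain, one option (a sketch, not part of the answer above; file and file.errors are placeholder names) is to write the filtered output to a temporary file and move it back over the original:
awk '
  /^file_[0-9]+\.json$/ {
    if ( bad ) printf "%s", rec
    rec = bad = ""
  }
  /Failed:/ { bad = 1 }
  { rec = rec $0 ORS }
  END { if ( bad ) printf "%s", rec }
' file > file.errors && mv file.errors file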