Print all lines before pattern2, starting from the closest match of pattern1

I'm fairly sure this has been asked before, but I couldn't find an exact dupe.

My input looks like this:

Compiling File1
... commands ...

Compiling File2
... commands ...

Compiling File3
... commands ...
In file included from ...
In file included from ...
In file included from ...
error: could not find A

Compiling File4
... commands ...

Compiling File5
... commands ...
In file included from ...
In file included from ...
In file included from ...
error: could not find B

Expected output:

Compiling File3
... commands ...
In file included from ...
In file included from ...
In file included from ...
error: could not find A
---separator---
Compiling File5
... commands ...
In file included from ...
In file included from ...
In file included from ...
error: could not find B
---separator---

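I want a shell command that prints every match of error:, from the closest preceding match of Compiling up to the current line, i.e. the full context of the error message. All other files, which compiled successfully, can be skipped.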

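I think I could easily do this with awk or sed by keeping a pattern space comprising all text since the last match of "Compiling", but there can be thousands of lines without an error. Would it be very inefficient?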

Answer 1

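Regarding "keeping a pattern space comprising all text since the last match "Compiling", but there can be thousands of lines without an error. Would it be very inefficient?": it probably wouldn't be any less efficient than any alternative, such as making 2 passes over the input file to identify the matching delimiter pairs before starting to print, and it has the benefit that it works whether the input is stored in a file or comes from a pipe.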
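
For comparison, here is a minimal sketch of the 2-pass alternative mentioned above. The function name print_error_blocks_two_pass is hypothetical, and the sketch assumes the input is a regular file that can be read twice, which is exactly why it cannot be fed from a pipe. Unlike a buffering approach it holds only line numbers in memory, and it prints each input line at most once.

```shell
# Hypothetical helper (not from the original answer): two-pass range printing.
# Usage: print_error_blocks_two_pass FILE
print_error_blocks_two_pass() {
    awk '
        BEGIN { i = 1; start = 1 }
        NR == FNR {                        # pass 1: record line-number ranges only
            if (/^Compiling/) start = FNR  # most recent "Compiling" line so far
            if (/^error:/) { n++; from[n] = start; to[n] = FNR }
            next
        }
        # pass 2: print the lines that fall inside the next recorded range
        i <= n && FNR >= from[i] && FNR <= to[i] {
            print
            if (FNR == to[i]) { print "---separator---"; i++ }
        }
    ' "$1" "$1"                            # same file named twice, so it is read twice
}
```

It would be called as print_error_blocks_two_pass file. The file is read twice, so any saving in memory is paid for with extra I/O.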

If you're on a system that has tac, then the most efficient approach would probably be 2 calls to tac with awk in between:

$ tac file |
    awk '/^error:/{f=1; print "---separator---"} f; /^Compiling/{f=0}' |
        tac
Compiling File3
... commands ...
In file included from ...
In file included from ...
In file included from ...
error: could not find A
---separator---
Compiling File5
... commands ...
In file included from ...
In file included from ...
In file included from ...
error: could not find B
---separator---

Otherwise, using any awk in any shell on every Unix box:

$ awk '
    /^Compiling/ { buf="" }
    { buf = buf $0 "\n" }
    /^error:/ { print buf "---separator---" }
' file
Compiling File3
... commands ...
In file included from ...
In file included from ...
In file included from ...
error: could not find A
---separator---
Compiling File5
... commands ...
In file included from ...
In file included from ...
In file included from ...
error: could not find B
---separator---
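
One behaviour of the buffering script above is worth knowing: if a single "Compiling" block contains more than one error: line, the buffer is printed again for each of them. The following is a sketch of a variant that prints such a block once, up to its last error: line. It is not part of the original answer, and the names print_error_blocks_once, emit and keep are hypothetical. The trade-off is that output for a block is delayed until the next Compiling line or the end of input.

```shell
# Hypothetical variant: print each failing block once, through its last error line.
# Reads from stdin, so it works with files and pipes alike.
print_error_blocks_once() {
    awk '
        # Print the saved block (if any), then reset both buffers.
        function emit() {
            if (keep != "") printf "%s---separator---\n", keep
            buf = keep = ""
        }
        /^Compiling/ { emit() }        # a new block starts: flush the previous one
        { buf = buf $0 "\n" }          # accumulate the current block
        /^error:/ { keep = buf }       # remember the block up to this error line
        END { emit() }                 # flush the final block
    '
}
```

It would be used as print_error_blocks_once < file, or at the end of a pipeline.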

Alternatively, using GNU awk for multi-char RS and RT:

$ awk -v RS='\nerror:[^\n]+' -v ORS='\n---separator---\n' '
    sub(/(^|.*\n)Compiling/,"Compiling") { print $0 RT }
' file
Compiling File3
... commands ...
In file included from ...
In file included from ...
In file included from ...
error: could not find A
---separator---
Compiling File5
... commands ...
In file included from ...
In file included from ...
In file included from ...
error: could not find B
---separator---

Answer 2

With perl this is quite simple, because it has a paragraph mode, -00:

perl -00 -ne 'print if /\nerror:/' file

Output:

Compiling File3
... commands ...
In file included from ...
In file included from ...
In file included from ...
error: could not find A

Compiling File5
... commands ...
In file included from ...
In file included from ...
In file included from ...
error: could not find B

If you append | sed 's/^$/---separator---/', you also get your own separator in place of the empty lines, as desired.
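
The same paragraph-mode idea is also available in POSIX awk, where an empty RS makes blank lines the record separator. The sketch below is a swapped-in technique, not the answer's perl command, and the function name print_error_paragraphs is hypothetical. Like the perl version, it assumes that every "Compiling" block is separated from the next by a blank line.

```shell
# Hypothetical helper: awk paragraph mode (RS set to the empty string).
# Each blank-line-separated block is one record; print those with an error line.
print_error_paragraphs() {
    awk -v RS= '/(^|\n)error:/ { print; print "---separator---" }'
}
```

It would be used as print_error_paragraphs < file. Because the separator is printed by awk itself, no extra sed step is needed.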

Answer 3

Using Raku (formerly known as Perl_6)

raku -e 'my @array; for slurp.split("\n\n") {@array.push($_)}; for @array {.put if /^Compiling .* \n error/};' 

Sample input:

Compiling File1
... commands ...

Compiling File2
... commands ...

Compiling File3
... commands ...
In file included from ...
In file included from ...
In file included from ...
error: could not find A

Compiling File4
... commands ...

Compiling File5
... commands ...
In file included from ...
In file included from ...
In file included from ...
error: could not find B

Sample output (1):

Compiling File3
... commands ...
In file included from ...
In file included from ...
In file included from ...
error: could not find A
Compiling File5
... commands ...
In file included from ...
In file included from ...
In file included from ...
error: could not find B

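Briefly, the "Compiling..." sections are split on the \n\n separator, and each element is pushed onto @array (via the $_ "topic" variable). The resulting @array elements are printed only if they start with Compiling and contain a later line that starts with error.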

It's not clear why the OP asks for a ---separator--- line (since the start and end lines are both clearly specified), but it is easy to add:

raku -e 'my @array; for slurp.split("\n\n") {@array.push($_)}; for @array {put($_,"\n---separator---") if /^Compiling .* \n error/};'

Sample output (2):

Compiling File3
... commands ...
In file included from ...
In file included from ...
In file included from ...
error: could not find A
---separator---
Compiling File5
... commands ...
In file included from ...
In file included from ...
In file included from ...
error: could not find B
---separator---

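Addendum: the OP mentions in a comment that memory efficiency is key. In Raku the lines routine is lazy, so here is a rough approach (currently each "Compiling...error" block is returned on a single line):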

raku -e 'for lines.split( "Compiling ") {say "ERROR Compiling "~$_ if m/error/};'

Or

raku -e 'say "ERROR Compiling $_" if m/error/ for lines.split( "Compiling ");' 

Sample output (3):

ERROR Compiling File3 ... commands ... In file included from ... In file included from ... In file included from ... error: could not find A  
ERROR Compiling File5 ... commands ... In file included from ... In file included from ... In file included from ... error: could not find B

https://speakerdeck.com/util/reading-files-cant-be-this-simple
https://raku.org
