Capturing multi-line regions defined by start and end patterns

2024-6-12

I want to print the middle part of a file (the part between a start pattern and an end pattern) and colorize specific lines.

Here is sample text from such a file:

## Beginning of file

Some text and code

## FAML [ASMB] KEYWORD
##  Some information.
##  Some other text.
##  Blu:
##  Some text in blue.
## END OF FAML [ASMB]

## Other text

More text and code

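The text between the lines ## FAML [ASMB] KEYWORD and ## END OF FAML [ASMB] should be extracted (without the leading ##) and passed to the function luciferin, which prints the multi-line text appropriately.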

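Text between blocks is discarded. Subsequent blocks are handled the same way: the middle region is extracted and printed by calling luciferin(rec). The function luciferin produces colored output.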

For the sample above, the input string passed to luciferin would be

Some information.
Some other text.
Blu:
Some text in blue.

Here is the awk script that captures the middle region

BEGIN {
  beg_ere = "## [[:alnum:]]+ [[][[:alnum:]]+[]]"
  end_ere = "## END OF [[:alnum:]]+ [[][[:alnum:]]+[]]"
 }

match($0, beg_ere, paggr) { display = 1 }
$0 ~ end_ere { display = 0 ; next }
display { print }

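Here is luciferin, a function that takes a string and prints it in color. In it, cpt is the color escape sequence and astr[i] is the i-th line of the multi-line input string.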

function luciferin(mstr) {
  cpt = tseq["Grn:"]
  nlines = split(mstr, astr, "\n")
  for (i = 1; i <= nlines; i++) {
    for ( knam in tseq ) {
      if ( knam == astr[i] ) { cpt = tseq[knam] ; break }
     }
    if (knam == str) { print "" } else { print cpt astr[i] rst }
   }

 }
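
For readers who want to run something close to this, here is a minimal, self-contained sketch of such a function. It rests on several assumptions that are not stated in the question: that tseq maps tag lines such as Grn: and Blu: to ANSI color sequences, that rst resets the color, and that a tag line switches the current color and is itself not printed. The wrapper name luciferin_demo is hypothetical.

```shell
# Hypothetical wrapper: reads text on stdin and prints it in color.
luciferin_demo() {
  awk '
    BEGIN {
      # Assumed color table; the question does not show how tseq is filled.
      tseq["Grn:"] = "\033[32m"
      tseq["Blu:"] = "\033[34m"
      rst = "\033[0m"              # assumed reset sequence
    }
    { rec = rec $0 "\n" }          # collect all input lines into one string
    END { sub(/\n$/, "", rec); luciferin(rec) }

    function luciferin(mstr,    n, i, astr, cpt) {
      cpt = tseq["Grn:"]           # default color, as in the question
      n = split(mstr, astr, "\n")
      for (i = 1; i <= n; i++) {
        # A line that is exactly a known tag switches the current color.
        if (astr[i] in tseq) { cpt = tseq[astr[i]]; continue }
        print cpt astr[i] rst
      }
    }
  '
}
```

Feeding it the four-line sample string above should print the first two lines in green and the last one in blue, provided the terminal understands ANSI escapes.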

Answer 1

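Since there is neither a minimal, complete code example nor enough sample input/output to test with, this is obviously just an untested guess, but it looks like you should change: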

display { print }

to

display { rec = rec $0 ORS }

and

$0 ~ end_ere { display = 0 ; next }

to

$0 ~ end_ere { luciferin(rec); rec = ""; display = 0 ; next }

or similar, and adjust luciferin to remove the extra trailing newline from its argument before printing.
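
Put together, the two changes might look like the sketch below. It is an illustration rather than tested advice for the real script: the wrapper name capture_blocks is hypothetical, the start pattern is simplified to upper-case names, the leading ## is stripped while collecting, and luciferin only prefixes each line instead of coloring it.

```shell
# Hypothetical wrapper: run the sketch on the file named by $1.
capture_blocks() {
  awk '
    # End marker: hand the collected block to luciferin, then reset.
    /^## END OF / { if (display) luciferin(rec); rec = ""; display = 0; next }

    # Start marker such as "## FAML [ASMB] KEYWORD": begin a new block.
    /^## [A-Z]+ [[][A-Z]+[]]/ { display = 1; rec = ""; next }

    # Inside a block: drop the leading "##" and blanks, append to rec.
    display {
      line = $0
      sub(/^## */, "", line)
      rec = rec line ORS
    }

    function luciferin(mstr,    n, i, astr) {
      sub(/\n$/, "", mstr)         # remove the trailing newline left by ORS
      n = split(mstr, astr, "\n")
      for (i = 1; i <= n; i++) print "Luci: " astr[i]
    }
  ' "$1"
}
```

Calling capture_blocks on a file prints one Luci: line per collected line. Text outside the marked blocks is dropped because nothing is appended to rec while display is 0.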

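As for how to improve this question and the OP's questions in general, here is what a complete, minimal code example looks like in a question like this: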

$ cat tst.awk
$2 == "FAML" { display = 1 ; next }
$2 == "END" { display = 0 ; next }
display { print }

function luciferin(mstr) {
    nlines = split(mstr, astr, "\n")
    for (i = 1; i <= nlines; i++) {
        print "Luci:", astr[i]
    }
}

plus some sample input that demonstrates your needs and can be tested against:

$ cat input
## Beginning of file

Some text and code

## FAML [ASMB] KEYWORD
##  Some information.
##  Some other text.
## END OF FAML [ASMB]

## Other text

## FAML [ASMB] KEYWORD
##  Some other information.
##  Even more text.
## END OF FAML [ASMB]

More text and code

and the expected output given that input:

Luci: ##  Some information.
Luci: ##  Some other text.
Luci: ##  Some other information.
Luci: ##  Even more text.

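The fact that your real code does coloring, or anything else, is completely irrelevant to the problem you need help with, which is just how to store a block of text and call luciferin() to print it modified in some way.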

Given a clear, simple example like that, we can quickly show you a solution, e.g.:

$ cat tst.awk
$2 == "FAML" { display = 1 ; next }
$2 == "END" { luciferin(rec); rec = ""; display = 0 ; next }
display { rec = rec $0 ORS }

function luciferin(mstr) {
    nlines = split(mstr, astr, "\n")
    for (i = 1; i < nlines; i++) {   # "<" skips the empty element after the trailing ORS
        print "Luci:", astr[i]
    }
}

$ awk -f tst.awk input
Luci: ##  Some information.
Luci: ##  Some other text.
Luci: ##  Some other information.
Luci: ##  Even more text.

You can then apply the concepts from that to your real code.

Answer 2

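Solving this with awk is certainly feasible, but you seem to be making it harder on yourself than necessary. Perl offers direct language support for such ranges, copied from the sed feature mentioned in the comments.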

Let's paint spring blue.

$ cat months.txt | perl -ane 'print "blue" if /Mar/../May/; print "\t$_"'
        January
        February
blue    March
blue    April
blue    May
        June

Use the FAML / ASMB keywords in those regexes to adapt it to your use case.
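
Since the range operator is said to come from sed, the same idea can be sketched there with the FAML / ASMB markers. This is only an illustration, not the Perl one-liner adapted: the function name faml_range is hypothetical, and the marker lines themselves are printed along with the block contents.

```shell
# Hypothetical helper: print every line from a start marker through the
# matching end marker (both markers included) in the file named by $1.
faml_range() {
  sed -n '/^## FAML [[]ASMB[]]/,/^## END OF FAML [[]ASMB[]]/p' "$1"
}
```

The brackets are written as [[] and []] so that they match literally, the same convention the question's awk patterns use.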

Even if you want more advanced processing than this, it still makes a good initial stage in your pipeline.

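Subsequent stages then don't have to worry about line ranges; they can use the first field to tell whether we are inside the range, and process the rest of the line accordingly.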
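
A later stage of that kind could look roughly like the sketch below. It assumes the first stage emits lines of the form tag, tab, text, as in the months example; the function name colorize_tagged and the choice of ANSI blue are illustrative assumptions.

```shell
# Hypothetical second stage: reads "tag<TAB>text" lines on stdin and
# colors the text of every line whose tag is "blue".
colorize_tagged() {
  awk '
    BEGIN { blue = "\033[34m"; rst = "\033[0m" }   # assumed ANSI sequences
    {
      tab  = index($0, "\t")           # position of the first tab, 0 if none
      tag  = substr($0, 1, tab - 1)    # empty when the line is out of range
      text = substr($0, tab + 1)       # everything after the first tab
      if (tag == "blue") print blue text rst
      else print text
    }
  '
}

# Possible use with the first stage shown above:
#   perl -ne 'print "blue" if /Mar/../May/; print "\t$_"' months.txt | colorize_tagged
```

Splitting on the first tab only, rather than using awk fields, keeps any further tabs in the text intact.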
