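Recursive pattern search - output formatting: for each matching file, print the file name, "\n", then the line number and the sentence with the match colored, "\n"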

Question 1

I think what you want is this:

$ grep -rwn --include=\*.{md,txt} -ie "output.*" --color=always | awk -F: '{if(f!=$1){print "\n"$1;}f=$1; $1=""; }1'

file.txt
 1 2023-09-25  after colon char does not output the sentence.
 2 2023-09-25 outputs line as there is NO colon preceding match.

file1.txt
 1 2023-09-25  after colon char does not output the sentence.
 2 2023-09-25 outputs line as there is NO colon preceding match.

In the terminal, that looks like this: [screenshot of the colored output]

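The problem with your approach is that you use : as the field separator and then explicitly print only fields 2 and 3, so whenever a line contains more than three :-separated fields you lose the rest. What I did here instead is empty the first field ($1="") and then print the whole line. That is what the trailing 1 does: in awk, when a condition evaluates to true the default action is to print the line, and 1 always evaluates to true. Note that assigning to $1 makes awk rebuild the line using the output field separator (a space by default), which is why the remaining colons show up as spaces in the output above.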
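To make the difference concrete, here is a small sketch that runs the two approaches on a single grep-style line. The sample line is made up for illustration, and the third command (setting OFS) is an optional variation, not part of the command above.

```shell
# One grep-style line (file:line:text) whose text contains an extra colon.
line='file.txt:1:2023-09-25: after colon char'

# Printing only $2 and $3 drops everything after the third colon.
printf '%s\n' "$line" | awk -F: '{ print $2 ":" $3 }'
# prints: 1:2023-09-25

# Emptying $1 rebuilds the record with OFS (a space by default), so the
# remaining fields survive but the colons between them become spaces.
printf '%s\n' "$line" | awk -F: '{ $1 = "" } 1'
# prints: " 1 2023-09-25  after colon char" (without the quotes)

# With OFS set to ":" the colons are kept; the leading colon is what is
# left of the now-empty first field.
printf '%s\n' "$line" | awk -F: -v OFS=: '{ $1 = "" } 1'
# prints: :1:2023-09-25: after colon char
```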

For clarity, you can expand the awk code to:

awk -F: '
 {
   ## If this is a new file name, print the file name
   if ( f != $1 ){
     print "\n"$1
   }
   ## save the 1st field in the variable f
   f=$1
   ## clear the first field
   $1=""
   ## print the line
   print
}'

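Important: this will break if the file names themselves can contain :. For example, you could have a file named file:weird.txt. Handling that is possible, but it requires more scripting, so if this is an issue, please update your question to include more example file names or post a new question.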
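One possible way to cope with such names is sketched below. It assumes a grep that supports --null (GNU and BSD grep do), which ends each file name with a NUL byte instead of a colon, and it assumes that file names contain no newlines. The function name group_matches is hypothetical, and the sketch leaves out --color and -w to keep the parsing simple.

```shell
# Group matches by file, even when a file name contains ":".
# Usage: group_matches DIR
group_matches() {
    # --null makes grep write "name<NUL>lineno:text" for every match.
    grep -rin --null --include='*.md' --include='*.txt' -e 'output' "$1" |
        # Turn each NUL into a newline, so names and matches alternate.
        tr '\0' '\n' |
        awk '
            # Odd lines hold the file name, even lines hold "lineno:text".
            NR % 2       { file = $0; next }
            file != prev { print "\n" file; prev = file }
                         { print }
        '
}
```

Calling group_matches . from the directory you want to search prints each file name once, followed by its lineno:text lines, with the colons in the text left intact.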

Question 2

With this command that you provided:

grep -rwn --include=\*.{md,txt} -ie "output.*" --color=always |
    awk -F: '{if(f!=$1)print "\n"$1; f=$1; print $2 ":" $3;}'

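I think you are trying to find lines containing a string that matches output.* in upper or lower case, in files whose names end in .md or .txt. That would be: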

find . -type f \( -name '*.md' -o -name '*.txt' \) -exec \
    grep -Hin 'output' \
{} +

You are then piping that output to awk in order to, again I think, change this output:

file1:lineNr1:text1
file1:lineNr2:text2
file2:lineNr1:text1
file2:lineNr2:text2

to this:

file1
lineNr1:text1
lineNr2:text2

file2
lineNr1:text1
lineNr2:text2

So this is what you are asking for help with to get printed to the screen:

$ grep -rwn --include=\*.{md,txt} -ie "output.*" --color=always |
    awk -F':' '{p=f; f=$1; sub(/[^:]+:/,"")} f!=p{print sep f; sep=ORS} 1'
test.txt
1:2023-09-25: after colon char does not output the sentence.
2:2023-09-25 outputs line as there is NO colon preceding match.
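To see what the awk step does on its own, it can be fed a few hand-written file:line:text records without running grep at all. The sample records and the wrapper name group_by_file are made up for illustration; the awk program is the one from the command above.

```shell
# Wrap the awk program from above so it can be reused on any input.
group_by_file() {
    awk -F':' '{p=f; f=$1; sub(/[^:]+:/,"")} f!=p{print sep f; sep=ORS} 1'
}

# sub() removes only the leading "file:" part, so colons that appear
# later in the text are left untouched.
printf '%s\n' \
    'file1:1:2023-09-25: text with: colons' \
    'file1:2:more text' \
    'file2:1:other text' |
    group_by_file
```

Because sep is empty for the first file and set to ORS afterwards, the blank line appears only between files, not before the first one.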

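However, by the time awk reads grep's results, the ANSI escape sequences that grep uses to color them are already present in the output. So if you want to generate HTML tags instead of escape sequences, you would need to update the awk script to look for those escape sequences in its input and convert them to HTML tags. That is somewhat backwards and fragile (for example, what if the original input already contains some escape sequences? There would be no way to tell those apart from the ones grep added) compared with simply running awk instead of grep on the original input files and having awk print whatever coloring strings you want.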

To print uncolored text in whatever layout you like, you would not pipe the output of find+grep to awk; you would replace grep with awk, e.g.:

find . -type f \( -name '*.md' -o -name '*.txt' \) -exec \
    awk '
        tolower($0) ~ /output/ {
            if ( !seen[FILENAME]++ ) {
                print ORS FILENAME
            }
            print
        }
    ' \
{} +

If you want colors in the output, update the awk script to print escape sequences, HTML tags, or whatever you like around whatever text you want. See https://unix.stackexchange.com/a/669122/133219 and https://stackoverflow.com/questions/64034385/using-awk-to-color-the-output-in-bash/64046525#64046525 for ways to do this for colors on the screen, and https://stackoverflow.com/a/40722767/1745001 and https://stackoverflow.com/a/39193330/1745001 for ways to color HTML output.

Here is an example of using find+awk in a bash script to produce the formatted output that I think you want printed to the screen:

$ cat tst.sh
#!/usr/bin/env bash
tput sc
trap 'tput rc; exit' EXIT

colors=( reset red green yellow blue purple )
for colorNr in "${!colors[@]}"; do
    fgColorMap+=( "${colors[colorNr]} $(tput setaf $colorNr)" )
done

find . -type f \( -name '*.md' -o -name '*.txt' \) -exec \
    awk -v fgColorMap="${fgColorMap[*]}" '
        BEGIN {
            OFS = ":"
            split(fgColorMap,tmp)
            for ( i=1; i in tmp; i+=2 ) {
                fg[tmp[i]] = tmp[i+1]
            }
        }

        match(tolower($0),/output.*/) {
            if ( !seen[FILENAME]++ ) {
                if ( found++ ) { print "" }
                print fg["purple"] FILENAME fg["reset"]
            }
            print fg["green"] FNR ":" fg["reset"]                  \
                  substr($0,1,RSTART-1)                           \
                  fg["red"] substr($0,RSTART,RLENGTH) fg["reset"] \
                  substr($0,RSTART+RLENGTH)
        }
        END { if ( found ) print "" }
    ' \
{} +

Here is the visible text output:

$ ./tst.sh
./test.txt
1:2023-09-25: after colon char does not output the sentence.
2:2023-09-25 outputs line as there is NO colon preceding match.

Here is the same output, but showing the color codes:

$ ./tst.sh | cat -A
^[7^[[35m./test.txt^[[30m$
^[[32m1:^[[30m2023-09-25: after colon char does not ^[[31moutput the sentence.^[[30m$
^[[32m2:^[[30m2023-09-25 ^[[31moutputs line as there is NO colon preceding match.^[[30m$
$
^[8$

And here is the colored output: [screenshot of the colored terminal output]
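One detail is visible in the cat -A output: the reset entry is index 0 of the colors array, so it expands to tput setaf 0, which selects a black foreground (^[[30m) and does not return to the terminal's default attributes. If that matters on your terminal, a variant along the following lines may help. It is only a sketch: it hard-codes ANSI escape sequences instead of calling tput (an assumption made to keep the example self-contained), it reads standard input instead of files, and the function name highlight is hypothetical.

```shell
# Print lines that contain "output" (any case), with the match in red and
# the default attributes restored right after it.
highlight() {
    awk '
        BEGIN {
            red   = "\033[31m"   # ANSI SGR 31: red foreground
            reset = "\033[0m"    # ANSI SGR 0: back to default attributes
        }
        match(tolower($0), /output.*/) {
            print substr($0, 1, RSTART - 1)             \
                  red substr($0, RSTART, RLENGTH) reset \
                  substr($0, RSTART + RLENGTH)
        }
    '
}
```

For example, printf 'this outputs text\n' | highlight prints the line with "outputs text" in red and leaves the text that follows in the terminal's default color.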

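To get HTML, just change the awk script to print whatever HTML you want. You did not show any expected HTML output in your question, so we cannot help you produce exactly what you want, but there are plenty of existing examples you can use (see the references I provided above). If you cannot figure out how to do it, you can ask a new question later.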
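As a rough starting point only, a sketch of such a change might look like the following. The markup (h3, pre, and mark elements) is purely an assumption, since the question does not show the expected HTML, and the function name matches_to_html is hypothetical.

```shell
# Print matches below DIR as an HTML fragment: one heading per file,
# followed by the matching lines with the matched text wrapped in <mark>.
# Usage: matches_to_html DIR
matches_to_html() {
    find "$1" -type f \( -name '*.md' -o -name '*.txt' \) -exec \
        awk '
            # Escape the characters that are special in HTML text.
            function esc(s) {
                gsub(/&/, "\\&amp;", s)
                gsub(/</, "\\&lt;", s)
                gsub(/>/, "\\&gt;", s)
                return s
            }
            match(tolower($0), /output.*/) {
                if ( !seen[FILENAME]++ ) {
                    if ( found++ ) { print "</pre>" }
                    print "<h3>" esc(FILENAME) "</h3>"
                    print "<pre>"
                }
                print FNR ":" esc(substr($0, 1, RSTART - 1))              \
                      "<mark>" esc(substr($0, RSTART, RLENGTH)) "</mark>" \
                      esc(substr($0, RSTART + RLENGTH))
            }
            END { if ( found ) print "</pre>" }
        ' {} +
}
```

Running matches_to_html . > matches.html writes the fragment to a file. The input text is escaped before the tags are added, so a < or & in a matching line cannot be mistaken for markup.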

Answer