awk: deduplicating two scripts that are mostly negations of each other


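Don't repeat yourself! Ha ha, OK: I want to see whether I can merge the awk sources of my two GitHub workflows into one. I tried a few things, but for some reason I kept ending up with strange column contents. In hindsight, I was probably assigning variables instead of comparing them. I decided not to clutter this question with the broken code.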

The awk is run as git diff-tree -r --no-commit-id --name-status HEAD | awk -f .github/files/changed.awk, with no options. The output of git diff-tree looks like this:

( cat << EOF
A       .config/plantuml/theme.puml
M       .github/workflows/main.yml
M       .github/workflows/plantuml.yml
M       README.md
A       app/gradle.lockfile
A       authn/gradle.lockfile
A       docs/README.md
A       docs/domain-model/README.md
A       docs/domain-model/user.md
A       docs/domain-model/user.puml
A       docs/domain-model/foo.puml
M       settings.gradle.kts
D       user.puml
D       foo.puml
EOF
) | awk -f file.awk

These are the two scripts I came up with. The first one:

$2 ~ /\.puml$/ &&
$2 !~ /(theme|config)\.puml$/ &&
$1 !~ /^D$/ {
  changed[++n]=$2
}
END {
  for ( i in changed ) result = result " " changed[i]
  printf "::set-output name=changed-files::%s\n", result
  printf "::warning::changed-files %s\n", result
}

Output:

::set-output name=changed-files:: docs/domain-model/user.puml docs/domain-model/foo.puml
::warning::changed-files docs/domain-model/user.puml docs/domain-model/foo.puml

And the second one:

$2 ~ /\.puml$/ &&
$2 !~ /(theme|config)\.puml$/ &&
$1 ~ /^D$/ {
  split($2, fn, ".")
  changed[++n]=fn[1] ".svg"
}
END {
  for ( i in changed ) result = result " " changed[i]
  printf "::set-output name=removed-files::%s\n", result
  printf "::warning::removed-files %s\n", result
}

Output:

::set-output name=removed-files:: user.svg foo.svg
::warning::removed-files  user.svg foo.svg

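Here is the current workflow file, if it helps. Note: the change to awk -f was made in the currently broken branch, which I will delete and squash once this is done.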

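The goal is to get output similar to the following. I don't care about the order of the lines, and the whitespace in the lists only matters insofar as the lists can be passed to git add, git rm and plantuml in the shell.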

# adding empty lines between just for readability here, comments do not matter either
::set-output name=changed-files:: docs/domain-model/user.puml docs/domain-model/foo.puml
::warning::changed-files docs/domain-model/user.puml docs/domain-model/foo.puml

::set-output name=removed-files:: user.svg foo.svg
::warning::removed-files  user.svg foo.svg
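The leading space that string concatenation leaves at the start of each list is harmless when the list is expanded unquoted, because the shell's word splitting drops leading and repeated whitespace. A minimal sketch, assuming file names contain no spaces or glob characters; files is a hypothetical variable holding one of the outputs:

```shell
#!/bin/bash
# Hypothetical value of removed-files, including the leading space
files=" user.svg foo.svg"

# Unquoted expansion: word splitting yields one argument per file name,
# and the leading space does not produce an empty argument
set -- $files
count=$#
printf '%s\n' "$@"
echo "argument count: $count"
```

A command such as git rm $files receives its arguments the same way, so the extra space does not matter; names containing spaces would need a different separator.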

How would you remove the duplication and improve this?

Answer 1

With a few improvements, and the two parts merged into one:

$2 ~ /\.puml$/ && $2 !~ /(theme|config)\.puml$/ {
  if ($1 !~ /^D$/) {
    result1 = (result1 == "" ? "" : result1 " ") $2
  } else {
    split($2, fn, ".")
    result2 = (result2 == "" ? "" : result2 " ") fn[1] ".svg"
  }
}
END {
  printf "::set-output name=changed-files::%s\n::warning::changed-files %s\n", result1, result1
  printf "::set-output name=removed-files::%s\n::warning::removed-files %s\n", result2, result2
}
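The ternary that avoids a leading space is written twice in the script above; it can itself be factored into a user-defined awk function. A sketch run against part of the question's sample input, where append is a hypothetical helper name, the output names follow the question's goal, and only the set-output lines are printed to keep it short:

```shell
#!/bin/bash
# Part of the sample input from the question
input='A       .config/plantuml/theme.puml
M       README.md
A       docs/domain-model/user.puml
A       docs/domain-model/foo.puml
D       user.puml
D       foo.puml'

out=$(printf '%s\n' "$input" | awk '
  # Hypothetical helper: add item to a space-separated list
  # without producing a leading space
  function append(list, item) {
    return (list == "" ? item : list " " item)
  }
  $2 ~ /\.puml$/ && $2 !~ /(theme|config)\.puml$/ {
    if ($1 == "D") {
      split($2, fn, ".")              # as in the answer above
      removed = append(removed, fn[1] ".svg")
    } else {
      changed = append(changed, $2)
    }
  }
  END {
    printf "::set-output name=changed-files::%s\n", changed
    printf "::set-output name=removed-files::%s\n", removed
  }')
printf '%s\n' "$out"
```

The theme.puml line is dropped by the second pattern, and README.md by the first.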

Answer 2

So you have this input (the output of git):

A       .config/plantuml/theme.puml
M       .github/workflows/main.yml
M       .github/workflows/plantuml.yml
M       README.md
A       app/gradle.lockfile
A       authn/gradle.lockfile
A       docs/README.md
A       docs/domain-model/README.md
A       docs/domain-model/user.md
A       docs/domain-model/user.puml
A       docs/domain-model/foo.puml
M       settings.gradle.kts
D       user.puml
D       foo.puml

The two awk scripts do similar (but different) things with this input.

I assume your goal is to cut down what the scripts do (and repeat) as much as possible.

The first two lines of both scripts are identical:

$2 ~ /\.puml$/ &&
$2 !~ /(theme|config)\.puml$/

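After that, one script acts on $2 when $1 is not D (which I take to mean deleted). The other acts on the complement: $2 when $1 is D. That can be coded as: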

{ if ( $1 ~ /^[D]$/ ) { print "Deleted" } else { print "Changed" } }

Or, if you want a finer-grained selection:

{ if ( $1 ~ /^[D]$/  ) { print "Deleted" }
  if ( $1 ~ /^[MA]$/ ) { print "Changed" } }
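Written with else if, each line falls into at most one branch, and a final else catches the other status letters git can report (for example R for renames or T for a type change), which neither script handles. A sketch with made-up input lines:

```shell
#!/bin/bash
out=$(printf '%s\n' 'D user.puml' 'A new.puml' 'M old.puml' 'T link.puml' | awk '
  {
    if ($1 == "D")          print $2 ": Deleted"
    else if ($1 ~ /^[MA]$/) print $2 ": Changed"
    else                    print $2 ": Other"    # R, C, T, ... end up here
  }')
printf '%s\n' "$out"
```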

There is no real need to store each file in an array, since what you want in the end is a space-separated list of files.

That list can be built up while each input line is processed (less memory, and faster):

{ if ( $1 ~ /^[D]$/  ) { deleted = deleted " " $2 }
  if ( $1 ~ /^[MA]$/ ) { changed = changed " " $2 } }
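Building the string line by line also keeps the input order. Both scripts in the question loop with for (i in changed), and POSIX leaves the traversal order of that loop unspecified; if you do keep an array, a numeric loop from 1 to n is ordered. A small sketch comparing the two ordered approaches on the made-up input c, a, b:

```shell
#!/bin/bash
out=$(printf '%s\n' c a b | awk '
  {
    arr[++n] = $1              # stored in an array, as in the question
    joined = joined " " $1     # concatenated per line, as suggested above
  }
  END {
    # Numeric loop: visits 1..n in order, unlike "for (i in arr)"
    for (i = 1; i <= n; i++) ordered = ordered " " arr[i]
    print "concatenated:" joined
    print "numeric loop:" ordered
  }')
printf '%s\n' "$out"
```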

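Of course, the matches can be exact string comparisons instead of regular expressions (faster), and in the deleted case you also need to strip the extension and append .svg: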

{ if ( $1 == "D" ) { sub( /\.[^.]+$/, "", $2 )
                     deleted = deleted " " $2 ".svg"
                   }
  if ( $1 == "A" || $1 == "M" ) { changed = changed " " $2 }
}
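Stripping the extension with sub(/\.[^.]+$/, "", ...) behaves differently from split($2, fn, ".") as used in the question and in Answer 1 once a directory name contains a dot: split keeps only the text before the first dot, while the anchored sub removes only the last extension. A sketch with the made-up path docs/v1.2/user.puml:

```shell
#!/bin/bash
out=$(echo 'D docs/v1.2/user.puml' | awk '
  {
    # split on every ".": fn[1] is everything before the FIRST dot
    split($2, fn, ".")
    print "split: " fn[1] ".svg"

    # regex anchored at the end: removes only the LAST extension
    name = $2
    sub(/\.[^.]+$/, "", name)
    print "sub:   " name ".svg"
  }')
printf '%s\n' "$out"
```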

A shell script to test all of these ideas could be:

#!/bin/bash

a='A       .config/plantuml/theme.puml
M       .github/workflows/main.yml
M       .github/workflows/plantuml.yml
M       README.md
A       app/gradle.lockfile
A       authn/gradle.lockfile
A       docs/README.md
A       docs/domain-model/README.md
A       docs/domain-model/user.md
A       docs/domain-model/user.puml
A       docs/domain-model/foo.puml
M       settings.gradle.kts
D       user.puml
D       foo.puml
'

printf '%s\n' "$a" | awk '
      $2  ~ /\.puml$/ &&
      $2 !~ /(theme|config)\.puml$/ {
             # deleted diagrams: strip the extension and report the .svg
             if ( $1 == "D" ) { sub(/\.[^.]+$/, "", $2)
                                deleted = deleted " " $2 ".svg"
             }
             # added or modified diagrams: report the .puml path as is
             if ( $1 == "A" || $1 == "M" ) { changed = changed " " $2 }
      }
      END {
             printf "::set-output name=removed-files::%s\n", deleted
             printf "::warning::removed-files %s\n", deleted
             printf "::set-output name=changed-files::%s\n", changed
             printf "::warning::changed-files %s\n", changed
      }
    '
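The four printf calls in the END block still repeat themselves. One more step in the same direction is a small awk function that prints both workflow lines for a given output name. A sketch, where emit is a hypothetical name and deleted diagrams are mapped to .svg as the question asks:

```shell
#!/bin/bash
input='A       docs/domain-model/user.puml
A       docs/domain-model/foo.puml
D       user.puml
D       foo.puml'

out=$(printf '%s\n' "$input" | awk '
  # Hypothetical helper: print both workflow commands for one list
  function emit(name, list) {
    printf "::set-output name=%s::%s\n", name, list
    printf "::warning::%s %s\n", name, list
  }
  $2 ~ /\.puml$/ && $2 !~ /(theme|config)\.puml$/ {
    if ($1 == "D") { sub(/\.[^.]+$/, "", $2); removed = removed " " $2 ".svg" }
    else           { changed = changed " " $2 }
  }
  END {
    emit("removed-files", removed)
    emit("changed-files", changed)
  }')
printf '%s\n' "$out"
```

Adding another output then costs one emit call instead of two printf lines.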
