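I am trying to mask some sensitive data in a log file.
I first need to filter out specific lines from the file with a matching pattern. Then, on those lines only, I need to replace any text inside double quotes while keeping any text that is not inside double quotes.
On every line that matches the pattern, everything inside double quotes must be replaced as follows: any A-Z becomes X, any a-z becomes x, and any digit 0-9 becomes 0.
A line can contain more than one quoted string. The quoted text can also contain special characters such as ",", "-", "." and "@"; these should be kept as is.
Sample file content (the filter word in this example is "KEYWORD"):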
2020-04-18 15:01:12 [EVENT] :log-event-with-KEYWORD: {:entry1 {:entry2 {:value "Replace This"}}} -> {:entry1 {:entry2 {:value "Replace ALSO this."}}}
2020-04-18 15:01:13 [EVENT] :log-event-with-KEYWORD: {:entry1 {:entry2 {:value "REplace. THIS 12345"}}}
2020-04-18 15:01:15 [EVENT] :this_has--the-KEYWORD: {:entry1 {:entry2 {:value "[email protected]"}}} -> {:entry1 {:entry2 {:value "[email protected]"}}}
2020-04-18 15:01:18 [EVENT] :log-event-without-keyword: {:entry1 {:entry2 {:value "Do NOT replace this."}}} -> {:entry1 {:entry2 {:value "Do-NoT replace this either"}}}
With this file as input, the expected output is:
2020-04-18 15:01:12 [EVENT] :log-event-with-KEYWORD: {:entry1 {:entry2 {:value "Xxxxxxx Xxxx"}}} -> {:entry1 {:entry2 {:value "Xxxxxxx XXXX xxxx."}}}
2020-04-18 15:01:13 [EVENT] :log-event-with-KEYWORD: {:entry1 {:entry2 {:value "XXxxxxx. XXXX 00000"}}}
2020-04-18 15:01:15 [EVENT] :this_has--the-KEYWORD: {:entry1 {:entry2 {:value "[email protected]"}}} -> {:entry1 {:entry2 {:value "[email protected]"}}}
2020-04-18 15:01:18 [EVENT] :log-event-without-keyword: {:entry1 {:entry2 {:value "Do NOT replace this."}}} -> {:entry1 {:entry2 {:value "Do-NoT replace this either"}}}
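Either the changed lines are updated in the file itself, or the whole file with these modifications is written to standard output, including the lines without the keyword; line order and all other details must be preserved.
Can this be done with a bash script or command-line tools such as grep and/or sed?
Answer 1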
awk '/KEYWORD/{
n=split($0,a,"\"")
for(i=2;i<=n;i=i+2){
gsub(/[A-Z]/,"X",a[i])
gsub(/[a-z]/,"x",a[i])
gsub(/[0-9]/,"0",a[i])
}
sep=""
for (i=1;i<=n;i++){
printf "%s%s",sep,a[i]
sep="\""
}
printf "\n"
next
}
1' file
For example, on the updated input file
2020-04-18 15:01:12 [EVENT] :log-event-with-KEYWORD: {:entry1 {:entry2 {:value "Replace This"}}} -> {:entry1 {:entry2 {:value "Replace ALSO this."}}}
2020-04-18 15:01:13 [EVENT] :log-event-with-KEYWORD: {:entry1 {:entry2 {:value "REplace. THIS 12345"}}}
2020-04-18 15:01:15 [EVENT] :this_has--the-KEYWORD: {:entry1 {:entry2 {:value "[email protected]"}}} -> {:entry1 {:entry2 {:value "[email protected]"}}}
2020-04-18 15:01:18 [EVENT] :log-event-without-keyword: {:entry1 {:entry2 {:value "Do NOT replace this."}}} -> {:entry1 {:entry2 {:value "Do-NoT replace this either"}}}
this awk command produces the desired output
2020-04-18 15:01:12 [EVENT] :log-event-with-KEYWORD: {:entry1 {:entry2 {:value "Xxxxxxx Xxxx"}}} -> {:entry1 {:entry2 {:value "Xxxxxxx XXXX xxxx."}}}
2020-04-18 15:01:13 [EVENT] :log-event-with-KEYWORD: {:entry1 {:entry2 {:value "XXxxxxx. XXXX 00000"}}}
2020-04-18 15:01:15 [EVENT] :this_has--the-KEYWORD: {:entry1 {:entry2 {:value "[email protected]"}}} -> {:entry1 {:entry2 {:value "[email protected]"}}}
2020-04-18 15:01:18 [EVENT] :log-event-without-keyword: {:entry1 {:entry2 {:value "Do NOT replace this."}}} -> {:entry1 {:entry2 {:value "Do-NoT replace this either"}}}
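The question also asks about an arbitrary filter word and about updating the file itself. The following is a minimal sketch of one way to wrap the awk program above, not part of the original answer. The function name mask_quoted and the awk variable kw are made up for illustration, and the keyword is matched as a plain substring with index() rather than as a regular expression.

```shell
#!/bin/sh
# mask_quoted KEYWORD [FILE...]
# Hypothetical wrapper around the awk program from this answer.
mask_quoted() {
    kw=$1
    shift
    awk -v kw="$kw" '
    index($0, kw) {                       # plain substring test, not a regex
        n = split($0, a, "\"")            # even-numbered pieces lie inside quotes
        for (i = 2; i <= n; i += 2) {
            gsub(/[A-Z]/, "X", a[i])
            gsub(/[a-z]/, "x", a[i])
            gsub(/[0-9]/, "0", a[i])
        }
        line = a[1]                       # rebuild the line with its quotes
        for (i = 2; i <= n; i++) line = line "\"" a[i]
        print line
        next
    }
    { print }                             # other lines pass through unchanged
    ' "$@"
}
```

Usage: mask_quoted KEYWORD file writes the result to standard output. To update the file itself, one option is to go through a temporary file, e.g. mask_quoted KEYWORD file > file.tmp && mv file.tmp file (file.tmp is a placeholder name).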
Answer 2
Using sed:
sed -E '/KEYWORD/{
:lower s/("[^"]*)[a-z]([^"]*")/\1_\2/; t lower;
:upper s/("[^"]*)[A-Z]([^"]*")/\1-\2/; t upper;
:digit s/("[^"]*)[0-9]([^"]*")/\1*\2/; t digit;
}; y/*_-/0xX/' infile
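/KEYWORD/{...} runs the commands in the block only on lines that match the string KEYWORD.
The pattern ("[^"]*)[###]([^"]*"), where [###] stands for [a-z], [A-Z] or [0-9], matches a " followed by any non-quote characters up to a lowercase, uppercase or digit character, which is in turn followed by any non-quote characters up to the next ".
Each part loops until every such character has been converted: lowercase to _, uppercase to -, and digits to *. (Note: these placeholders must not occur anywhere else in your file, because the final y command runs on every line and converts every occurrence; pick different characters if they do. The sample log above already contains - and _. We cannot substitute x, X or 0 directly, because the replacement would itself match [a-z], [A-Z] or [0-9] again and sed would loop forever.)
Once the loops finish, y/*_-/0xX/ converts the placeholders *, _ and - to 0, x and X.
Add the -i option to the command above to write the changes back to the input file, e.g. sed -i -E ....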
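To make the placeholder caveat concrete, here is a small sketch that feeds one made-up line through the first command of this answer. The function name demo_placeholders and the sample line are assumptions for illustration; the labels are put on their own lines only for portability across sed implementations.

```shell
#!/bin/sh
# Same logic as the first sed command in this answer, wrapped in a function.
demo_placeholders() {
    sed -E '/KEYWORD/{
    :lower
    s/("[^"]*)[a-z]([^"]*")/\1_\2/
    t lower
    :upper
    s/("[^"]*)[A-Z]([^"]*")/\1-\2/
    t upper
    :digit
    s/("[^"]*)[0-9]([^"]*")/\1*\2/
    t digit
    }
    y/*_-/0xX/'
}

# The date hyphens and the underscore sit outside the quotes,
# yet the final y command rewrites them as well.
printf '%s\n' '2020-04-18 this_has KEYWORD "Ab1"' | demo_placeholders
# prints: 2020X04X18 thisxhas KEYWORD "Xx0"
```

The quoted text is masked as intended, but so are the - and _ characters outside the quotes, which is why the placeholders should be characters that never occur in the log.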
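Update: command for the revised question, where a line can contain several quoted strings (the first version can also match from a closing quote to the next opening quote and mask the text in between):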
sed -E '/KEYWORD/{
:lower s/^(([^"]*("[^"]*"){0,1})*)("[^"]*)[a-z]([^"]*")/\1\4_\5/; t lower;
:upper s/^(([^"]*("[^"]*"){0,1})*)("[^"]*)[A-Z]([^"]*")/\1\4+\5/; t upper;
:digit s/^(([^"]*("[^"]*"){0,1})*)("[^"]*)[0-9]([^"]*")/\1\4*\5/; t digit;
}; y/*_+/0xX/' infile
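The difference between the two patterns can be seen in a reduced, lowercase-only sketch. The function names mask_unanchored and mask_anchored and the sample line are made up for illustration; _ is the placeholder, as in the answer, so the input must not contain it.

```shell
#!/bin/sh
# First version: any pair of quotes can delimit a match,
# including a closing quote followed by the next opening quote.
mask_unanchored() {
    sed -E '
    :lower
    s/("[^"]*)[a-z]([^"]*")/\1_\2/
    t lower
    y/_/x/'
}

# Updated version: the prefix anchored at ^ consumes complete "..." pairs,
# so the match can only start at a real opening quote.
mask_anchored() {
    sed -E '
    :lower
    s/^(([^"]*("[^"]*"){0,1})*)("[^"]*)[a-z]([^"]*")/\1\4_\5/
    t lower
    y/_/x/'
}

printf '%s\n' 'k "ab" cd "ef"' | mask_unanchored   # prints: k "xx" xx "xx"
printf '%s\n' 'k "ab" cd "ef"' | mask_anchored     # prints: k "xx" cd "xx"
```

With the anchored pattern, the text cd between the two quoted strings is left alone.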
Answer 3
Using Perl:
$ perl -ne 'if ( $_ =~ /KEYWORD/){
($first,$matched,$last) = ($1,$2,$3) if ( $_ =~ /^(.*)?\"(.*)\"(.*)$/ );
$matched =~ tr/a-z/x/;$matched =~ tr/A-Z/X/;$matched =~ tr/0-9/0/;
print $first."\"".$matched."\"".$last."\n";
}
else { print }' <<inputFile>>
Edit: if a line contains more than one quoted string, the following will work:
$ perl -ne ' {
if ( $_ =~ /KEYWORD/ ){
$line=$_;$val=1;
while($val) {
($first,$matched,$last) = ($1,$2,$3) if ( $line =~ m/(.*?)\"(.*?)\"(.*)$/ );
$val = $line =~ s/\".*?\"/_/;
$matched =~ tr/a-z/x/;$matched =~ tr/A-Z/X/;$matched =~ tr/0-9/0/;
$matched = "_".$matched."_";
$line=$first.$matched.$last;
}
$line =~ s/[_]*_/"/g;
print "$line\n";
}else { print } }' <<inputFile>>