I have lines like these:
/mnt/internal-app/logs/internal-app.log_2019-08-21.log.gz:2019-08-21 07:31:14,153 5458142 [XNIO-3 task-4] INFO c.c.p.i.m.ws.FileManger [FileName.java:1838] - UUIDs in this bucket 8501792126581991569,8073766106536916628,4830289023695906800,6135982080116553120,8306484440313978157,9040948912536460872,8471856544054164043,5431263453539111247,7661719762428556576
/mnt/internal-app/logs/internal-app.log_2019-08-21.log.gz:2019-08-21 07:31:14,153 5458144 [XNIO-3 task-4] INFO c.c.p.i.m.ws.FileManger [FileName.java:1838] - UUIDs in this bucket 6501792126581991569,8073766106536916628,4830289023695906800,6135982080116553120,8306484440313978157,9040948912536460872,8471856544054164043,5431263453539111247,7661719762428556576
What I ultimately need to do is collect all the UUIDs and prepare an SQL insert statement like this:
insert into sometable (uuid) values ("6501792126581991569","8073766106536916628")..(..);
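There is a huge number of such lines, close to 500k, so I can't open the file in the Sublime Text editor and apply a regular expression there.
So I am trying to do it with grep.
This is what I tried: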
zgrep "UUIDs in this bucket" /mnt/internal-app/logs/internal-app.log_2019-08-2* | grep -Eo ".* UUIDs in this bucket(.*)" | cut -d: -f5
It prints more than I need:
1838] - UUIDs in this bucket 8501792126581991569,8073766106536916628,4830289023695906800,6135982080116553120,8306484440313978157,9040948912536460872,8471856544054164043,5431263453539111247,7661719762428556576
How can I select only the UUIDs?
Update
Corrected the SQL query syntax:
insert into sometable (uuid) values ("6501792126581991569"),("8073766106536916628")..(..);
Answer 1
If you want everything that follows "UUIDs in this bucket", you can use sed like this:
$ zcat file.gz | sed -n 's/^.*UUIDs in this bucket //p'
8501792126581991569,8073766106536916628,4830289023695906800,6135982080116553120,8306484440313978157,9040948912536460872,8471856544054164043,5431263453539111247,7661719762428556576
6501792126581991569,8073766106536916628,4830289023695906800,6135982080116553120,8306484440313978157,9040948912536460872,8471856544054164043,5431263453539111247,7661719762428556576
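Since the goal is to collect all UUIDs, the comma-separated output can be flattened to one UUID per line and de-duplicated. A minimal sketch, assuming the sed output above is available on standard input; the sample variable is a hypothetical, shortened stand-in for it:

```shell
# Hypothetical sample: two shortened lines of the sed output shown above.
sample='8501792126581991569,8073766106536916628
6501792126581991569,8073766106536916628'

# tr turns every comma into a newline (one UUID per line);
# sort -u then removes the duplicates.
unique_uuids=$(printf '%s\n' "$sample" | tr ',' '\n' | sort -u)
printf '%s\n' "$unique_uuids"
```

On the real logs, the same tr and sort -u stages could be appended to the zcat and sed pipeline.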
Or use perl and output the complete SQL statement:
$ zcat file.gz | perl -ne 'chomp;if(s/^.*UUIDs in this bucket //){@uuids=split(/,/); $k{$_}++ for @uuids} END{ print "insert into sometable (uuid) values (" , join ",",map{qq/"$_"/} keys(%k); print ");\n"}'
insert into sometable (uuid) values ("6135982080116553120","4830289023695906800","8501792126581991569","9040948912536460872","7661719762428556576","8471856544054164043","8306484440313978157","6501792126581991569","5431263453539111247","8073766106536916628");
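Or, slightly more readable:
$ zcat file.gz |
perl -ne 'chomp;
  # Only handle lines containing the marker; strip everything up to and including it
  if(s/^.*UUIDs in this bucket //){
    @uuids=split(/,/);
    # Hash keys record each UUID once, whichever line it came from
    $k{$_}++ for @uuids
  }
  END{
    # keys(%k) returns the unique UUIDs in no particular order
    print "insert into sometable (uuid) values (" ,
      join ",",map{qq/"$_"/} keys(%k);
    print ");\n"
  }'
insert into sometable (uuid) values ("6135982080116553120","4830289023695906800","8501792126581991569","9040948912536460872","7661719762428556576","8471856544054164043","8306484440313978157","6501792126581991569","5431263453539111247","8073766106536916628");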
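The statements above put every UUID into a single parenthesized list, while the corrected syntax in the question's update wants one ("...") tuple per UUID. A rough sketch of that form using tr, sort and awk instead of perl; the sample variable is a hypothetical stand-in for the extracted comma-separated lines, and the double quotes are kept as written in the question:

```shell
# Hypothetical sample: two shortened lines of extracted, comma-separated UUIDs.
sample='8501792126581991569,8073766106536916628
6501792126581991569,8073766106536916628'

# One UUID per line, de-duplicated, then awk wraps each UUID in its own tuple:
# the first record is prefixed with the statement head, later ones with a comma.
sql=$(printf '%s\n' "$sample" | tr ',' '\n' | sort -u |
  awk '{ printf "%s(\"%s\")", (NR > 1 ? "," : "insert into sometable (uuid) values "), $0 }
       END { if (NR) print ";" }')
printf '%s\n' "$sql"
```

With close to 500k log lines the resulting statement may be larger than the database accepts in one go, so it may have to be split into batches; that part is not shown here.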
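Answer 2
If you are willing and able to use a tool other than grep, you can do this fairly easily with awk, since it looks like you always want the end of the line. You can have it print only the last field, for example: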
zcat /mnt/internal-app/logs/internal-app.log_2019-08-2* | awk '/UUIDs in this bucket/ {print $NF}'
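To see why $NF is enough here: awk splits each line on whitespace by default, and the UUID list contains no spaces, so it is always the last field. A small sketch on a hypothetical two-line sample (one shortened matching line from the question and one unrelated line):

```shell
# Hypothetical sample: one matching log line and one line that should be skipped.
sample='2019-08-21 07:31:14,153 5458142 [XNIO-3 task-4] INFO c.c.p.i.m.ws.FileManger [FileName.java:1838] - UUIDs in this bucket 8501792126581991569,8073766106536916628
2019-08-21 07:31:15,001 5458150 [XNIO-3 task-4] INFO some unrelated message'

# The /pattern/ selects matching lines; $NF is the last whitespace-separated field.
last_field=$(printf '%s\n' "$sample" | awk '/UUIDs in this bucket/ { print $NF }')
printf '%s\n' "$last_field"
```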
I don't know whether there are any versions of zgrep that don't support Perl-style regular expressions, but assuming yours does, you could do this:
zgrep -Po 'UUIDs in this bucket \K.*' /mnt/internal-app/logs/internal-app.log_2019-08-2*
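This works because \K tells the pattern not to count anything before it as part of the match, so only what follows "UUIDs in this bucket " is printed.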
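The effect of \K can be checked on a single line with plain grep. This sketch assumes a GNU grep built with PCRE support (the -P option); the line variable is a hypothetical, shortened sample:

```shell
# Hypothetical sample line, shortened from the question.
line='2019-08-21 07:31:14,153 5458142 [XNIO-3 task-4] INFO c.c.p.i.m.ws.FileManger [FileName.java:1838] - UUIDs in this bucket 8501792126581991569,8073766106536916628'

# -o prints only the matched part; \K drops the marker text from that match.
matched=$(printf '%s\n' "$line" | grep -Po 'UUIDs in this bucket \K.*')
printf '%s\n' "$matched"
```

When several files are searched at once, the output is normally prefixed with the file name, as in the question's own log excerpts; adding -h should suppress that prefix.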
Answer 3
Another way to generate the SQL code with perl:
zcat -f /mnt/internal-app/logs/internal-app.log_2019-08-2* |
perl -lne 'BEGIN{$"=q(",")}
@u = m{(?:UUIDs in this bucket |\G,)\K\d+}g;
print qq(insert into sometable (uuid) values ("@u");) if @u'