I'm on CentOS. I have a list of files to read, extract data from, and organize into a CSV file.
The log file text format is:
...
{"name":"test-api","hostname":"ci47","pid":3202,"level":30,"msg":"File: dsiManager, Method: getContract, End { userId: 'AFC5EH5PIHHLO4XS7SG',\n clientId: '5003700557',\n intent: 'YesIntent',\n }","time":"2019-01-21T12:23:10.323Z","v":0}
...
The output format must be:
clientId;intent;time;userId
5003700557;YesIntent;2019-01-21T12:23:10.323Z;AFC5EH5PIHHLO4XS7SG
What is the simplest way to accomplish this task? (awk, grep, ...)
Answer 1
To parse JSON-encoded data reliably, you will need a JSON codec. That pretty much means Perl or Python (or Ruby...). Since I'm a Perl person, here is a Perl solution.
First as a one-liner:
$ perl -MJSON -ne 'BEGIN { print("clientId;intent;time;userId\n"); } eval { my $obj = from_json($_); my $msg = $obj->{msg}; $msg =~ s/^.*{\s*|\s*,\s*}.*$//g; my %m = map { m/^([^:]*):\s*(.*)/; ($1, $2) } split(/,\s+/, $msg); print("$m{clientId};$m{intent};$obj->{time};$m{userId}\n"); }; warn($@) if ($@);' <x
clientId;intent;time;userId
5003700557;YesIntent;2019-01-21T12:23:10.323Z;AFC5EH5PIHHLO4XS7SG
Since that is a bit much even for Perl, here is a readable script version:
#!/usr/bin/perl
use strict;
use warnings;
use JSON;
print("clientId;intent;time;userId\n");
while (<>) {
    # Don't choke on malformed lines
    eval {
        my $obj = from_json($_);
        my $msg = $obj->{msg};
        $msg =~
            s/^.*{\s*       # Trim up to and including the leading '{'
              |
              \s*,\s*}.*$   # Trim trailing ',}'
             //gx;
        # Split $msg into key-value pairs
        my %m = map {
            m/^([^:]*)      # Stuff that isn't ':'
              :\s*          # Field separator
              (.*)          # Everything after the separator
             /x;
            ($1, $2)
        } split(/,\s+/, $msg);
        print("$m{clientId};$m{intent};$obj->{time};$m{userId}\n");
    };
    warn($@) if ($@);
}
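The same approach ports directly to Python, which the answer names as an equally viable option. Here is a minimal sketch of my own (not part of the original answer); it assumes the msg field always ends in a "{ key: 'value', ..., }" block like the sample line, reads log lines on stdin, writes CSV on stdout, and strips the single quotes around the values:

#!/usr/bin/env python3
# Rough Python equivalent of the Perl script above (a sketch; assumes
# msg ends in a "{ key: 'value', ..., }" block like the sample line).
import json
import re
import sys

print("clientId;intent;time;userId")
for line in sys.stdin:
    try:
        obj = json.loads(line)
        # Trim up to and including the leading '{', and the trailing ',}'
        msg = re.sub(r"^.*{\s*|\s*,\s*}.*$", "", obj["msg"])
        # Split into key-value pairs, dropping the quotes around values
        pairs = {}
        for kv in re.split(r",\s+", msg):
            key, _, value = kv.partition(":")
            pairs[key.strip()] = value.strip(" '")
        print(";".join([pairs["clientId"], pairs["intent"],
                        obj["time"], pairs["userId"]]))
    except (ValueError, KeyError):
        # Don't choke on malformed lines
        continue

Run it as, e.g., python3 extract.py < app.log > output.csv (extract.py and app.log are placeholder names).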
Answer 2
Try this,
awk -F "['\"]" 'NF>=26{print $19","$21","$26","$17}' file.csv
5003700557,YesIntent,2019-01-21T12:23:10.323Z,AFC5EH5PIHHLO4XS7SG
['\"]
同时使用单引号和双引号作为分隔符。NF>=26
只是为了检查该行是否有大于或等于 26 个字段。
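Those field numbers depend on the exact shape of your lines, so it is worth checking them against a real log file before relying on them. A quick way to see the numbering awk will use (my sketch, not part of the answer; sample.log is a placeholder name):

#!/usr/bin/env python3
# Print the awk field number next to each field produced by -F "['\"]"
import re

with open("sample.log") as f:
    line = next(f)
# awk splits on single or double quotes and numbers fields from 1
for i, field in enumerate(re.split(r"['\"]", line), start=1):
    print(i, field)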
Answer 3
I used the awk command. My problem was that every line is different from the others, so I didn't know the column numbers; I solved this by adding tests to find the right fields to print. Here is my code:
awk '
BEGIN {
    # Input field separator
    FS=",";
    # CSV output field separator
    OFS=";";
    # Header row of the CSV file
    print "Method; UserId; ClientId; intent; time"
}
/clientId/ {
    i=1;
    msg="";
    while (i<=NF) {
        if ($i ~ /clientId/) {
            # Clean up the column value:
            gsub(/\\n\s{1,}clientId:\s/, "", $i);
            msg = msg $i ";"
        };
        if ($i ~ /"time"/) {
            # Clean up the column value:
            gsub(/"time":/, "", $i);
            msg = msg $i ";"
        };
        if ($i ~ /intent/) {
            # Clean up the column value:
            gsub(/\\n\s{1,}intent:\s{1,}/, "", $i);
            msg = msg $i ";"
        };
        if ($i ~ /Method/) {
            # Clean up the column value:
            gsub(/(^(.*?)|\s{1,})Method\s{1,}?:?\s{1,}/, "", $i);
            gsub(/(\s{1,}\{\s{1,}userId.*)?/, "", $i);
            msg = msg $i ";"
        };
        if ($i ~ /userId/) {
            # Clean up the column value:
            gsub(/(^(.*?)|\s{1,})userId:\s/, "", $i);
            msg = msg $i ";"
        };
        i++
    }
    print msg
}
END {
    print NR
}
' $(grep -l id *.log) >> output.csv
- I used the gsub() method to clean up some column values, because I have dirty old log files
- The $(grep -l id *.log) command lists all the log files to use as awk input
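For comparison, a sketch of the same pipeline in Python (mine, not part of the answer): it globs the *.log files, keeps only lines mentioning clientId, and lets the json module do the parsing instead of hand-tuned gsub() calls, writing the same five columns to output.csv:

#!/usr/bin/env python3
# Sketch of the same pipeline: scan every *.log file, keep lines that
# mention clientId, and write the cleaned fields to output.csv.
import csv
import glob
import json
import re

with open("output.csv", "w", newline="") as out:
    writer = csv.writer(out, delimiter=";")
    writer.writerow(["Method", "UserId", "ClientId", "intent", "time"])
    for path in glob.glob("*.log"):
        with open(path) as f:
            for line in f:
                if "clientId" not in line:
                    continue
                try:
                    obj = json.loads(line)
                except ValueError:
                    continue  # skip dirty/malformed lines
                msg = obj.get("msg", "")
                # e.g. "File: dsiManager, Method: getContract, End { userId: '...', ... }"
                method = re.search(r"Method:\s*(\w+)", msg)
                fields = dict(re.findall(r"(\w+):\s*'([^']*)'", msg))
                writer.writerow([method.group(1) if method else "",
                                 fields.get("userId", ""),
                                 fields.get("clientId", ""),
                                 fields.get("intent", ""),
                                 obj.get("time", "")])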