I want to generate a CSV file with the columns name, foo, and bar. It might look like this:
name,foo,bar
a.txt,yes,no
b.txt,no,yes
c.txt,no,no
The CSV file should be created by iterating over a directory of text files and interpreting their contents.
a.txt contains:
foo:yes
bar:no
baz:?
b.txt contains:
foo:no
bar:yes
c.txt contains:
foo
bar:no
baz:yes
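There should be no baz column, only the specified foo and bar. Key-value pairs may also be missing or incomplete (as in c.txt). In that case the value should be no.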
I am sure this is possible with awk or sed, but I don't know how to implement it. Something like:
find . -name "*.txt" -print0 | xargs -0 -I {} sh -c "awk '...' {}"
Answer 1
If you want to print a column for every target key, even when one or more of those keys is absent from all of the input files:
$ cat tst.awk
BEGIN {
    # Target keys, in the column order to print
    numKeys = split("foo bar", tmp)
    for (i in tmp) {
        keys[i] = tmp[i]
    }
    FS=":"; OFS=","
}
# Store each key's value per file; a key without ":value" stores ""
{ fnameKey2val[FILENAME,$1] = $2 }
END {
    # Header row
    printf "%s%s", "name", OFS
    for (keyNr=1; keyNr<=numKeys; keyNr++) {
        key = keys[keyNr]
        printf "%s%s", key, (keyNr<numKeys ? OFS : ORS)
    }
    # One row per file named on the command line; empty or missing values become "no"
    for (fileNr=1; fileNr<ARGC; fileNr++) {
        fname = ARGV[fileNr]
        printf "%s%s", fname, OFS
        for (keyNr=1; keyNr<=numKeys; keyNr++) {
            key = keys[keyNr]
            val = (fnameKey2val[fname,key] == "" ? "no" : fnameKey2val[fname,key])
            printf "%s%s", val, (keyNr<numKeys ? OFS : ORS)
        }
    }
}
Or, if you do not want to print a column for a key that is missing from all of the files:
$ cat tst.awk
BEGIN {
    # Set of keys that are allowed to become columns
    split("foo bar", tmp)
    for (i in tmp) {
        targets[tmp[i]]
    }
    FS=":"; OFS=","
}
# Ignore any key that is not a target (e.g. baz)
!($1 in targets) { next }
# Columns appear in the order the keys are first seen in the input
!seen[$1]++ { keys[++numKeys] = $1 }
{ fnameKey2val[FILENAME,$1] = $2 }
END {
    # Header row
    printf "%s%s", "name", OFS
    for (keyNr=1; keyNr<=numKeys; keyNr++) {
        key = keys[keyNr]
        printf "%s%s", key, (keyNr<numKeys ? OFS : ORS)
    }
    # One row per file named on the command line; empty or missing values become "no"
    for (fileNr=1; fileNr<ARGC; fileNr++) {
        fname = ARGV[fileNr]
        printf "%s%s", fname, OFS
        for (keyNr=1; keyNr<=numKeys; keyNr++) {
            key = keys[keyNr]
            val = (fnameKey2val[fname,key] == "" ? "no" : fnameKey2val[fname,key])
            printf "%s%s", val, (keyNr<numKeys ? OFS : ORS)
        }
    }
}
Both produce the same output from the given sample input:
$ awk -f tst.awk *.txt
name,foo,bar
a.txt,yes,no
b.txt,no,yes
c.txt,no,no
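The question mentions walking a directory with find. As a rough sketch only, the same ARGV-based idea can be wrapped in a shell function that lets find supply the file names. The function name csv_from_dir is hypothetical, the keys are hard-coded to foo and bar, and reporting only the base name of each file is an assumption made so the rows match the sample output (files with the same name in different subdirectories would then be indistinguishable).

```shell
#!/bin/sh
# csv_from_dir is a hypothetical helper name, not taken from any answer here.
# Usage: csv_from_dir DIR
csv_from_dir() {
    # Print the header once in the shell: find may split a long file list
    # across several awk runs, and each run should emit data rows only.
    echo 'name,foo,bar'
    find "$1" -type f -name '*.txt' -exec awk -F: -v OFS=, '
        # Keep only the two target keys; a key without ":value" stores "".
        $1 == "foo" || $1 == "bar" { val[FILENAME, $1] = $2 }
        END {
            for (i = 1; i < ARGC; i++) {
                f = ARGV[i]
                foo = (val[f, "foo"] == "" ? "no" : val[f, "foo"])
                bar = (val[f, "bar"] == "" ? "no" : val[f, "bar"])
                name = f
                sub(/.*\//, "", name)   # assumption: report the base name only
                print name, foo, bar
            }
        }
    ' {} + | sort
}
```

Something like `csv_from_dir . > out.csv` would then write the file. The data rows are piped through sort because find returns files in no particular order.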
Answer 2
Another awk approach (this one needs gawk, since it uses arrays of arrays):
gawk -F':' -v OFS=',' '
    BEGIN{ print "name", "foo", "bar"; };
    $2== "" { $2="no"; };
    $1== "foo" { hold[FILENAME][1]= $2; };
    $1== "bar" { hold[FILENAME][2]= $2; };
    END{ for (x in hold) print x, hold[x][1], hold[x][2]; }
' [abc].txt
Output:
name,foo,bar
c.txt,no,no
a.txt,yes,no
b.txt,no,yes
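The rows above come out as c, a, b because `for (x in hold)` visits array indices in an unspecified order. If a stable order matters, one possible workaround is to sort the data rows in the shell while leaving the header line in place. This is only a sketch; the function name sort_csv_body is hypothetical.

```shell
#!/bin/sh
# sort_csv_body is a hypothetical helper: it copies the first line of its
# standard input (the CSV header) unchanged, then sorts the remaining lines.
sort_csv_body() {
    IFS= read -r header && printf '%s\n' "$header"
    sort
}
```

The gawk command would then be piped into it, as in `gawk ... [abc].txt | sort_csv_body`. gawk can also control the traversal order itself by setting `PROCINFO["sorted_in"]` (for example to `"@ind_str_asc"`) before the loop.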
Answer 3
A Perl approach:
perl -lne '
  BEGIN {
    @A = qw(foo bar);
    @{$h{$_}}{@A} = qw(no) x @A for @ARGV;
  }
  /^(foo|bar)(:.*|$)/ and
    ($h{$ARGV}{$1} = length($2)<2?"no":$2) =~ s/^://;
  }{$,=",";
  print q(name), @A;
  print $_, @{$h{$_}}{@A} for sort keys %h;
' -- *.txt
Output:
name,foo,bar
a.txt,yes,no
b.txt,no,yes
c.txt,no,no
Another approach uses a grep-sort-sed pipeline and assumes that both foo and bar are present in every file:
echo 'name,foo,bar'
grep -HE '^(foo|bar)(:|$)' *.txt |
sed -e '/:.*:/!s/$/:no/' |
sort -t: -k1,1 -k2,2r |
sed -Ee 'N;s/\n[^:]+:/:/;y/:/,/' |
cut -d, -f1,3,5
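The pipeline joins lines in pairs with `N`, so a file that lacks one of the two keys shifts every following pair and the rows no longer line up. As a hedged sketch, a small pre-check could flag such files before the pipeline runs. The function name check_keys is hypothetical, and it only tests that each key occurs at least once, not that it occurs exactly once.

```shell
#!/bin/sh
# check_keys is a hypothetical helper: it returns 0 only if every file given
# as an argument has a line starting with "foo" and one starting with "bar",
# each followed by ":" or the end of the line.
check_keys() {
    rc=0
    for f in "$@"; do
        for k in foo bar; do
            grep -Eq "^$k(:|\$)" "$f" || {
                echo "$f: missing key $k" >&2
                rc=1
            }
        done
    done
    return "$rc"
}
```

It could be used as a guard, for example `check_keys *.txt && { echo 'name,foo,bar'; ...; }`, falling back to one of the awk or Perl answers when it fails.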