I want to create a bash script that iterates over a file one line at a time and produces consistent output:
example.txt
ALBERT some a BRYAN some b CLAUDIA some c DAVID some d ERIK some e
ALBERT some a BRYAN some b ERIK some e
ALBERT some a BRYAN some b DAVID some d
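Some notes:
- The number of words between the tags varies
- The keywords always appear in the same order
- The complete list of keywords is available and can be specified beforehand
Desired output: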
some a; some b; some c; some d; some e
some a; some b;;; some e
some a; some b;; some d;
I can easily replace the keywords with semicolons, one at a time, using sed:
sed -i 's/ALBERT/;/g' "example.txt"
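How can I use awk to iterate over each line and add the required semicolons when some keywords are missing? I suspect some kind of counter has to be introduced?
Answer 1
Assuming that some of the tags (names like "ALBERT") could be missing from the first line, just as they could be missing from other lines, you need a 2-pass approach: first identify all of the tags, then print the value of every tag for each line, whether or not that tag appears on the line.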
$ cat tst.awk
BEGIN { OFS=";" }
NR==FNR {
    for (i=1; i<NF; i+=3 ) {
        if ( !seen[$i]++ ) {
            tags[++numTags] = $i
        }
    }
    next
}
{
    delete tag2val
    for (i=1; i<NF; i+=3) {
        tag = $i
        val = $(i+1) FS $(i+2)
        tag2val[tag] = val
    }
    for (tagNr=1; tagNr<=numTags; tagNr++) {
        tag = tags[tagNr]
        val = tag2val[tag]
        printf "%s%s", val, (tagNr<numTags ? OFS : ORS)
    }
}
$ awk -f tst.awk example.txt example.txt | column -t -s';' -o'; '
some a; some b; some c; some d; some e
some a; some b;       ;       ; some e
some a; some b;       ; some d;
The above outputs the values of all tags for every line, in the order in which the tags first appear across the whole input.
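The two-pass idiom used above depends on `NR==FNR` being true only while awk reads the first file argument, which is why the same file is named twice on the command line. A minimal sketch of just that mechanism (the function name `two_pass_demo` is made up for illustration):

```shell
# Hypothetical helper: show which pass each input line is processed in.
# NR counts records across all files, FNR restarts at 1 for each file,
# so NR==FNR holds only during the first file argument.
two_pass_demo() {
    awk 'NR==FNR { print "pass 1: " $0; next }
                 { print "pass 2: " $0 }' "$1" "$1"
}

tmp=$(mktemp)
printf 'x\ny\n' > "$tmp"
two_pass_demo "$tmp"
rm -f "$tmp"
```

One caveat: if the first file is empty, `NR==FNR` is also true while reading the second file, so the idiom assumes non-empty input.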
If you want the tags printed as column headers:
$ cat tst.awk
BEGIN { OFS=";" }
NR==FNR {
    for (i=1; i<NF; i+=3 ) {
        if ( !seen[$i]++ ) {
            tags[++numTags] = $i
        }
    }
    next
}
FNR==1 {
    for (tagNr=1; tagNr<=numTags; tagNr++) {
        tag = tags[tagNr]
        printf "%s%s", tag, (tagNr<numTags ? OFS : ORS)
    }
}
{
    delete tag2val
    for (i=1; i<NF; i+=3) {
        tag = $i
        val = $(i+1) FS $(i+2)
        tag2val[tag] = val
    }
    for (tagNr=1; tagNr<=numTags; tagNr++) {
        tag = tags[tagNr]
        val = tag2val[tag]
        printf "%s%s", val, (tagNr<numTags ? OFS : ORS)
    }
}
$ awk -f tst.awk example.txt example.txt | column -t -s';' -o'; '
ALBERT; BRYAN ; CLAUDIA; DAVID ; ERIK
some a; some b; some c ; some d; some e
some a; some b;        ;       ; some e
some a; some b;        ; some d;
Answer 2
Using perl:
#!/usr/bin/perl

use strict;
use warnings;

# @keys is an array containing the keywords. It also determines
# the field output order. This can be read from a file if needed,
# but here it's hard-coded.
my @keys = qw(ALBERT BRYAN CLAUDIA DAVID ERIK);

# create and pre-compile a regex matching all the keywords
my $keys = join("|",@keys);
my $keys_re = qr/$keys/;

# make an empty hash containing elements for all the keys so that
# we can start processing each input record afresh, with a fully
# populated list of keys.
my %empty = map +( $_ => '' ), @keys;

# main loop, process stdin and/or filename args
while(<>) {
  # clean up the input a little.
  chomp;              # trim newlines at EOL
  s/^\s*|\s*$//g;     # trim leading and trailing whitespace

  # ignore empty lines.
  next if (m/^$/);

  # NUL can't be in text input, so insert it as a marker around
  # the keywords. i.e. insert NULs before and after each keyword
  s/$keys_re/\000$&\000/g;

  # split the input record on NUL, trimming spaces and discarding
  # the first element (a bogus artificial field which only exists
  # as a side-effect of inserting a NUL before the first keyword.)
  my (undef,@record) = split /\s*\000\s*/;

  # pre-populate the fields hash for each record.
  my %fields = %empty;

  # now insert the real values for each keyword if they exist.
  # @record alternates keyword, value, so walk it in steps of two
  # (incrementing the loop variable inside a foreach has no effect).
  for (my $i = 0; $i < $#record; $i += 2) {
    $fields{$record[$i]} = $record[$i+1];
  }

  print join(";", map +( $fields{$_} ), @keys),"\n";
}
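If you want a space after each semicolon, change the print join(";",...) line above to add one, i.e. join("; ",...).
To read the keywords from a file, replace the my @keys = qw(...) line above with: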
# slurp in the keywords file and split it on any whitespace.
# (split ' ' also skips leading whitespace, so no empty key is produced.)
my @keys = split ' ', do {
  local $/;   # read entire file at once - slurp
  my $fname = 'keywords.txt';
  open(my $fh, '<', $fname) or die "Error opening $fname: $!";
  <$fh>
};
keywords.txt can contain keys separated by any combination of vertical or horizontal whitespace: spaces, tabs, newlines, CR/LF and so on, e.g.
$ cat keywords.txt
ALBERT
BRYAN CLAUDIA
DAVID ERIK
Save it as, e.g., iterate.pl, and make it executable with chmod +x iterate.pl.
$ ./iterate.pl input.txt
some a;some b;some c;some d;some e
some a;some b;;;some e
some a;some b;;some d;
If you want prettier output for viewing in less or similar, you can use column, e.g.
$ ./iterate.pl input.txt | column -s';' -o'; ' -t
some a; some b; some c; some d; some e
some a; some b;       ;       ; some e
some a; some b;       ; some d;
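The -o option used above is, as far as I know, a util-linux extension to column and may not exist on other systems. Where it is missing, similar padding can be sketched in awk. This is an illustration under that assumption, not a drop-in replacement: the function name `pad_columns` is made up, and it takes a file name rather than stdin because it reads the input twice.

```shell
# Hypothetical helper: pad each column of a ';'-separated file to its widest
# entry and join the columns with "; ". The last column is left unpadded.
pad_columns() {
    awk -F';' '
    NR == FNR {                      # pass 1: record the column widths
        for (i = 1; i <= NF; i++)
            if (length($i) > w[i]) w[i] = length($i)
        next
    }
    {                                # pass 2: print the padded fields
        line = ""
        for (i = 1; i < NF; i++)
            line = line sprintf("%-" w[i] "s; ", $i)
        print line $NF
    }' "$1" "$1"
}

tmp=$(mktemp)
printf '%s\n' 'some a;some b;some c' 'some a;;some c' > "$tmp"
pad_columns "$tmp"
rm -f "$tmp"
```

To use it with the script above, write the output to a file first (for example ./iterate.pl input.txt > out.txt) and then run pad_columns out.txt.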