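Good morning,
I have many blocks of data, each containing between 1 and 8 variables (shown below as "CONDx") depending on user input. I have written a script using awk and grep to extract the data so it can be presented in columns. I am pulling this data out of a much larger file, so maybe I need to take a step back and rethink my approach. Anyway, the data looks like this: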
> cat file
foo
REF Data1
COND1 Value1
COND2 Value2
foo
REF Data2
COND3 Value3
foo
REF Data3
COND1 Value4
COND3 Value5
foo
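My script displays the results in the following format, which I then have to fix by hand so that the values line up vertically under the right columns: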
COND1 COND2 COND3 COND4 COND5 COND6 COND7 COND8
Data1 Value1 Value2 Value3 x x x x x
Data2 Value4 Value5
Data3
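My question is: can awk (or sed, etc.) check whether each CONDx is present in each REF block and, if it is, print the corresponding "ValueX", and if not, print an "x" (or better yet, a blank) as a placeholder? The desired output would be: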
       COND1  COND2  COND3  COND4  COND5  COND6  COND7  COND8
Data1  Value1 Value2 x      x      x      x      x      x
Data2  x      x      Value3 x      x      x      x      x
Data3  Value4 x      Value5 x      x      x      x      x
Taking COND1 as an example, part of the script contains:
grep COND1 file | awk '{print $2} END { if (!NR) print "x" }' > temp.cond1
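temp.cond1 is then pasted into the results file, but this only prints a single "x" on the first line, as my output shows. I understand why it doesn't work, but I can't think of a new approach. I thought maybe it could be done with an IF statement? Any help would be much appreciated.
Thanks for your time.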
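Answer 1
Here is an implementation in awk. It has been a while since I wrote more than a few lines in the language, and I thought this would be a fun exercise.
To run awk with a program stored in a file, specify the -f flag, for example: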
awk -f my_program.awk my_data.txt
This implementation only outputs the CONDx variables that actually occur in the file.
# Initialize a couple of variables
BEGIN {
    fill_value = "xx"
    record_number = 0
}

# for any line that begins and ends with `foo` save the record
# and then move on to process the next line
/^foo$/ { save_record(); next }

# for any other line, grab the key and data, and mark that the record is valid
{
    fields[$1] = $1
    record[$1] = $2
    record[1] = "exists"
}

# after reading in all of the records, output them
END {
    # save the last record in case the file does not end with `foo`
    save_record()

    # drop REF first (it gets its own leading column), then sort the
    # remaining field names into alpha order. asort() is a gawk extension
    # and re-indexes the array from 1 to n, so the delete must come first.
    delete fields["REF"]
    n = asort(fields)

    # print the header
    printf("%-8s", "REF")
    for (j = 1; j <= n; j++)
        printf("%-8s", fields[j])
    print ""

    # print the records
    for (i = 0; i < record_number; i++) {
        record_name = record_number_str(i, "REF")
        printf("%-8s", records[record_name])
        for (j = 1; j <= n; j++) {
            record_name = record_number_str(i, fields[j])
            to_print = fill_value
            if (record_name in records)
                to_print = records[record_name]
            printf("%-8s", to_print)
        }
        print ""
    }
}

function save_record() {
    if (1 in record) {
        delete record[1]
        for (rec in record)
            records[record_number_str(record_number, rec)] = record[rec]
        record_number++
    }
    delete record
}

# awk only has single dimensional associative arrays. So we need
# to construct a key for the array that has two dimensions
function record_number_str(record_number, rec) {
    return sprintf("%06d %s", record_number, rec)
}
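I don't think awk is the ideal language for this. Better choices might be perl, ruby or python. For comparison, here is a python implementation. Note that it has only about half as many lines: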
import fileinput

record = {}
records = []
fields = set()

for line in (l.strip() for l in fileinput.input()):
    if line == 'foo':
        # a `foo` line ends the current record
        if record:
            records.append(record)
            record = {}
    else:
        key, value = line.split()
        record[key] = value
        fields.add(key)

# keep the last record if the file does not end with `foo`
if record:
    records.append(record)

# REF gets its own leading column, so leave it out of the sorted columns
columns = sorted(fields - {"REF"})

# print the header
print("%-8s" % "REF", end="")
for field in columns:
    print("%-8s" % field, end="")
print()

# print the records
for record in records:
    print("%-8s" % record.get("REF", ""), end="")
    for field in columns:
        print("%-8s" % record.get(field, ''), end="")
    print()
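
Both programs above print only the CONDx names that occur in the input, whereas the question asked for all eight columns with a placeholder in the empty cells. Below is a minimal sketch of that variant. It assumes the column names are exactly COND1 through COND8 and that blocks are delimited by `foo` lines as in the sample. The names `parse_blocks`, `format_table`, `COLUMNS` and `SAMPLE`, and the `fill` and `width` parameters, are made up for illustration and are not part of the original answer.

```python
# Assumed fixed column set: COND1 .. COND8 (taken from the question's header).
COLUMNS = ["COND%d" % i for i in range(1, 9)]

# Sample input copied from the question.
SAMPLE = """
foo
REF Data1
COND1 Value1
COND2 Value2
foo
REF Data2
COND3 Value3
foo
REF Data3
COND1 Value4
COND3 Value5
foo
"""


def parse_blocks(lines, delimiter="foo"):
    """Split the input lines into one dict (key -> value) per REF block."""
    blocks, current = [], {}
    for raw in lines:
        parts = raw.split()
        if not parts:
            continue  # ignore blank lines
        if parts == [delimiter]:
            # a delimiter line closes the current block, if there is one
            if current:
                blocks.append(current)
                current = {}
        elif len(parts) >= 2:
            current[parts[0]] = parts[1]
    if current:  # last block when the input does not end with the delimiter
        blocks.append(current)
    return blocks


def format_table(blocks, fill="x", width=8):
    """Return the table as a list of text lines, one row per block."""
    def row(cells):
        return "".join(str(c).ljust(width) for c in cells).rstrip()

    table = [row(["REF"] + COLUMNS)]
    for block in blocks:
        cells = [block.get("REF", fill)]
        cells += [block.get(name, fill) for name in COLUMNS]
        table.append(row(cells))
    return table


print("\n".join(format_table(parse_blocks(SAMPLE.splitlines()))))
```

To read a real file instead of the embedded sample, pass an open file object to `parse_blocks`, since it only needs an iterable of lines. Passing `fill=""` gives blank cells instead of "x", which the question mentions as the preferred placeholder.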