I have two files.
File 1 (reference file)
xxx xxxxx 00
xxx xxxxx 01
xxx xxxxx 02
xxx xxxxx 03
xxx xxxxx 04
xxx xxxxx 00
xxx xxxxx 01
xxx xxxxx 02
xxx xxxxx 03
xxx xxxxx 04
File 2
12345 2021/04/02 00
1212 2021/04/02 01
12123 2021/04/02 02
12123 2021/04/02 04
1223 2021/04/03 01
124 2021/04/03 02
123 2021/04/03 03
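I want to compare the last field of each file and add the lines from the first file (my reference file) that are missing from the second.
For example, I would like the output to be: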
12345 2021/04/02 00
1212 2021/04/02 01
12123 2021/04/02 02
xxx xxxxx 03
12123 2021/04/02 04
xxx xxxxx 00
1223 2021/04/03 01
124 2021/04/03 02
123 2021/04/03 03
xxx xxxxx 04
I tried:
awk -F ' ' 'NR==FNR{a[$2]++;next}a[$2] && $1>=00' test2.txt test1.txt
This appends the missing third-field values from file1, but the output also drops data that I need (the second and third fields).
Answer 1
Maybe something like this?
cat file2 | awk '!(1 in f) {if ((getline l < "-") == 1) split(l, f)} $3!=f[3] {print;next} {print l; delete f}' file1 | column -t
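Note that the script expects file1 as an argument to awk, while it expects file2 on its standard input. I used a "useless use of cat" to make that more explicit, but naturally you can supply it with a < file2 redirection instead. In fact, you could even embed the file name in the script itself, as "file2" in place of the "-" passed to getline, but this way is a bit more flexible.
Also note that the two files are expected to start "in sync" with respect to their field 3 values, or, if it makes sense for your use case, file2 may be "ahead" of file1.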
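As a minimal sketch of the file-name variant mentioned above, the script below rebuilds the question's sample files in a scratch directory and runs the same one-liner with getline reading from a named file instead of "-". The variable name `real`, the `tmp` directory and the `out` variable are assumptions made for this illustration only; they are not part of the original answer.

```shell
#!/bin/sh
# Scratch directory, so that no existing file1/file2 gets overwritten.
tmp=$(mktemp -d) || exit 1

# file1: the reference file from the question (two cycles of 00..04).
for cycle in 1 2; do
    for h in 00 01 02 03 04; do
        printf 'xxx xxxxx %s\n' "$h"
    done
done > "$tmp/file1"

# file2: the real data from the question, with some values missing.
cat > "$tmp/file2" <<'EOF'
12345 2021/04/02 00
1212 2021/04/02 01
12123 2021/04/02 02
12123 2021/04/02 04
1223 2021/04/03 01
124 2021/04/03 02
123 2021/04/03 03
EOF

# Same logic as the one-liner, but getline reads the file named by
# `real` (hypothetical variable name) instead of standard input.
out=$(awk -v real="$tmp/file2" '
    !(1 in f)  { if ((getline l < real) == 1) split(l, f) }
    $3 != f[3] { print; next }
               { print l; delete f }
' "$tmp/file1")

printf '%s\n' "$out"
rm -rf "$tmp"
```

With the sample data this should print the ten lines of the desired output from the question. Passing the name with -v keeps the script text free of hard-coded paths, at the cost of the stdin flexibility the answer prefers.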
For readability, here is the script broken out on its own, with detailed explanatory comments:
# Check whether our `real_fields` array is empty.
# NOTE: we use the `<index> in <array>` construct
# to force awk to treat the name `real_fields` as an
# array (instead of as a scalar, as it would by default)
# and to create it in an empty state
!(1 in real_fields) {
    # get the next line (if any) from the "real" file
    if ((getline real_line < "-") == 1)
        # split that line into separate fields, populating
        # our `real_fields` array
        split(real_line, real_fields)
        # awk's split function creates an array with numeric
        # indexes, one per field found according to the FS separator
}
# if field 3 of the current line of the "reference"
# file does not match the current line of the "real" file...
$3!=real_fields[3] {
    # print the current line of the "reference" file
    print
    # go on to read the next line of the "reference" file,
    # thus skipping the final awk pattern
    next
}
# final awk pattern: we get here only if the pattern
# above did not match, i.e. if the field 3 values from both
# files match
{
    # print the current line of the "real" file
    print real_line
    # delete our real_fields array, thus triggering
    # the fetch of the next line of the "real" file,
    # as performed by the first awk pattern
    delete real_fields
}
Answer 2
You need to set the array traversal order; otherwise awk will reorder your lines.
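#!/usr/bin/awk -f
# NOTE: PROCINFO["sorted_in"] is a gawk extension, so this needs gawk.
BEGIN {
    # traverse arrays in ascending string order of their indices
    PROCINFO["sorted_in"] = "@ind_str_asc"
}
# first file (reference): store each line under "<sequence>,<field 3>"
NR==FNR {
    # zero-pad the sequence number so that string order equals line
    # order even with more than 10 reference lines
    a[sprintf("%09d", i++),$3]=$0
    next
}
# second file (real data)
{
    for (c in a) {
        split(c, s, SUBSEP)
        if (s[2] == $3) {
            # keys match: print the real line, then read the next one
            print $0
            getline
        } else {
            # key missing from the real data: print the reference line
            print a[c]
        }
    }
}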
./script.awk file1 file2
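If gawk is not available, the ordering problem can also be avoided without PROCINFO. The following is only a sketch of a different technique, not the answer's method: it keeps the reference lines in arrays indexed by a plain counter and walks them in input order. The names `ref`, `key`, `pos`, `n`, `tmp` and `out` are all hypothetical, and it assumes, like the answers above, that both files cycle through field 3 in the same order.

```shell
#!/bin/sh
# Scratch directory with the question's sample data.
tmp=$(mktemp -d) || exit 1

for cycle in 1 2; do
    for h in 00 01 02 03 04; do
        printf 'xxx xxxxx %s\n' "$h"
    done
done > "$tmp/file1"

cat > "$tmp/file2" <<'EOF'
12345 2021/04/02 00
1212 2021/04/02 01
12123 2021/04/02 02
12123 2021/04/02 04
1223 2021/04/03 01
124 2021/04/03 02
123 2021/04/03 03
EOF

out=$(awk '
    # First file (reference): remember each line and its key, in order.
    NR == FNR { ref[++n] = $0; key[n] = $3; next }

    # Second file (real data): print reference lines until the next
    # reference key equals the key of the current real line.
    {
        while (pos < n && key[++pos] != $3)
            print ref[pos]
        print
    }

    # Reference lines left over once the real data ends are also missing.
    END { while (pos < n) print ref[++pos] }
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$out"
rm -rf "$tmp"
```

Because the counter already encodes line order, no sorting is needed and the script should run under any POSIX awk; with the sample data it is expected to produce the same ten lines as the desired output.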