I am trying to get the bandwidth of each domain from one file and its hit details from another.
The two files are formatted as follows:
bandwidth.txt:
aadrivingschool.ws 2840.36M
aaspak.org 211.57M
aasteknik.com 1419.26M
aatonerpk.com 14.87M
hits.txt:
onlinestudyboard.com received 186 hits from 31/May/2016 at 1201
aaspak.org received 184 hits from 31/May/2016 at 1202
khawajarubber.com received 183 hits from 31/May/2016 at 1246
aatonerpk.com received 182 hits from 31/May/2016 at 1231
What I want to get is:
onlinestudyboard.com received 186 hits from 31/May/2016 at 1201
aaspak.org received 184 hits from 31/May/2016 at 1202 211.57M
khawajarubber.com received 183 hits from 31/May/2016 at 1246
aatonerpk.com received 182 hits from 31/May/2016 at 1231 14.87M
Answer 1
Here is an awk approach:
$ awk 'FNR==NR{a[$1]=$2; next} {print $0,a[$1]}' bandwidth.txt hits.txt
onlinestudyboard.com received 186 hits from 31/May/2016 at 1201
aaspak.org received 184 hits from 31/May/2016 at 1202 211.57M
khawajarubber.com received 183 hits from 31/May/2016 at 1246
aatonerpk.com received 182 hits from 31/May/2016 at 1231 14.87M
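Explanation
awk reads its input files line by line and splits each line into fields on whitespace (or on whatever separator is given with -F). Those fields are $1, $2 ... $N.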
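FNR==NR: NR is the current overall line number and FNR is the line number within the current file. The two are equal only while the first file is being read.
a[$1]=$2; next: if this is the first file (see above), save the second field in the array a, keyed by the first field, then move on to the next line.
print $0,a[$1]: print the current line ($0) followed by the value in the array a associated with its first field.
In other words, this prints each line of the second file together with whatever value the first file associates with that line's first field.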
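One detail of print $0,a[$1] is that it emits the output separator even when a[$1] is empty, so lines without a bandwidth entry end in a trailing space. A possible variant, sketched below against the sample data from the question, appends the value only when the domain is actually known. The scratch directory and the array name bw are assumptions made for the example, not part of the original answer.

```shell
# Recreate the two sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'aadrivingschool.ws 2840.36M' \
  'aaspak.org 211.57M' \
  'aasteknik.com 1419.26M' \
  'aatonerpk.com 14.87M' > "$tmpdir/bandwidth.txt"
printf '%s\n' \
  'onlinestudyboard.com received 186 hits from 31/May/2016 at 1201' \
  'aaspak.org received 184 hits from 31/May/2016 at 1202' \
  'khawajarubber.com received 183 hits from 31/May/2016 at 1246' \
  'aatonerpk.com received 182 hits from 31/May/2016 at 1231' > "$tmpdir/hits.txt"

# First file: remember bandwidth per domain.
# Second file: append the bandwidth only if the domain was seen, else print the line unchanged.
result=$(awk 'FNR==NR    {bw[$1]=$2; next}
              ($1 in bw) {print $0, bw[$1]; next}
                         {print}' "$tmpdir/bandwidth.txt" "$tmpdir/hits.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The ($1 in bw) test also avoids creating empty array entries for unknown domains, which a bare reference to bw[$1] would do.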
Answer 2
With sort, join, column and some bashisms:
join -a 1 <(sort hits.txt) <(sort bandwidth.txt) | column -t | sort -nrk3
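Three sorts are needed. Two of them are there because the input files are not sorted on the common field, and join requires sorted input. The third one (third field, reverse numeric) restores the OP's original ordering.
Output: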
onlinestudyboard.com received 186 hits from 31/May/2016 at 1201
aaspak.org received 184 hits from 31/May/2016 at 1202 211.57M
khawajarubber.com received 183 hits from 31/May/2016 at 1246
aatonerpk.com received 182 hits from 31/May/2016 at 1231 14.87M
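The <( ... ) process substitution is the bash-specific part. If it is not available, roughly the same result can be had by sorting into temporary files first. The sketch below assumes the sample data from the question; the scratch directory and the .sorted file names are made up for the example, LC_ALL=C is added so that sort and join agree on collation order, and column -t is left out so the output keeps single spaces.

```shell
# Recreate the two sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'aadrivingschool.ws 2840.36M' \
  'aaspak.org 211.57M' \
  'aasteknik.com 1419.26M' \
  'aatonerpk.com 14.87M' > "$tmpdir/bandwidth.txt"
printf '%s\n' \
  'onlinestudyboard.com received 186 hits from 31/May/2016 at 1201' \
  'aaspak.org received 184 hits from 31/May/2016 at 1202' \
  'khawajarubber.com received 183 hits from 31/May/2016 at 1246' \
  'aatonerpk.com received 182 hits from 31/May/2016 at 1231' > "$tmpdir/hits.txt"

# join needs both inputs sorted on the join field (the domain, field 1).
LC_ALL=C sort "$tmpdir/hits.txt" > "$tmpdir/hits.sorted"
LC_ALL=C sort "$tmpdir/bandwidth.txt" > "$tmpdir/bandwidth.sorted"

# -a 1 keeps hits lines with no bandwidth entry; the last sort restores
# descending order on the hit count (field 3).
result=$(LC_ALL=C join -a 1 "$tmpdir/hits.sorted" "$tmpdir/bandwidth.sorted" |
         LC_ALL=C sort -k3,3nr)

printf '%s\n' "$result"
rm -rf "$tmpdir"
```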
When the output order does not matter and the input files are already sorted:
join -a 1 hits.txt bandwidth.txt | column -t
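Whether a file is already sorted on the join field can be checked rather than assumed: sort -c exits non-zero when its input is out of order. The sketch below runs that check on the sample data from the question, where bandwidth.txt happens to be in order and hits.txt is not. The variable names hits_sorted and bw_sorted and the scratch directory are assumptions for the example.

```shell
# Recreate the two sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'aadrivingschool.ws 2840.36M' \
  'aaspak.org 211.57M' \
  'aasteknik.com 1419.26M' \
  'aatonerpk.com 14.87M' > "$tmpdir/bandwidth.txt"
printf '%s\n' \
  'onlinestudyboard.com received 186 hits from 31/May/2016 at 1201' \
  'aaspak.org received 184 hits from 31/May/2016 at 1202' \
  'khawajarubber.com received 183 hits from 31/May/2016 at 1246' \
  'aatonerpk.com received 182 hits from 31/May/2016 at 1231' > "$tmpdir/hits.txt"

# sort -c only checks the order; it prints a diagnostic and fails if the
# input is not sorted on the given key (here field 1, the domain).
if LC_ALL=C sort -c -k1,1 "$tmpdir/hits.txt" 2>/dev/null; then
  hits_sorted=yes
else
  hits_sorted=no
fi
if LC_ALL=C sort -c -k1,1 "$tmpdir/bandwidth.txt" 2>/dev/null; then
  bw_sorted=yes
else
  bw_sorted=no
fi

printf 'hits.txt sorted: %s\nbandwidth.txt sorted: %s\n' "$hits_sorted" "$bw_sorted"
rm -rf "$tmpdir"
```

With this data only bandwidth.txt passes, so hits.txt would still need sorting before the short join command applies.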