Compare strings in two files and merge their output

I am trying to get the bandwidth of each domain from one file and its hit details from another.

The two files are formatted as follows:

  1. bandwidth.txt

    aadrivingschool.ws       2840.36M
    aaspak.org               211.57M
    aasteknik.com            1419.26M
    aatonerpk.com            14.87M
    
  2. hits.txt

    onlinestudyboard.com   received  186     hits  from  31/May/2016  at  1201
    aaspak.org             received  184     hits  from  31/May/2016  at  1202
    khawajarubber.com      received  183     hits  from  31/May/2016  at  1246
    aatonerpk.com          received  182     hits  from  31/May/2016  at  1231
    

What I want to get is:

onlinestudyboard.com       received  186     hits  from  31/May/2016  at  1201 
aaspak.org                 received  184     hits  from  31/May/2016  at  1202  211.57M
khawajarubber.com          received  183     hits  from  31/May/2016  at  1246
aatonerpk.com              received  182     hits  from  31/May/2016  at  1231  14.87M

Answer 1

Here is an awk approach:

$ awk 'FNR==NR{a[$1]=$2; next} {print $0,a[$1]}' bandwidth.txt hits.txt
onlinestudyboard.com     received  186     hits  from  31/May/2016  at  1201 
aaspak.org               received  184     hits  from  31/May/2016  at  1202 211.57M
khawajarubber.com        received  183     hits  from  31/May/2016  at  1246 
aatonerpk.com            received  182     hits  from  31/May/2016  at  1231 14.87M

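Explanation

awk reads its input files line by line and splits each line into fields on whitespace (or on whatever separator is given with -F). These fields are $1, $2 ... $N.

  • NR==FNR (written FNR==NR in the command; the comparison is the same): NR is the number of the current line across all input, and FNR is the line number within the current file. The two are equal only while the first file is being read.
  • a[$1]=$2; next: if this is the first file (see above), save the second field in the array a, keyed by the first field, then skip to the next line.
  • print $0,a[$1]: print the current line ($0) followed by the value stored in the array a under its first field. This runs only for the second file, so each line of hits.txt is printed with the value recorded for its domain in bandwidth.txt, or with nothing if the domain does not appear there.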
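
The steps above can be wrapped in a small shell function. This is a minimal sketch rather than part of the original answer: the function name merge_hits and the array name bw are made up for illustration, and it assumes the domain is the first whitespace-separated field in both files.

```shell
# Hypothetical helper: merge_hits BANDWIDTH_FILE HITS_FILE
# Appends the bandwidth from the first file to each line of the second.
merge_hits() {
    awk '
        FNR == NR { bw[$1] = $2; next }   # first file: remember bandwidth per domain
        { print $0, bw[$1] }              # second file: append it (empty if unknown)
    ' "$1" "$2"
}
```

Calling merge_hits bandwidth.txt hits.txt should print the same lines as the one-liner above.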

Answer 2

A sorted join, column and some bashisms:

join -a 1 <(sort hits.txt) <(sort bandwidth.txt) | column -t | sort -nrk3

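Three sorts are needed. Two of them are there because the input files are not sorted on the common field, but join requires sorted input. The third (on the third field, reverse numeric) is needed to restore the OP's original order.

Output: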

onlinestudyboard.com  received  186  hits  from  31/May/2016  at  1201
aaspak.org            received  184  hits  from  31/May/2016  at  1202  211.57M
khawajarubber.com     received  183  hits  from  31/May/2016  at  1246
aatonerpk.com         received  182  hits  from  31/May/2016  at  1231  14.87M

When the output order does not matter and the input files are already sorted:

join -a 1 hits.txt bandwidth.txt | column -t
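
For shells without process substitution, the sorted join can also be sketched with temporary files instead of <( ). This is an illustrative variant, not the answer's own command: the function name join_hits is hypothetical, it assumes mktemp is available, it sorts on the first field only, and it pins LC_ALL=C so that sort and join agree on the ordering. Piping the result through column -t would restore the aligned columns.

```shell
# Hypothetical helper: join_hits HITS_FILE BANDWIDTH_FILE
join_hits() {
    tmpdir=$(mktemp -d) || return 1
    # join needs both inputs sorted on the join field (field 1 by default)
    LC_ALL=C sort -k1,1 "$1" > "$tmpdir/hits.sorted"
    LC_ALL=C sort -k1,1 "$2" > "$tmpdir/bandwidth.sorted"
    # -a 1 also prints lines of the first file that have no match in the second;
    # the final sort restores the descending order by hit count (field 3)
    LC_ALL=C join -a 1 "$tmpdir/hits.sorted" "$tmpdir/bandwidth.sorted" |
        sort -k3,3nr
    rm -rf "$tmpdir"
}
```

Usage would be join_hits hits.txt bandwidth.txt, with the hits file first so that unmatched domains are kept.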
