Match and print multiple columns from two files


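I have two files. I need to find the rows they have in common based on column 1 of each file and, for every match, write a line to a new file containing col1 (the key shared by both files), file1col2, and file2col2.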

File 1:

col1                         file1col2
10:100000525-100001560(+)    0.971465226620556
10:100001724-100002618(+)    0.940918504451204
10:100002725-100002970(+)    0.946592696189412
10:100003104-100004184(+)    0.736305487299153
10:100004450-100005051(+)    0.70823022283736
10:100005158-100005876(+)    0.969728923411704
10:100006075-100007551(+)    0.855411430976336
10:100007764-100009009(+)    0.274219271261146
10:100009146-100011362(+)    0.927057564779308
10:100011583-100011887(+)    0.883431738847249

File 2:

col1                         file2col2
10:100000525-100001560(+)    0.943385996874889
10:100001724-100002618(+)    0.981929023174133
10:100002725-100002970(+)    0.955549170283206
10:100003104-100004184(+)    0.736440826679551
10:100004450-100005051(+)    0.689045711238636
10:100005158-100005876(+)    0.964995337925152
10:100006075-100007551(+)    0.873411848029685
10:100007764-100009009(+)    0.37719743446494
10:100009146-100011362(+)    0.943862343124518
10:100011583-100011887(+)    0.902915705720447

Expected output:

col1(common between two files)  file1col2   file2col2
10:100000525-100001560(+)   0.971465227 0.943385997
10:100001724-100002618(+)   0.940918504 0.981929023
10:100002725-100002970(+)   0.946592696 0.95554917
10:100003104-100004184(+)   0.736305487 0.736440827
10:100004450-100005051(+)   0.708230223 0.689045711
10:100005158-100005876(+)   0.969728923 0.964995338
10:100006075-100007551(+)   0.855411431 0.873411848
10:100007764-100009009(+)   0.274219271 0.377197434
10:100009146-100011362(+)   0.927057565 0.943862343
10:100011583-100011887(+)   0.883431739 0.902915706

Answer 1

A join + awk solution:

join --header file1 file2 | awk 'NR>1{ $2=sprintf("%1.9f",$2); $3=sprintf("%.9f",$3) }1' > result.txt

cat result.txt
col1 file1col2 file2col2
10:100000525-100001560(+) 0.971465227 0.943385997
10:100001724-100002618(+) 0.940918504 0.981929023
10:100002725-100002970(+) 0.946592696 0.955549170
10:100003104-100004184(+) 0.736305487 0.736440827
10:100004450-100005051(+) 0.708230223 0.689045711
10:100005158-100005876(+) 0.969728923 0.964995338
10:100006075-100007551(+) 0.855411431 0.873411848
10:100007764-100009009(+) 0.274219271 0.377197434
10:100009146-100011362(+) 0.927057565 0.943862343
10:100011583-100011887(+) 0.883431739 0.902915706
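
Note that join expects both inputs to be sorted on the join field. If that cannot be guaranteed, a plain awk lookup is a possible alternative. This is a different technique from the join pipeline above, shown only as a minimal sketch: it assumes whitespace-separated fields, unique keys in column 1, and the same header name for column 1 in both files. The demo_* file names are hypothetical and hold a few rows copied from the question, deliberately out of order in the second file.

```shell
# Hypothetical demo inputs (rows taken from the question).
cat > demo_file1.txt <<'EOF'
col1 file1col2
10:100000525-100001560(+) 0.971465226620556
10:100001724-100002618(+) 0.940918504451204
10:100002725-100002970(+) 0.946592696189412
EOF

# Second file: different row order, and one key missing.
cat > demo_file2.txt <<'EOF'
col1 file2col2
10:100001724-100002618(+) 0.981929023174133
10:100000525-100001560(+) 0.943385996874889
EOF

awk '
    NR == FNR     { col2[$1] = $2; next }     # first file: store col2 keyed by col1
    !($1 in col2) { next }                    # second file: skip keys not in first file
    FNR == 1      { print $1, col2[$1], $2; next }   # header line: print unformatted
                  { printf "%s %.9f %.9f\n", $1, col2[$1], $2 }
' demo_file1.txt demo_file2.txt > demo_result.txt

cat demo_result.txt
# col1 file1col2 file2col2
# 10:100001724-100002618(+) 0.940918504 0.981929023
# 10:100000525-100001560(+) 0.971465227 0.943385997
```

The output follows the row order of the second file, and the first file is held in memory, so this suits cases where one of the two files is reasonably small.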

Details

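  • join --header - treats the first line of each file as field headers and prints them without trying to pair them (--header is a GNU coreutils option, and join expects both files to be sorted on the join field)

  • NR>1 - processes records starting from the second one (NR is the number of the current record), i.e. skips the header line

  • sprintf("%1.9f",$2) - formats the argument $2 (the second column) as a floating-point number with 9 decimal places; $3 is handled the same way, since "%.9f" is equivalent to "%1.9f"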
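
The awk half of the pipeline can be tried on its own, as in the small sketch below. It feeds two rows copied from File 2 in the question through the same NR>1 and sprintf logic; the demo_fmt.txt file name is hypothetical. The trailing 1 is an always-true pattern with no action, so awk prints every record, including the untouched header.

```shell
# Two sample rows from the question, preceded by a header line.
printf '%s\n' \
  'col1 file2col2' \
  '10:100002725-100002970(+) 0.955549170283206' \
  '10:100007764-100009009(+) 0.37719743446494' |
  awk 'NR>1 { $2 = sprintf("%.9f", $2) } 1' > demo_fmt.txt

cat demo_fmt.txt
# col1 file2col2
# 10:100002725-100002970(+) 0.955549170
# 10:100007764-100009009(+) 0.377197434
```

Because %.9f always emits nine decimals, the first value keeps its trailing zero (0.955549170), whereas the expected output in the question shows 0.95554917.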
