Copy lines of text between two files using string matching

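Looking for help: I have two files. One is a large list of various names, the other holds coordinates. Every line in both files starts with an 8-character code. I want to look up each line's 8-character code from File1 and copy that line's content onto every line in File2 with the same code.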

(File 1) Each hash/name appears only once.

136667ED ap1_01_a_ap1_01_rails_07
035B337C ap1_01_a_arrows_005
79546F82 ap1_01_a_centreline_010
0E1D31E7 prop_bush_med_02

(File 2) Some hashes appear several times, e.g. 0E1D31E7, each with different coordinates.

136667ED -1294.6945,-2376.0317,21.8279
035B337C -1314.6719,-2721.7378,12.9467
79546F82 -1283.1066,-2529.9771,12.9635
0E1D31E7 1919.4160,-1814.3889,160.5210
0E1D31E7 1919.9885,-2628.2529,0.7537  
0E1D31E7 192.0235,-2603.1790,4.9978   
0E1D31E7 192.1050,4950.3540,389.4736

Below is the result I want: the 8-character code/name copied onto every line in File 2 whose code matches.

136667ED -1294.6945,-2376.0317,21.8279  136667ED ap1_01_a_ap1_01_rails_07  
035B337C -1314.6719,-2721.7378,12.9467  035B337C ap1_01_a_arrows_005       
79546F82 -1283.1066,-2529.9771,12.9635  79546F82 ap1_01_a_centreline_010   
0E1D31E7 1919.4160,-1814.3889,160.5210  0E1D31E7 prop_bush_med_02          
0E1D31E7 1919.9885,-2628.2529,0.7537    0E1D31E7 prop_bush_med_02          
0E1D31E7 192.0235,-2603.1790,4.9978     0E1D31E7 prop_bush_med_02          
0E1D31E7 192.1050,4950.3540,389.4736    0E1D31E7 prop_bush_med_02          

Joining lines of text with duplicate beginnings

This might work, but I don't know how to run these commands. I'm using Windows.

Answer 1

Based on your input, here is the output of the standard paste command:

$ paste File1 File2
136667ED ap1_01_a_ap1_01_rails_07   136667ED -1294.6945,-2376.0317,21.8279
035B337C ap1_01_a_arrows_005    035B337C -1314.6719,-2721.7378,12.9467
79546F82 ap1_01_a_centreline_010    79546F82 -1283.1066,-2529.9771,12.9635

Since your files are large, you may want to write the result to a file instead of the terminal: paste File1 File2 > mergedfile
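
Note that paste pairs lines purely by position (line 1 of File1 with line 1 of File2, and so on) and never compares the codes, so its output only lines up when both files list the same codes in the same order; here File1 has 4 lines and File2 repeats 0E1D31E7. A minimal sketch of a lookup by code instead, using awk (not part of either answer; it assumes whitespace-separated fields, the code in the first field, Unix line endings, and the file names File1/File2 from the question; the sample data is recreated only for demonstration):

```shell
#!/bin/sh
# Sketch: match lines by their 8-character code with an awk lookup table.
# Assumptions: code is field 1, fields are whitespace-separated, LF line endings.

# Recreate the sample data from the question (demonstration only).
cat > File1 <<'EOF'
136667ED ap1_01_a_ap1_01_rails_07
035B337C ap1_01_a_arrows_005
79546F82 ap1_01_a_centreline_010
0E1D31E7 prop_bush_med_02
EOF

cat > File2 <<'EOF'
136667ED -1294.6945,-2376.0317,21.8279
035B337C -1314.6719,-2721.7378,12.9467
79546F82 -1283.1066,-2529.9771,12.9635
0E1D31E7 1919.4160,-1814.3889,160.5210
0E1D31E7 1919.9885,-2628.2529,0.7537
0E1D31E7 192.0235,-2603.1790,4.9978
0E1D31E7 192.1050,4950.3540,389.4736
EOF

# NR == FNR is true only while reading the first file (File1):
#   store each whole File1 line under its code, then skip to the next line.
# For File2: if the code was stored, rebuild the record ($1 = $1 drops stray
#   trailing spaces) and print it followed by the matching File1 line.
awk 'NR == FNR { name[$1] = $0; next }
     ($1 in name) { $1 = $1; print $0, name[$1] }' File1 File2 > mergedfile
```

Because File1 is held in memory and File2 is streamed, the output keeps File2's original line order; File2 lines whose code is not in File1 are skipped.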

Answer 2

Using join, in a shell that supports process substitution (such as bash):

$ join <(sort file2) <(sort file1)
035B337C -1314.6719,-2721.7378,12.9467 ap1_01_a_arrows_005
0E1D31E7 1919.4160,-1814.3889,160.5210 prop_bush_med_02
0E1D31E7 1919.9885,-2628.2529,0.7537 prop_bush_med_02
0E1D31E7 192.0235,-2603.1790,4.9978 prop_bush_med_02
0E1D31E7 192.1050,4950.3540,389.4736 prop_bush_med_02
136667ED -1294.6945,-2376.0317,21.8279 ap1_01_a_ap1_01_rails_07
79546F82 -1283.1066,-2529.9771,12.9635 ap1_01_a_centreline_010

Alternatively, to repeat the join field in the middle of the output:

$ join -o0,1.2,0,2.2 <(sort file2) <(sort file1)
035B337C -1314.6719,-2721.7378,12.9467 035B337C ap1_01_a_arrows_005
0E1D31E7 1919.4160,-1814.3889,160.5210 0E1D31E7 prop_bush_med_02
0E1D31E7 1919.9885,-2628.2529,0.7537 0E1D31E7 prop_bush_med_02
0E1D31E7 192.0235,-2603.1790,4.9978 0E1D31E7 prop_bush_med_02
0E1D31E7 192.1050,4950.3540,389.4736 0E1D31E7 prop_bush_med_02
136667ED -1294.6945,-2376.0317,21.8279 136667ED ap1_01_a_ap1_01_rails_07
79546F82 -1283.1066,-2529.9771,12.9635 79546F82 ap1_01_a_centreline_010

For neater columns:

$ join -o0,1.2,0,2.2 <(sort file2) <(sort file1) | column -t
035B337C  -1314.6719,-2721.7378,12.9467  035B337C  ap1_01_a_arrows_005
0E1D31E7  1919.4160,-1814.3889,160.5210  0E1D31E7  prop_bush_med_02
0E1D31E7  1919.9885,-2628.2529,0.7537    0E1D31E7  prop_bush_med_02
0E1D31E7  192.0235,-2603.1790,4.9978     0E1D31E7  prop_bush_med_02
0E1D31E7  192.1050,4950.3540,389.4736    0E1D31E7  prop_bush_med_02
136667ED  -1294.6945,-2376.0317,21.8279  136667ED  ap1_01_a_ap1_01_rails_07
79546F82  -1283.1066,-2529.9771,12.9635  79546F82  ap1_01_a_centreline_010
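
The join commands above rely on process substitution, <( ), which a plain POSIX sh does not provide. A sketch of the same -o 0,1.2,0,2.2 join that writes the sorted inputs to temporary files first (the .sorted file names and the mergedfile output name are illustrative assumptions). join expects its inputs sorted in the collating order it is itself using, so exporting LC_ALL=C gives sort and join the same byte-order collation:

```shell
#!/bin/sh
# Sketch: the join from this answer without process substitution.
# Assumptions: POSIX sh, standard sort/join, whitespace-separated fields.

# Recreate the sample data from the question (demonstration only).
cat > File1 <<'EOF'
136667ED ap1_01_a_ap1_01_rails_07
035B337C ap1_01_a_arrows_005
79546F82 ap1_01_a_centreline_010
0E1D31E7 prop_bush_med_02
EOF

cat > File2 <<'EOF'
136667ED -1294.6945,-2376.0317,21.8279
035B337C -1314.6719,-2721.7378,12.9467
79546F82 -1283.1066,-2529.9771,12.9635
0E1D31E7 1919.4160,-1814.3889,160.5210
0E1D31E7 1919.9885,-2628.2529,0.7537
0E1D31E7 192.0235,-2603.1790,4.9978
0E1D31E7 192.1050,4950.3540,389.4736
EOF

# Use byte-order collation for both sort and join so their orderings agree.
LC_ALL=C
export LC_ALL

# Sort each file on its first field (the 8-character code) into a temp file.
sort -k1,1 File2 > File2.sorted
sort -k1,1 File1 > File1.sorted

# Output: join key, File2 field 2, join key again, File1 field 2.
join -o 0,1.2,0,2.2 File2.sorted File1.sorted > mergedfile

# Remove the temporary files.
rm -f File1.sorted File2.sorted
```

As with the commands above, the output is ordered by code rather than by File2's original line order.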

相关内容