Merge two files and compare them

Answer 1

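Strictly speaking, I have to agree that this question is on the edge of being a pure programming question.

At the same time, sitting somewhere in the middle, it is too tempting and challenging not to answer, and we have answered similar questions before.

The script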

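#!/usr/bin/env python3
import sys

# Read both files into lists of whitespace-split columns.
files = []
for f in sys.argv[1:3]:
    with open(f) as src:
        files.append([l.split() for l in src])

for item in files[0]:
    # Lines from the second file whose first two columns match.
    match = [line for line in files[1] if item[:2] == line[:2]]
    if match:
        try:
            calc = abs(int(item[2]) - int(match[0][2]))
            print("\t".join(item[:3]) + "\t" + match[0][2] + "\t", calc)
        except (ValueError, IndexError):
            # Skip header lines and rows without an integer third column.
            pass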

How to use

Copy the script into an empty file and save it as analyze.py

Run it with the two files as arguments:

python3 /path/to/analyze.py <file1> <file2>

With the example from your question:

$ python3 '/home/jacob/Bureaublad/pscript_1.py' '/home/jacob/Bureaublad/map/f2' '/home/jacob/Bureaublad/map/f1' 
Ab  Cd  150 100  50

Explanation

The script:

Looks in both files for lines whose first two columns match:

for item in files[0]:
    match = [line for line in files[1] if item[:2] == line[:2]]

Prints the first two (matching) columns of each matching line, together with both versions of the third column:

if match:
    try:
        calc = abs(int(item[2]) - int(match[0][2]))
        print("\t".join(item[:3]) + "\t" + match[0][2] + "\t", calc)
    except (ValueError, IndexError):
        pass

The (absolute) difference between the last two columns is calculated (and finally printed) in this line:
```
calc = abs(int(item[2]) - int(match[0][2]))
```

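The script assumes:

all numbers are integers
you do not want to print lines whose first two columns have no match
each line has only one possible match in the other file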
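
The matching step scans the whole second file once for every line of the first. As a minimal sketch (not part of the original script), the same comparison can be written as a function that first indexes the second file by its first two columns. The name `compare_rows` and the in-memory lists of columns are assumptions made for illustration. It keeps the assumptions listed above: integer values, and only the first match per key is used.

```python
def compare_rows(rows1, rows2):
    """Yield (col1, col2, value1, value2, abs_diff) for rows whose
    first two columns match in both inputs."""
    # Index the second input by its first two columns; the first
    # occurrence wins, roughly mirroring match[0] in the script.
    index = {}
    for row in rows2:
        if len(row) >= 3:
            index.setdefault(tuple(row[:2]), row[2])

    for row in rows1:
        if len(row) < 3:
            continue  # blank or short line
        other = index.get(tuple(row[:2]))
        if other is None:
            continue  # no match in the second input: print nothing
        try:
            diff = abs(int(row[2]) - int(other))
        except ValueError:
            continue  # non-integer third column, e.g. a header line
        yield (row[0], row[1], row[2], other, diff)


if __name__ == "__main__":
    # Hypothetical sample data shaped like the files in the example.
    f2 = [["Ab", "Cd", "150", "Us"]]
    f1 = [["Ab", "Cd", "100", "Us"], ["Ef", "Gh", "200", "Us"]]
    for result in compare_rows(f2, f1):
        print("\t".join(map(str, result)))
```

For large files the dictionary lookup avoids rescanning the second file for each line, at the cost of holding its keys in memory.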

Answer 2

Using join:

join --header -j1 -a1 -o 1.1,1.2,1.3,2.3 file1 file2

If the header of the fourth field needs to read column4:

join --header -j1 -a1 -o 1.1,1.2,1.3,2.3 file1 <(awk 'NR == 1 {$3 = "column4"} 1' file2)

join --header -j1 -a1 -o 1.1,1.2,1.3,2.3 file1 <(
    awk '
        NR == 1 {
            $3 = "column4"
        }
        1
    ' file2
)

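This assumes that both file1 and file2 are sorted on field #1, as in the example.

--header: treat the first line of each file as field headers and print them without trying to pair them
-j1: join on field #1 of file1 and field #1 of file2
-a1: also print the unpairable lines from file1
-o 1.1,1.2,1.3,2.3: print fields #1, #2 and #3 of file1, followed by field #3 of file2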

% cat file1
column1  column2  column3  column4
Ab       Cd       100      Us
Ef       Gh       200      Us
% cat file2
column1  column2  column3  column4
Ab       Cd       150      Us
% join --header -j1 -a1 -o 1.1,1.2,1.3,2.3 file1 file2 
column1 column2 column3 column3
Ab Cd 100 150
Ef Gh 200 
% join --header -j1 -a1 -o 1.1,1.2,1.3,2.3 file1 <(awk 'NR == 1 {$3 = "column4"} 1' file2)
column1 column2 column3 column4
Ab Cd 100 150
Ef Gh 200
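
join needs both inputs sorted on the join field. For unsorted input, the following is a rough Python sketch of the same left outer join (the effect of `-a1 -o 1.1,1.2,1.3,2.3`), not a reimplementation of join itself. The function name `left_join` and the list-of-lists input are assumptions for illustration. Unlike join, it uses only the first line of the second file for each key.

```python
def left_join(rows1, rows2):
    """Join rows on field #1, keeping every row of rows1.

    Each result is [field1, field2, field3 of rows1, field3 of rows2];
    the last field is an empty string when rows2 has no matching key.
    """
    # Map field #1 of the second input to its field #3 (first occurrence wins).
    third_by_key = {}
    for row in rows2:
        if len(row) >= 3:
            third_by_key.setdefault(row[0], row[2])

    joined = []
    for row in rows1:
        if len(row) < 3:
            continue  # skip blank or short lines
        joined.append(row[:3] + [third_by_key.get(row[0], "")])
    return joined


if __name__ == "__main__":
    # Same data as file1 and file2 in the session above, already split.
    file1 = [["column1", "column2", "column3", "column4"],
             ["Ab", "Cd", "100", "Us"],
             ["Ef", "Gh", "200", "Us"]]
    file2 = [["column1", "column2", "column3", "column4"],
             ["Ab", "Cd", "150", "Us"]]
    for fields in left_join(file1, file2):
        print(" ".join(fields))
```

The header line needs no special handling here because its first field is identical in both files, so it is paired like any other row.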
