Comparing strings in two files

Answer 1

You can use the join command:

join -t ":" username contacts

where the username file has the format

user1:id1
user2:id2

and the contacts file has the format

user1:contact1
user2:contact2

join expects both inputs to be sorted on the join field. If the files are not sorted, sort them first:

sort -b username > username.sorted
sort -b contacts > contacts.sorted

Then run the join command on username.sorted and contacts.sorted.

Alternatively, as another post points out, you can skip the temporary files and use process substitution directly:

join -t ":" <(sort -b username) <(sort -b contacts)

Answer 2

In a Python script:

The pragmatic solution

If this is a one-off job for one specific situation, the following will do:

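#!/usr/bin/env python3

# set the paths to the two input files here
file1 = "/path/to/file1"
file2 = "/path/to/file2"

with open(file1) as names:
    names = sorted(names.read().splitlines())
with open(file2) as data:
    data = data.read().splitlines()
for i in names:
    key = i.split(":")[0]
    # append everything after the key from the first matching line in file2
    item = i + [d[d.find(":"):] for d in data if d.startswith(key + ":")][0]
    print(item)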

Output:

Neeraj:149:[email protected]
Rahul:148:[email protected]
Tarun:143:[email protected]

Or, if you want to save the output directly to a file:

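#!/usr/bin/env python3

# set the paths to the two input files and the output file here
file1 = "/path/to/file1"
file2 = "/path/to/file2"
file3 = "/path/to/file3"

with open(file1) as names:
    names = sorted(names.read().splitlines())
with open(file2) as data:
    data = data.read().splitlines()
with open(file3, "wt") as output:
    for i in names:
        key = i.split(":")[0]
        # append everything after the key from the first matching line in file2
        output.write(i + [d[d.find(":"):] for d in data if d.startswith(key + ":")][0] + "\n")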

As you probably know: copy the script into an empty file, set the paths to files 1 and 2 (and 3, if used) between the quotes, save it as combine.py, and run it with:

python3 /path/to/combine.py

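A more database-like solution

Looking at the two files, we are really dealing with databases in which the first field is the key. The script below is more flexible about how the combined report is built, for example when the files have more fields than they do here.

Suppose we add an extra ("characteristics") field to the second file: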

Neeraj:[email protected]:Loves to Cook
Rahul:[email protected]:Collects empty bottles
Tarun:[email protected]:Weares his glasses upside down

We might want to add the characteristics instead of the email address, or both. That calls for a script like this:

#!/usr/bin/env python3

db1 = "/path/to/file1"; db2 = "/path/to/file2"

with open(db1) as data1:
    rc = [l.replace("\n", "").split(":") for l in data1.readlines()]

with open(db2) as data2:
    records2 = [l.replace("\n", "").split(":") for l in data2.readlines()]

uniques = sorted(set(item[0] for item in rc)) # find keys
report = []

for i in uniques:
    database_1 = [r for r in rc if r[0] == i][0]
    database_2 = [r for r in records2 if r[0] == i][0]
    # -----------------------------------------------------------------------
    # set the required fields for report here:
    new_record = i, database_1[1], database_2[1]
    # -----------------------------------------------------------------------
    report.append((":").join(new_record))

for item in report:
    print(item)

Results

If we set:

new_record = i, database_1[1], database_2[2]

the result is:

Neeraj:149:Loves to Cook
Rahul:148:Collects empty bottles
Tarun:143:Weares his glasses upside down

But if we set:

new_record = i, database_1[1], database_2[1]

the result is:

Neeraj:149:[email protected]
Rahul:148:[email protected]
Tarun:143:[email protected]

And if we set:

new_record = i, database_1[1], database_2[1], database_2[2]

the result is:

Neeraj:149:[email protected]:Loves to Cook
Rahul:148:[email protected]:Collects empty bottles
Tarun:143:[email protected]:Weares his glasses upside down
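The script above scans both lists again for every key and raises an IndexError when a key from the first file is missing in the second. A minimal sketch of the same idea using dictionaries could look like this. The function name combine, its parameters, and the example.com addresses are made up for illustration, and skipping unmatched keys is an assumption about what you want:

```python
def combine(lines1, lines2, fields1=(1,), fields2=(1,), sep=":"):
    """Join two lists of 'key:field:...' lines on their first field.

    fields1 / fields2 are 0-based column indexes (0 is the key) to copy
    from each file. Keys that are missing from the second file are skipped.
    """
    def index(lines):
        # map each key to its first record; blank lines are ignored
        table = {}
        for line in lines:
            record = line.rstrip("\n").split(sep)
            if record[0]:
                table.setdefault(record[0], record)
        return table

    table1, table2 = index(lines1), index(lines2)
    report = []
    for key in sorted(table1):
        if key not in table2:
            continue  # assumption: silently skip keys without a partner
        chosen = [table1[key][i] for i in fields1]
        chosen += [table2[key][i] for i in fields2]
        report.append(sep.join([key] + chosen))
    return report


if __name__ == "__main__":
    # hypothetical sample data; the addresses are placeholders
    names = ["Tarun:143\n", "Neeraj:149\n", "Rahul:148\n"]
    details = [
        "Neeraj:neeraj@example.com:Loves to Cook\n",
        "Rahul:rahul@example.com:Collects empty bottles\n",
    ]
    for row in combine(names, details, fields2=(2,)):
        print(row)
```

Changing fields1 and fields2 plays the same role as editing the new_record line in the script above.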

Answer 3

With process substitution in bash, we can build a very compact variant of the join solution that works even for unsorted input files:

join -t: <(sort user-name) <(sort user-details)

The output matches the example output in the question:

Neeraj:149:[email protected]
Rahul:148:[email protected]
Tarun:143:[email protected]

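Here we join on the first field/column of both files. To use other columns, use the options -1 and -2 (or -j if it is the same field in both files). To be more explicit, the command above could be written as join -t: -j 1 ... or join -t: -1 1 -2 1 ... (see also man join).

Each part of the form <(command) is replaced by the name of a named pipe (or a /dev/fd entry, depending on the system) from which the output of the command can be read. As far as join is concerned, it simply receives two files containing sorted input as its arguments.

(See man bash | less '+/Process Substitution')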
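To make the role of the -1 and -2 field numbers concrete, here is a rough Python sketch of what the command does. It is not how join is implemented (join merges two already sorted streams, while this builds a lookup table), and the name join_lines and its parameters are invented for the example. It assumes every non-blank line has at least as many fields as the chosen field number:

```python
from collections import defaultdict


def join_lines(lines1, lines2, field1=1, field2=1, sep=":"):
    """Roughly mimic `join -t SEP -1 FIELD1 -2 FIELD2` (1-based fields)."""
    def records(lines):
        # split non-blank lines into lists of fields
        return [ln.rstrip("\n").split(sep) for ln in lines if ln.strip()]

    # group the records of the second input by their join field
    by_key = defaultdict(list)
    for rec in records(lines2):
        by_key[rec[field2 - 1]].append(rec)

    joined = []
    # sorting the first input stands in for <(sort ...)
    for rec1 in sorted(records(lines1), key=lambda r: r[field1 - 1]):
        key = rec1[field1 - 1]
        rest1 = rec1[:field1 - 1] + rec1[field1:]
        for rec2 in by_key.get(key, []):
            rest2 = rec2[:field2 - 1] + rec2[field2:]
            # output: join field, then the other fields of each input
            joined.append(sep.join([key] + rest1 + rest2))
    return joined


if __name__ == "__main__":
    # hypothetical data; the key is the second field of the first input
    ids = ["148:Rahul\n", "149:Neeraj\n"]
    mails = ["Neeraj:neeraj@example.com\n", "Rahul:rahul@example.com\n"]
    for row in join_lines(ids, mails, field1=2, field2=1):
        print(row)
```

As with join itself, lines whose key has no partner in the other input are dropped.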

Answer 4

Try my code:

First sort user-name and contacts together and write the output to a file named user-name_contacts:

sort user-name contacts > user-name_contacts

Next, run this command to merge each pair of adjacent lines in that file:

sed -i '/$/N ; s/\n\(.*\):/:/' user-name_contacts

Output:

Neeraj:149:[email protected]
Rahul:148:[email protected]
Tarun:143:[email protected]
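The sed command merges every two consecutive lines without checking that they belong together, so it relies on each name appearing exactly once in both files. For comparison, a small Python sketch of the same sort-then-pair idea that only merges neighbours sharing a key might look like this. The name merge_pairs and the example.com addresses are made up for illustration:

```python
def merge_pairs(lines, sep=":"):
    """Sort the combined lines, then merge adjacent lines with the same key."""
    ordered = sorted(ln.rstrip("\n") for ln in lines if ln.strip())
    merged = []
    i = 0
    while i < len(ordered):
        key = ordered[i].partition(sep)[0]
        if i + 1 < len(ordered) and ordered[i + 1].startswith(key + sep):
            # same key on the next line: append its fields, drop the repeated key
            merged.append(ordered[i] + sep + ordered[i + 1].partition(sep)[2])
            i += 2
        else:
            # no partner: keep the line unchanged
            merged.append(ordered[i])
            i += 1
    return merged


if __name__ == "__main__":
    # hypothetical contents of user-name and contacts, concatenated
    lines = [
        "Rahul:148\n", "Neeraj:149\n",
        "Neeraj:neeraj@example.com\n", "Rahul:rahul@example.com\n",
    ]
    for row in merge_pairs(lines):
        print(row)
```

As in the sort plus sed version, the order of the two merged fields depends on how the two lines happen to sort, so this is only dependable when one file's values always sort before the other's.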
