Merging two non-git text files using semantics similar to how git depicts merge conflicts

Question 1

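Note: although I think this is a somewhat reasonable "answer", I have since come up with another "answer" that I think is better. So please see my other "answer" below.

The original version of this "answer" ...

Oh! I posted here too soon. I didn't know about diff's -D command-line option, and now I realize that I can do this ...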

diff -D file.1 file.1 file.2 >file.merged

It will produce the following contents in file.merged ...

common line 1 ...
common line 2 ...
common line 3 ...
#ifdef file.1
something unique from file.1
a second line of something unique from file.1
#else /* file.1 */
something unique from file.2
#endif /* file.1 */
common line 4 ...
common line 5 ...
#ifdef file.1
something unique from file.1
#else /* file.1 */
something unique from file.2
a second line of something unique from file.2
#endif /* file.1 */
common line 6 ...
common line 7 ...
... etc. ...

I can handle the #ifdef, #else, and #endif lines the same way I would handle git's <<<<<<<<, ========, and >>>>>>>> lines.

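Update: ... I just found this: https://stackoverflow.com/questions/16902001/manually-merge-two-files-using-diff

It shows how I can do something similar with the unified diff format. Give diff a -U option with a huge argument, one that is larger than the number of lines in the longer of file.1 and file.2, so that every common line is kept as context. For example ...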

diff -U 99999999 file.1 file.2 | tail -n +4 >file.merged

The tail -n +4 drops the three header lines of the diff, and the result then looks like this:

 common line 1 ...
 common line 2 ...
 common line 3 ...
-something unique from file.1
-a second line of something unique from file.1
+something unique from file.2
 common line 4 ...
 common line 5 ...
-something unique from file.1
+something unique from file.2
+a second line of something unique from file.2
 common line 6 ...
 common line 7 ...
 ... etc. ...

The + lines are the data unique to file.2, and the - lines are the data unique to file.1.

I can work with these + and - lines.
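
One possible way to work with those + and - lines is to group consecutive lines by their first character. The following is only a minimal Python sketch and not part of the original answer: the function name group_diff_lines and the labels 'common', 'file1', and 'file2' are made up for illustration, and it assumes the diff header lines have already been removed (as tail -n +4 does above).

```python
from itertools import groupby

def group_diff_lines(lines):
    """Group unified-diff body lines by their one-character prefix.

    Returns a list of (source, [text, ...]) pairs, where source is
    'common' (prefix ' '), 'file1' (prefix '-') or 'file2' (prefix '+').
    """
    names = {' ': 'common', '-': 'file1', '+': 'file2'}
    groups = []
    for prefix, run in groupby(lines, key=lambda ln: ln[:1]):
        if prefix not in names:
            # Not a body line (for example a "\ No newline ..." note).
            continue
        groups.append((names[prefix], [ln[1:] for ln in run]))
    return groups

# Hypothetical sample in the same shape as the output shown above.
sample = [
    ' common line 1 ...',
    '-something unique from file.1',
    '-a second line of something unique from file.1',
    '+something unique from file.2',
    ' common line 4 ...',
]
for source, texts in group_diff_lines(sample):
    print(source, texts)
```

Because the source of each line is taken only from the first column, the text of a line can never be mistaken for a marker.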

Question 2

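It sounds like you don't really care about the output format and just want to know how to identify which lines came from each file and which are common. Given that, how about: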

$ diff --old-line-format=$'-%l\n' --new-line-format=$'+%l\n' --unchanged-line-format=$'=%l\n' file.1 file.2
=common line 1 ...
=common line 2 ...
=common line 3 ...
-something unique from file.1
-a second line of something unique from file.1
+something unique from file.2
=common line 4 ...
=common line 5 ...
-something unique from file.1
+something unique from file.2
+a second line of something unique from file.2
=common line 6 ...
=common line 7 ...

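Be wary of any solution that has to test the contents of a line to find the indicator of that line's source (e.g. if you're looking for <<<<<<< file.1 to tell you what is unique to file.1, what if a file contains a line that is exactly that string?) rather than an indicator that always and only appears in one fixed position on every line, because a test for any string will fail if that string can appear in your input. With the above, the first character always indicates the source of the line, so it cannot clash with any possible file contents.

If you really want to get exactly git's merge conflict output format (which I don't recommend), you can always pipe the above to a simple awk script that prints <<< file, or whatever you like, whenever the first character of the line changes, and then removes that character.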
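
The conversion described above could look roughly like the following. This is a hedged sketch written in Python rather than awk, to match the language used elsewhere in this thread; the function name to_conflict_markers and the default labels are assumptions made for illustration, and it expects lines prefixed with =, -, or + exactly as produced by the diff command above.

```python
def to_conflict_markers(lines, name1='file.1', name2='file.2'):
    """Turn '='/'-'/'+' prefixed lines into git-like conflict blocks."""
    out = []
    state = '='  # prefix of the previous line; '=' means outside a block

    def close(prev):
        # Finish an open block. A block that only had '-' lines
        # still needs its separator before the end marker.
        if prev == '-':
            out.append('=======')
        if prev in ('-', '+'):
            out.append('>>>>>>> ' + name2)

    for line in lines:
        prefix, text = line[:1], line[1:]
        if prefix != state:
            if prefix == '-':
                close(state)  # no-op unless a '+' run directly precedes
                out.append('<<<<<<< ' + name1)
            elif prefix == '+':
                if state != '-':
                    # A '+' run with no preceding '-' run opens its own block.
                    out.append('<<<<<<< ' + name1)
                out.append('=======')
            else:
                close(state)
            state = prefix
        out.append(text)  # the prefix character is removed here
    close(state)
    return out

# Hypothetical sample; real input would be the diff output shown above.
sample = ['=common line 3 ...',
          '-something unique from file.1',
          '+something unique from file.2',
          '=common line 4 ...']
print('\n'.join(to_conflict_markers(sample)))
```

To use it on real data, the sample list could be replaced with sys.stdin.read().splitlines() and the diff command piped into the script. Note that the result has the ambiguity warned about above: once the prefix column is removed, a content line that happens to look like a marker can no longer be told apart from a real one.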

Question 3

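Because of the limitations of the solutions I originally posted in my first "answer", which involved diff -D ... and diff -U ..., I decided to write a solution in Python using Python's difflib module.

I wrote it to produce output that looks a lot like git's. It uses delimiters containing the strings <<<<<<<<, ========, and >>>>>>>>, which, as we know, can cause ambiguity if the original text contains such strings. However, the same ambiguity can exist in git's "merge conflict" output, and since I am comfortable with git and willing to accept it there, I am also comfortable with the same ambiguity in my own solution.

The output is not exactly the same as git's "merge conflict" output, but it is close enough for what I want.

First, here is the Python program (I cleaned up the Python code that I originally posted here; this is the cleaned-up version). I call this program filemerge ...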
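
For anyone who does want to know whether that ambiguity affects a particular input, a small pre-check is one option. This is only a sketch and is not part of the program below: the helper name marker_collisions is hypothetical, and it assumes the same three eight-character delimiter strings.

```python
# Same delimiter strings as the ones described above.
DELIMITERS = ('<<<<<<<<', '========', '>>>>>>>>')

def marker_collisions(lines):
    """Return (line_number, text) for input lines that begin with a delimiter."""
    return [
        (number, text.rstrip('\n'))
        for number, text in enumerate(lines, start=1)
        if text.startswith(DELIMITERS)  # str.startswith accepts a tuple
    ]

# Hypothetical input whose second line would be mistaken for a separator.
sample = [
    'common line 1 ...\n',
    '======== a heading underline\n',
    'common line 2 ...\n',
]
print(marker_collisions(sample))
```

Running such a check on both input files before merging shows which lines would be indistinguishable from delimiters in the merged output.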

#!/usr/bin/python3

### Take the diff's between two files and output
### the common and different lines in a manner
### which is very similar to the way that `git`
### depicts merge conflicts.

import sys
sys.dont_write_bytecode = True

import os

from difflib import unified_diff

prog       = None
diff_start = '<<<<<<<<'
diff_sep   = '========'
diff_end   = '>>>>>>>>'

def main():
    if len(sys.argv) < 3:
        print(f'\nusage: {prog} file1 file2\n')
        return 1

    file1, file2 = sys.argv[1:3]
    data1        = None
    data2        = None
    missing      = []

    try:
        with open(file1, 'r') as f:
            data1 = f.readlines()
    except Exception:
        missing.append(file1)

    try:
        with open(file2, 'r') as f:
            data2 = f.readlines()
    except Exception:
        missing.append(file2)
        
    if missing:
        print(f'\nnot found: {", ".join(missing)}\n')
        return 1

    n1 = len(data1)
    n2 = len(data2)
    max_lines = (n1 + 1) if n1 > n2 else (n2 + 1)
    count = 0
    state = ''
    sep_printed = False
    next_file = ''

    for line in unified_diff(data1, data2, n=max_lines):
        count += 1
        if count < 4:
            continue

        # Every line which is returned by unified_diff()
        # starts with either ' ', '+', or '-'. It normally
        # ends with a newline, but the last line of a file
        # may lack one, so strip only an actual newline.
        line = line.rstrip('\n')
        ch0  = line[0]

        if ch0 == ' ':
            if state:
                state = ''
                if not sep_printed:
                    print(f'{diff_sep}{next_file}')
                print(diff_end)
            sep_printed = False
            next_file = ''
        elif ch0 == '-':
            if state == ch0:
                pass
            elif state == '+':
                print(f'{diff_sep} file={file1}')
                sep_printed = True
                next_file = ''
            else:
                print(f'{diff_start} file={file1}')
                sep_printed = False
                next_file = f' file={file2}'
            state = ch0
        elif ch0 == '+':
            if state == ch0:
                pass
            elif state == '-':
                print(f'{diff_sep} file={file2}')
                sep_printed = True
                next_file = ''
            else:
                print(f'{diff_start} file={file2}')
                sep_printed = False
                next_file = f' file={file1}'
            state = ch0
        print(line[1:])

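    if state:
        if not sep_printed:
            print(f'{diff_sep}{next_file}')
            next_file = ''
        print(diff_end)

    if count == 0:
        # unified_diff() yields nothing at all for identical
        # files, so output the common content unchanged.
        sys.stdout.writelines(data1)

    return 0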

if __name__ == '__main__':
    prog = os.path.basename(sys.argv[0])
    sys.exit(main())

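Here are the input files I tested it with. They are similar to, but not exactly the same as, the input files I originally posted in the question ...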

========file.1

common line 1 ...
common line 2 ...
common line 3 ...
something unique from file.1
a second line of something unique from file.1
common line 4 ...
common line 5 ...
something unique from file.1
common line 6 ...
common line 7 ...
penultimate file.1 line
common line 8 ...

========file.2

common line 1 ...
second line from file.2
common line 2 ...
common line 3 ...
something unique from file.2
common line 4 ...
common line 5 ...
something unique from file.2
a second line of something unique from file.2
common line 6 ...
common line 7 ...
common line 8 ...

I ran the command like this ...

filemerge file.1 file.2 >file.merged

These are the resulting contents of file.merged ...

common line 1 ...
<<<<<<<< file=file.2
second line from file.2
======== file=file.1
>>>>>>>>
common line 2 ...
common line 3 ...
<<<<<<<< file=file.1
something unique from file.1
a second line of something unique from file.1
======== file=file.2
something unique from file.2
>>>>>>>>
common line 4 ...
common line 5 ...
<<<<<<<< file=file.1
something unique from file.1
======== file=file.2
something unique from file.2
a second line of something unique from file.2
>>>>>>>>
common line 6 ...
common line 7 ...
<<<<<<<< file=file.1
penultimate file.1 line
======== file=file.2
>>>>>>>>
common line 8 ...

As I mentioned, this is not exactly the same format as git's "merge conflict" output, but it is very similar, and close enough for me.

Answer