Merging two non-git text files with semantics similar to how git depicts merge conflicts

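I want to merge two text files that are not under git, using semantics similar to the way git depicts "merge conflicts".

For example, suppose I have two similar-but-not-identical text files named file.1 and file.2. I want to merge these two files into a third file, like this: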

hypothetical-merge-utility file.1 file.2 file.merged

I would like it to produce file.merged, which lists the common file contents and each difference in a manner similar to the following:

common line 1 ...
common line 2 ...
common line 3 ...
<<<<<<< file.1
something unique from file.1
a second line of something unique from file.1
======= file.2
something unique from file.2
>>>>>>> end of diff
common line 4 ...
common line 5 ...
<<<<<<< file.1
something unique from file.1
======= file.2
something unique from file.2
a second line of something unique from file.2
>>>>>>> end of diff
common line 6 ...
common line 7 ...
... etc. ...

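In other words, I want each difference between file.1 and file.2 to look similar to git's depiction of a "merge conflict".

I don't care if delimiters other than <<<<<<<<, ========, and >>>>>>>> are used.

I know there are many utilities available under Linux for merging text files. However, I am only looking for something that specifically presents the merged data in a manner similar to the way git depicts "merge conflicts".

Does anyone know of such a utility?

Thank you in advance.

UPDATE: Per the question below from Ed Morton, here are the contents of the two test files ...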

==== file.1 ====

common line 1 ...
common line 2 ...
common line 3 ...
something unique from file.1
a second line of something unique from file.1
common line 4 ...
common line 5 ...
something unique from file.1
common line 6 ...
common line 7 ...

==== file.2 ====

common line 1 ...
common line 2 ...
common line 3 ...
something unique from file.2
common line 4 ...
common line 5 ...
something unique from file.2
a second line of something unique from file.2
common line 6 ...
common line 7 ...

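Answer 1

NOTE: Although I think this is a somewhat reasonable "answer", I have since come up with another "answer" which I think is better. So please see my other "answer" below.

The original version of this "answer" ...

Oh! I posted here too soon. I didn't know about the -D command-line option of diff, and now I realize that I can do this ...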

diff -D file.1 file.1 file.2 >file.merged

It produces the following in file.merged ...

common line 1 ...
common line 2 ...
common line 3 ...
#ifndef file.1
something unique from file.1
a second line of something unique from file.1
#else /* file.1 */
something unique from file.2
#endif /* file.1 */
common line 4 ...
common line 5 ...
#ifndef file.1
something unique from file.1
#else /* file.1 */
something unique from file.2
a second line of something unique from file.2
#endif /* file.1 */
common line 6 ...
common line 7 ...
... etc. ...

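I can process the #ifdef/#ifndef, #else, and #endif lines the same way I would process git's <<<<<<<<, ========, and >>>>>>>> lines.

UPDATE: ... I just found this: https://stackoverflow.com/questions/16902001/manually-merge-two-files-using-diff

It shows how I can also do something similar using the unified diff format. Give diff a -U option with a huge argument that is larger than the number of lines in the longer of file.1 and file.2, so that every common line is included as context. For example ...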

diff -U 99999999 file.1 file.2 | tail -n +4 >file.merged

It then produces output like this:

 common line 1 ...
 common line 2 ...
 common line 3 ...
-something unique from file.1
-a second line of something unique from file.1
+something unique from file.2
 common line 4 ...
 common line 5 ...
-something unique from file.1
+something unique from file.2
+a second line of something unique from file.2
 common line 6 ...
 common line 7 ...
 ... etc. ...

The + lines represent the data unique to file.2, and the - lines represent the data unique to file.1.

I can process these + and - lines.
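
As a rough illustration of what "processing" those lines could look like, here is a minimal Python sketch that groups consecutive lines of the `diff -U` output by their first character. The function name `group_by_origin` and the sample data are hypothetical, chosen only for this example; the input is assumed to be the merged output with the three header lines already stripped.

```python
from itertools import groupby

def group_by_origin(lines):
    """Group consecutive diff lines by their first character.

    Returns a list of (tag, texts) pairs, where tag is ' ' (common),
    '-' (only in the first file) or '+' (only in the second file).
    """
    groups = []
    for tag, run in groupby(lines, key=lambda line: line[:1]):
        # Drop the one-character prefix from every line in the run.
        groups.append((tag, [line[1:] for line in run]))
    return groups

# Hypothetical sample, shaped like the output shown above.
sample = [
    " common line 3 ...",
    "-something unique from file.1",
    "-a second line of something unique from file.1",
    "+something unique from file.2",
    " common line 4 ...",
]

for tag, texts in group_by_origin(sample):
    print(repr(tag), texts)
```

Because the tag is always the first character of the line, this grouping does not depend on what the lines themselves contain.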

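Answer 2

It sounds like you don't really care about the output format, but just want to be able to identify which lines came from each file and which are common. Given that, how about: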

$ diff --old-line-format=$'-%l\n' --new-line-format=$'+%l\n' --unchanged-line-format=$'=%l\n' file.1 file.2
=common line 1 ...
=common line 2 ...
=common line 3 ...
-something unique from file.1
-a second line of something unique from file.1
+something unique from file.2
=common line 4 ...
=common line 5 ...
-something unique from file.1
+something unique from file.2
+a second line of something unique from file.2
=common line 6 ...
=common line 7 ...

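Be wary of any solution that requires you to test the contents of a line to find the indicator of that line's source, rather than relying on an indicator that always and only appears at one fixed position in every line. For example, if you're looking for <<<<<<< file.1 to tell you what is unique to file.1, what happens if one of the files contains a line that is exactly that string? A test for any string will fail if that string can appear in your input. With the output above, the first character always indicates the source of the line, so it cannot clash with possible file contents.

If you really want to get exactly git's merge-conflict output format (which I don't recommend), you can always pipe the above to a simple awk script that prints <<< file, or whatever you like, whenever the first character of the line changes, and then removes that character.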
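
The answer suggests awk for that last step; purely as an illustration, here is the same idea sketched in Python instead. The function `to_conflict_markers`, its parameter names, and the exact marker text are assumptions made for this example, not anything produced by diff or git. It expects lines prefixed with `=`, `-`, or `+` as in the output above, and the resulting marker lines carry the same ambiguity risk the answer warns about.

```python
def to_conflict_markers(prefixed, name1="file.1", name2="file.2"):
    """Turn lines prefixed with '=', '-' or '+' into conflict blocks."""
    out = []
    old, new = [], []  # pending lines unique to each file

    def flush():
        # Emit one conflict block for the pending lines, if any.
        if old or new:
            out.append(f"<<<<<<< {name1}")
            out.extend(old)
            out.append(f"======= {name2}")
            out.extend(new)
            out.append(">>>>>>>")
            old.clear()
            new.clear()

    for line in prefixed:
        tag, text = line[:1], line[1:]
        if tag == "-":
            if new:  # a '-' after '+' starts a new block
                flush()
            old.append(text)
        elif tag == "+":
            new.append(text)
        else:  # '=' marks a line common to both files
            flush()
            out.append(text)
    flush()
    return out

# Hypothetical sample taken from the output shown above.
sample = [
    "=common line 3 ...",
    "-something unique from file.1",
    "-a second line of something unique from file.1",
    "+something unique from file.2",
    "=common line 4 ...",
]
print("\n".join(to_conflict_markers(sample)))
```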

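Answer 3

Because of the limitations of the solutions I originally posted in my first "answer", which involve diff -D ... and diff -U ..., I decided to write a solution in Python using Python's difflib module.

I wrote it to produce output that looks similar to git's. It uses delimiters containing the strings <<<<<<<<, ========, and >>>>>>>>, which, as we know, can lead to ambiguity if the original text contains such strings. However, the same ambiguity can exist in git's "merge conflict" output, and since I am comfortable with that in git and willing to live with it, I am also comfortable with the same ambiguity in my own solution.

The output is not exactly the same as git's "merge conflict" output, but it is close enough for my purposes.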
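
If the ambiguity ever does become a concern, one possible mitigation (not part of the program below, just a sketch) is to scan the inputs first and choose a delimiter length that is longer than any run of `<`, `=`, or `>` that begins a line. The helper name `pick_marker_length` and the default minimum of 7 are assumptions made for this example.

```python
import re

def pick_marker_length(lines, minimum=7):
    """Return a marker length longer than any run of '<', '=' or '>'
    found at the start of an input line (and at least `minimum`)."""
    longest = 0
    for line in lines:
        # re.match anchors at the start of the line.
        m = re.match(r"<+|=+|>+", line)
        if m:
            longest = max(longest, len(m.group()))
    return max(minimum, longest + 1)

# Hypothetical input containing a line that looks like a delimiter.
text = ["common line 1 ...", "======== section title", "common line 2 ..."]
n = pick_marker_length(text)
print("<" * n, "=" * n, ">" * n)
```

A reader of the merged file can then treat only runs of exactly that length as delimiters.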

First, here is the Python program (I cleaned up the original Python code I posted here; this is the cleaned-up version). I call this program filemerge ...

#!/usr/bin/python3

### Take the diff's between two files and output
### the common and different lines in a manner
### which is very similar to the way that `git`
### depicts merge conflicts.

import sys
sys.dont_write_bytecode = True

import os

from difflib import unified_diff

prog       = None
diff_start = '<<<<<<<<'
diff_sep   = '========'
diff_end   = '>>>>>>>>'

def main():
    if len(sys.argv) < 3:
        print(f'\nusage: {prog} file1 file2\n')
        return 1

    file1, file2 = sys.argv[1:3]
    data1        = None
    data2        = None
    missing      = []

    try:
        with open(file1, 'r') as f:
            data1 = f.readlines()
    except Exception:
        missing.append(file1)

    try:
        with open(file2, 'r') as f:
            data2 = f.readlines()
    except Exception:
        missing.append(file2)
        
    if missing:
        print(f'\nnot found: {", ".join(missing)}\n')
        return 1

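    # unified_diff() yields nothing for identical inputs, so
    # emit the common contents directly in that case.
    if data1 == data2:
        sys.stdout.writelines(data1)
        return 0

    n1 = len(data1)
    n2 = len(data2)
    max_lines = max(n1, n2) + 1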
    count = 0
    state = ''
    sep_printed = False
    next_file = ''

    for line in unified_diff(data1, data2, n=max_lines):
        count += 1
        if count < 4:
            continue

        # Every line yielded by unified_diff() after the header
        # starts with ' ', '+', or '-'. Strip the trailing newline
        # only if it is present, because the last line of a file
        # may not end with one.
        if line.endswith('\n'):
            line = line[:-1]
        ch0  = line[0]

        if ch0 == ' ':
            if state:
                state = ''
                if not sep_printed:
                    print(f'{diff_sep}{next_file}')
                print(diff_end)
            sep_printed = False
            next_file = ''
        elif ch0 == '-':
            if state == ch0:
                pass
            elif state == '+':
                print(f'{diff_sep} file={file1}')
                sep_printed = True
                next_file = ''
            else:
                print(f'{diff_start} file={file1}')
                sep_printed = False
                next_file = f' file={file2}'
            state = ch0
        elif ch0 == '+':
            if state == ch0:
                pass
            elif state == '-':
                print(f'{diff_sep} file={file2}')
                sep_printed = True
                next_file = ''
            else:
                print(f'{diff_start} file={file2}')
                sep_printed = False
                next_file = f' file={file1}'
            state = ch0
        print(line[1:])

    if state:
        if not sep_printed:
            print(f'{diff_sep}{next_file}')
            next_file = ''
        print(diff_end)

    return 0

if __name__ == '__main__':
    prog = os.path.basename(sys.argv[0])
    sys.exit(main())

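Here are the input files I tested it with. They are similar to, but not exactly the same as, the input files I originally posted in my question ...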

========file.1

common line 1 ...
common line 2 ...
common line 3 ...
something unique from file.1
a second line of something unique from file.1
common line 4 ...
common line 5 ...
something unique from file.1
common line 6 ...
common line 7 ...
penultimate file.1 line
common line 8 ...

========file.2

common line 1 ...
second line from file.2
common line 2 ...
common line 3 ...
something unique from file.2
common line 4 ...
common line 5 ...
something unique from file.2
a second line of something unique from file.2
common line 6 ...
common line 7 ...
common line 8 ...

I ran the command like this ...

filemerge file.1 file.2 >file.merged

These are the resulting contents of file.merged ...

common line 1 ...
<<<<<<<< file=file.2
second line from file.2
======== file=file.1
>>>>>>>>
common line 2 ...
common line 3 ...
<<<<<<<< file=file.1
something unique from file.1
a second line of something unique from file.1
======== file=file.2
something unique from file.2
>>>>>>>>
common line 4 ...
common line 5 ...
<<<<<<<< file=file.1
something unique from file.1
======== file=file.2
something unique from file.2
a second line of something unique from file.2
>>>>>>>>
common line 6 ...
common line 7 ...
<<<<<<<< file=file.1
penultimate file.1 line
======== file=file.2
>>>>>>>>
common line 8 ...

As I mentioned, this is not exactly the same format as git's "merge conflict" output, but it is very similar and close enough for me.
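
For comparison, here is a shorter sketch of a similar idea that works from `difflib.SequenceMatcher.get_opcodes()` instead of parsing unified-diff text. This is not the filemerge program above: the function name `merge_with_markers` and its parameters are hypothetical, and unlike filemerge it always lists the first file's lines before the second file's lines, even for pure insertions.

```python
from difflib import SequenceMatcher

def merge_with_markers(a, b, name_a="file.1", name_b="file.2"):
    """Merge two lists of lines, wrapping each difference in markers."""
    out = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
        else:  # 'replace', 'delete' or 'insert'
            out.append(f"<<<<<<<< file={name_a}")
            out.extend(a[i1:i2])
            out.append(f"======== file={name_b}")
            out.extend(b[j1:j2])
            out.append(">>>>>>>>")
    return out

# Hypothetical inputs, already split into lines without newlines.
a = ["common 1", "only a", "common 2"]
b = ["common 1", "only b", "extra b", "common 2"]
print("\n".join(merge_with_markers(a, b)))
```

To use it on real files, the lines could be read with something like `open(path).read().splitlines()`. Working from opcodes avoids having to skip header lines or strip prefix characters.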
