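I want to merge two text files that are not under git, using semantics similar to the way git depicts "merge conflicts".
For example, suppose I have two text files with similar but not identical content, named file.1 and file.2. I want to merge these two files into a third file, like this: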
hypothetical-merge-utility file.1 file.2 file.merged
I want it to produce file.merged, which lists the file contents and each difference in a manner similar to the following:
common line 1 ...
common line 2 ...
common line 3 ...
<<<<<<< file.1
something unique from file.1
a second line of something unique from file.1
======= file.2
something unique from file.2
>>>>>>> end of diff
common line 4 ...
common line 5 ...
<<<<<<< file.1
something unique from file.1
======= file.2
something unique from file.2
a second line of something unique from file.2
>>>>>>> end of diff
common line 6 ...
common line 7 ...
... etc. ...
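In other words, I want each difference between file.1 and file.2 to look similar to the way git depicts a "merge conflict".
I don't care if delimiters other than <<<<<<<<, ========, and >>>>>>>> are used.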
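I know that many utilities for merging text files are available under Linux. However, I'm only looking for something that presents the merged data specifically in a manner similar to the way git depicts "merge conflicts".
Does anyone know of such a utility?
Thank you in advance.
UPDATE: In response to the question below from Ed Morton, here are the contents of the two test files ...
==== file.1 ====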
common line 1 ...
common line 2 ...
common line 3 ...
something unique from file.1
a second line of something unique from file.1
common line 4 ...
common line 5 ...
something unique from file.1
common line 6 ...
common line 7 ...
==== file.2 ====
common line 1 ...
common line 2 ...
common line 3 ...
something unique from file.2
common line 4 ...
common line 5 ...
something unique from file.2
a second line of something unique from file.2
common line 6 ...
common line 7 ...
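Answer 1
NOTE: Although I think this is a somewhat reasonable "answer", I have since posted another "answer" that I think is better. So please see my other "answer" below.
The original version of this "answer" ...
Oh! I posted here too soon. I didn't know about diff's -D command-line option, and now I realize that I can do this ...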
diff -D file.1 file.1 file.2 >file.merged
This produces the following in file.merged ...
common line 1 ...
common line 2 ...
common line 3 ...
#ifdef file.1
something unique from file.1
a second line of something unique from file.1
#else /* file.1 */
something unique from file.2
#endif /* file.1 */
common line 4 ...
common line 5 ...
#ifdef file.1
something unique from file.1
#else /* file.1 */
something unique from file.2
a second line of something unique from file.2
#endif /* file.1 */
common line 6 ...
common line 7 ...
... etc. ...
I can treat the #ifdef, #else, and #endif lines the same way I would treat git's <<<<<<<<, ========, and >>>>>>>> lines.
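UPDATE: ... I just found this: https://stackoverflow.com/questions/16902001/manually-merge-two-files-using-diff
It shows how I can also do something similar with the unified diff format: give diff a -U option with a huge argument, one that is greater than the maximum number of lines in file.1 and file.2. For example ...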
diff -U 99999999 file.1 file.2 | tail -n +4 >file.merged
It then produces something like this:
common line 1 ...
common line 2 ...
common line 3 ...
+something unique from file.2
-something unique from file.1
-a second line of something unique from file.1
common line 4 ...
common line 5 ...
+something unique from file.2
+a second line of something unique from file.2
-something unique from file.1
common line 6 ...
common line 7 ...
... etc. ...
The + lines represent data unique to file.2, and the - lines represent data unique to file.1.
I can work with these + and - lines.
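Answer 2
It sounds like you don't really care about the output format and just want a way to identify which lines came from each file and which are common. Given that, how about: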
$ diff --old-line-format=$'-%l\n' --new-line-format=$'+%l\n' --unchanged-line-format=$'=%l\n' file.1 file.2
=common line 1 ...
=common line 2 ...
=common line 3 ...
-something unique from file.1
-a second line of something unique from file.1
+something unique from file.2
=common line 4 ...
=common line 5 ...
-something unique from file.1
+something unique from file.2
+a second line of something unique from file.2
=common line 6 ...
=common line 7 ...
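Be wary of any solution that requires testing the content of a line to find the indicator of that line's source (e.g. if you're looking for <<<<<<< file.1 to tell you what is unique to file.1, what if a file contains a line that is exactly that string?) rather than an indicator that always and only appears in one fixed position on every line. Any test for a string will fail if that string can appear in your input. With the output above, the first character always indicates the source of the line, so it cannot clash with possible file contents.
If you really do want exactly the git merge conflict output format (which I wouldn't recommend), you could always pipe the above to a simple awk script that prints <<< file, or whatever you like, whenever the first character of the line changes, and then removes that character.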
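The post-processing step described above can be sketched as follows. This is only an illustration, written in Python rather than awk; the function name tagged_to_conflict and the marker text are assumptions made for this sketch, not part of the original answer. It expects lines tagged with a leading -, +, or = as produced by the diff command above, and it groups each run of - and + lines into one block.

```python
def tagged_to_conflict(lines, name1="file.1", name2="file.2"):
    """Turn lines tagged with a leading '-', '+', or '=' into git-like conflict blocks.

    Hypothetical helper: the name and the marker strings are illustrative only.
    """
    out, old, new = [], [], []

    def flush():
        # Emit the pending block, if any, then reset the buffers.
        if old or new:
            out.append(f"<<<<<<< {name1}")
            out.extend(old)
            out.append(f"======= {name2}")
            out.extend(new)
            out.append(">>>>>>> end of diff")
            old.clear()
            new.clear()

    for raw in lines:
        line = raw.rstrip("\n")
        tag, text = line[:1], line[1:]   # the first character is always the source tag
        if tag == "-":
            old.append(text)             # line only in the first file
        elif tag == "+":
            new.append(text)             # line only in the second file
        else:
            flush()                      # a common line closes any open block
            out.append(text)
    flush()
    return out


if __name__ == "__main__":
    sample = ["=common line 1 ...", "-from file.1", "+from file.2", "=common line 2 ..."]
    print("\n".join(tagged_to_conflict(sample)))
```

Because the decision is made from the tag character alone, a file line that happens to look like a marker is still handled correctly at this stage; the ambiguity only appears once the markers are written into the merged output.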
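Answer 3
Because of the limitations of the solutions I originally posted in my first "answer", which involve diff -D ... and diff -U ..., I decided to write a solution in Python using Python's difflib module.
I wrote it to produce output that looks similar to git's "merge conflict" output. It uses delimiters containing the strings <<<<<<<<, ========, and >>>>>>>>, which, as we know, can cause ambiguity if the original text contains such strings. However, the same ambiguity can exist in git's "merge conflict" output, and since I'm comfortable with that in git and willing to accept it, I'm also comfortable with these ambiguities in my own solution.
The output is not exactly the same as git's "merge conflict" output, but it is close enough for my purposes.
First, here is the Python program (I cleaned up the original Python code that I posted here; this is the cleaned-up version). I call this program filemerge ...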
#!/usr/bin/python3

### Take the diff's between two files and output
### the common and different lines in a manner
### which is very similar to the way that `git`
### depicts merge conflicts.

import sys
sys.dont_write_bytecode = True

import os
from difflib import unified_diff

prog = None

diff_start = '<<<<<<<<'
diff_sep   = '========'
diff_end   = '>>>>>>>>'

def main():

    if len(sys.argv) < 3:
        print(f'\nusage: {prog} file1 file2\n')
        return 1

    file1, file2 = sys.argv[1:3]

    data1 = None
    data2 = None
    missing = []

    try:
        with open(file1, 'r') as f:
            data1 = f.readlines()
    except Exception:
        missing.append(file1)

    try:
        with open(file2, 'r') as f:
            data2 = f.readlines()
    except Exception:
        missing.append(file2)

    if missing:
        print(f'\nnot found: {", ".join(missing)}\n')
        return 1

    # Ask for more context lines than either file contains,
    # so that the diff covers both files in their entirety.
    n1 = len(data1)
    n2 = len(data2)
    max_lines = (n1 + 1) if n1 > n2 else (n2 + 1)

    count = 0
    state = ''            # '', '-', or '+': prefix of the previous diff line
    sep_printed = False   # has the separator of the current block been printed?
    next_file = ''        # label for the separator when one side is empty

    for line in unified_diff(data1, data2, n=max_lines):

        # Skip the three header lines: '---', '+++', and '@@ ... @@'.
        count += 1
        if count < 4:
            continue

        # Every remaining line which is returned by unified_diff()
        # starts with either ' ', '+', or '-'. It ends with a
        # newline unless it is a final line that had none in
        # its input file, so strip only a trailing newline.
        line = line.rstrip('\n')
        ch0 = line[0]

        if ch0 == ' ':
            # A common line closes any open difference block.
            if state:
                state = ''
                if not sep_printed:
                    print(f'{diff_sep}{next_file}')
                print(diff_end)
            sep_printed = False
            next_file = ''
        elif ch0 == '-':
            if state == ch0:
                pass
            elif state == '+':
                print(f'{diff_sep} file={file1}')
                sep_printed = True
                next_file = ''
            else:
                print(f'{diff_start} file={file1}')
                sep_printed = False
                next_file = f' file={file2}'
            state = ch0
        elif ch0 == '+':
            if state == ch0:
                pass
            elif state == '-':
                print(f'{diff_sep} file={file2}')
                sep_printed = True
                next_file = ''
            else:
                print(f'{diff_start} file={file2}')
                sep_printed = False
                next_file = f' file={file1}'
            state = ch0

        print(line[1:])

    # Close a difference block that runs to the end of the files.
    if state:
        if not sep_printed:
            print(f'{diff_sep}{next_file}')
            next_file = ''
        print(diff_end)

    return 0

if __name__ == '__main__':
    prog = os.path.basename(sys.argv[0])
    sys.exit(main())
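These are the input files I tested it with. They are similar to, but not exactly the same as, the input files I originally posted in the question ...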
========file.1
common line 1 ...
common line 2 ...
common line 3 ...
something unique from file.1
a second line of something unique from file.1
common line 4 ...
common line 5 ...
something unique from file.1
common line 6 ...
common line 7 ...
penultimate file.1 line
common line 8 ...
========file.2
common line 1 ...
second line from file.2
common line 2 ...
common line 3 ...
something unique from file.2
common line 4 ...
common line 5 ...
something unique from file.2
a second line of something unique from file.2
common line 6 ...
common line 7 ...
common line 8 ...
I ran the command like this ...
filemerge file.1 file.2 >file.merged
These are the resulting contents of file.merged ...
common line 1 ...
<<<<<<<< file=file.2
second line from file.2
======== file=file.1
>>>>>>>>
common line 2 ...
common line 3 ...
<<<<<<<< file=file.1
something unique from file.1
a second line of something unique from file.1
======== file=file.2
something unique from file.2
>>>>>>>>
common line 4 ...
common line 5 ...
<<<<<<<< file=file.1
something unique from file.1
======== file=file.2
something unique from file.2
a second line of something unique from file.2
>>>>>>>>
common line 6 ...
common line 7 ...
<<<<<<<< file=file.1
penultimate file.1 line
======== file=file.2
>>>>>>>>
common line 8 ...
As I mentioned, this is not exactly the same format as git's "merge conflict" output, but it is very similar and close enough for me.
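For comparison, here is a minimal sketch of the same idea built on difflib.SequenceMatcher.get_opcodes() instead of parsing unified_diff() output. It is a swapped-in technique, not the program above: the function name merge_with_markers is hypothetical, and unlike filemerge it always lists the file.1 side first, even when a block has lines from file.2 only.

```python
from difflib import SequenceMatcher


def merge_with_markers(lines1, lines2, name1="file.1", name2="file.2"):
    """Merge two lists of lines, wrapping each difference in git-like markers.

    Hypothetical helper for illustration; the marker text mimics filemerge.
    """
    out = []
    matcher = SequenceMatcher(None, lines1, lines2, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(lines1[i1:i2])          # lines common to both inputs
        else:
            # 'replace', 'delete', or 'insert': show both sides,
            # one of which may be empty.
            out.append(f"<<<<<<<< file={name1}")
            out.extend(lines1[i1:i2])
            out.append(f"======== file={name2}")
            out.extend(lines2[j1:j2])
            out.append(">>>>>>>>")
    return out


if __name__ == "__main__":
    a = ["common line 1 ...", "something unique from file.1", "common line 2 ..."]
    b = ["common line 1 ...", "something unique from file.2", "common line 2 ..."]
    print("\n".join(merge_with_markers(a, b)))
```

Working from opcodes avoids skipping header lines and inspecting prefix characters, at the cost of holding both files in memory as lists of lines without their trailing newlines.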