Merging the non-zero differing bytes of large files

Answer 1

I wrote a Python script for you.

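#!/usr/bin/env python3
'''
Given two input files and one output file, merge the input files on
matching bytes or bytes that are null in one file but not the other.
Non-matching non-null bytes will raise a ValueError.
A shorter input is treated as if it were padded with null bytes.
'''

import sys
from itertools import zip_longest

def get_bytes(file):
    '''Return an iterator that yields each byte in the given file.'''
    def get_byte():
        return file.read(1)
    # iter(callable, sentinel) calls get_byte() until it returns b'' (EOF).
    return iter(get_byte, b'')

def main(args):
    if len(args) != 3:
        sys.exit('usage: {} INPUT1 INPUT2 OUTPUT'.format(sys.argv[0]))
    # 'with' makes sure all three files are closed, even if ValueError is raised.
    with open(args[0], 'rb') as file1, open(args[1], 'rb') as file2, \
            open(args[2], 'wb') as file_out:
        # zip() would silently stop at the end of the shorter file; pad it with
        # null bytes instead so the longer file's tail is copied to the output.
        pairs = zip_longest(get_bytes(file1), get_bytes(file2), fillvalue=b'\x00')
        for i, (byte1, byte2) in enumerate(pairs):
            if byte1 == byte2:
                byte_out = byte1
            elif byte1 == b'\x00':
                byte_out = byte2
            elif byte2 == b'\x00':
                byte_out = byte1
            else:
                msg = 'Bytes at {:#x} are both non-zero and do not match: {}, {}'
                raise ValueError(msg.format(i, byte1, byte2))
            file_out.write(byte_out)

if __name__ == '__main__':
    main(sys.argv[1:])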

Make it executable (for example with chmod +x test.py), then run it like this:

$ ./test.py file1.iso file2.iso file3.iso

Or, more briefly, using the shell's brace expansion (Bash, zsh):

$ ./test.py file{1,2,3}.iso
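Reading with read(1) works, but it costs several Python-level calls per byte, which adds up on multi-gigabyte ISO images. Below is a minimal sketch of a block-wise variant of the same merge rule. The names merge_chunks, merge_streams and main and the 1 MiB chunk size are my own illustrative choices, not part of the script above, and treating a shorter input as padded with null bytes is an assumption (plain zip() would stop at the shorter file). Identical chunks are copied after a single comparison; only chunks that differ fall back to the per-byte rule.

```python
import sys
from functools import partial
from itertools import zip_longest

CHUNK_SIZE = 1 << 20  # 1 MiB: an arbitrary assumption, tune as needed


def merge_chunks(c1, c2, offset):
    '''Merge two equal-length byte strings; offset is only used in error messages.'''
    if c1 == c2:  # fast path: identical blocks need no per-byte work
        return c1
    out = bytearray(c1)
    for j, (b1, b2) in enumerate(zip(c1, c2)):  # iterating bytes yields ints
        if b1 == b2 or b2 == 0:
            continue  # keep b1, which is already in out
        if b1 == 0:
            out[j] = b2
        else:
            msg = 'Bytes at {:#x} are both non-zero and do not match: {}, {}'
            raise ValueError(msg.format(offset + j, b1, b2))
    return bytes(out)


def merge_streams(f1, f2, out, chunk_size=CHUNK_SIZE):
    '''Merge two binary streams into out, chunk_size bytes at a time.'''
    # Buffered binary reads return exactly chunk_size bytes until EOF,
    # so chunks from both inputs stay aligned at the same offsets.
    chunks1 = iter(partial(f1.read, chunk_size), b'')
    chunks2 = iter(partial(f2.read, chunk_size), b'')
    offset = 0
    for c1, c2 in zip_longest(chunks1, chunks2, fillvalue=b''):
        # Assumption: data missing from the shorter input counts as null bytes.
        size = max(len(c1), len(c2))
        c1, c2 = c1.ljust(size, b'\x00'), c2.ljust(size, b'\x00')
        out.write(merge_chunks(c1, c2, offset))
        offset += size


def main(args):
    '''Hypothetical command-line entry point: INPUT1 INPUT2 OUTPUT.'''
    if len(args) != 3:
        sys.exit('usage: merge INPUT1 INPUT2 OUTPUT')
    with open(args[0], 'rb') as f1, open(args[1], 'rb') as f2, \
            open(args[2], 'wb') as out:
        merge_streams(f1, f2, out)
```

To run it as a script, add an `if __name__ == '__main__': main(sys.argv[1:])` guard at the bottom. Note that the error message reports the mismatching bytes as integers rather than as b'...' literals.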

P.S. I've been exploring different ways of reading files lately, so this was a nice bit of serendipity.
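The get_bytes helper relies on the two-argument form of iter(): iter(callable, sentinel) calls callable() repeatedly and stops as soon as it returns the sentinel, here b'' at end of file. A minimal sketch, using io.BytesIO as a stand-in for a real file, compares it with an equivalent explicit generator loop; the function names are illustrative only. It also shows why the script works with one-byte bytes objects from read(1): iterating over a bytes object directly yields ints instead.

```python
import io
from functools import partial


def bytes_via_iter(stream):
    '''Two-argument iter(): call stream.read(1) until it returns b'' (EOF).'''
    return iter(partial(stream.read, 1), b'')


def bytes_via_loop(stream):
    '''Equivalent generator written as an explicit loop.'''
    while True:
        byte = stream.read(1)
        if byte == b'':  # read() returns an empty bytes object at end of file
            return
        yield byte


data = b'\x00AB'
print(list(bytes_via_iter(io.BytesIO(data))))  # [b'\x00', b'A', b'B']
print(list(bytes_via_loop(io.BytesIO(data))))  # [b'\x00', b'A', b'B']
print(list(data))  # [0, 65, 66]: iterating bytes directly yields ints
```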