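I want to find duplicate copies of a configuration section in a partition dump (a binary file) using a pattern and a "magic" header. The configuration section always starts with 202 '0xff' bytes, followed by the 4 bytes '\x00\x00\x23\x27'. The script should identify the different copies of the configuration within the partition and print the address (in bytes) at which each occurrence of the pattern starts. I adapted an existing Python script to my pattern, but it does not work; it just throws an error because bytes are mixed with strings. How can I fix this script?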
#!/usr/bin/env python3
import re
import mmap
import sys
magic = '\xff' * 202
pattern = magic + '\x00\x00\x23\x27'
fh = open(sys.argv[1], "r+b")
mf = mmap.mmap(fh.fileno(), 0)
mf.seek(0)
fh.seek(0)
for occurence in re.finditer(pattern, mf):
    print(occurence.start())
mf.close()
fh.close()
Error:
$ ./matcher.py dump.bin
Traceback (most recent call last):
  File "/home/eviecomp/BC2UTILS/dump_previous_profile/./matcher.py", line 13, in <module>
    for occurence in re.finditer(pattern, mf):
  File "/usr/lib/python3.9/re.py", line 248, in finditer
    return _compile(pattern, flags).finditer(string)
TypeError: cannot use a string pattern on a bytes-like object
Pattern and magic:
Answer 1
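While re can handle byte strings (you just need to heed the text of the error message and search with a bytes object rather than a str), it seems like overkill here. A plain substring search with the find method of the mmap object is enough: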
#!/usr/bin/env python3
import mmap
from sys import argv

# NOTE: important to use `b''` literals!
magic = b'\xff' * 202
pattern = magic + b'\x00\x00\x23\x27'

with open(argv[1], "r+b") as fh:
    with mmap.mmap(fh.fileno(), 0) as mm:
        pos = -1
        # find() returns -1 when there are no further matches
        while -1 != (pos := mm.find(pattern, pos + 1)):
            print(pos)
Alternatively, for the sake of modern Python aesthetics, you can also use an "iterator" over the matches:
from mmap import mmap
from typing import Generator
from sys import argv


def positions(mm: mmap, pattern: bytes) -> Generator[int, None, None]:
    pos = -1
    # Resume the search one byte past the previous hit until find() returns -1
    while -1 != (pos := mm.find(pattern, pos + 1)):
        yield pos


pattern = b'\xff' * 202 + b'\x00\x00\x23\x27'

with open(argv[1], "r+b") as lfile:
    with mmap(lfile.fileno(), 0) as mapping:
        all_positions = ", ".join(f"{pos:#0x}" for pos in positions(mapping, pattern))
        print(all_positions)
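For completeness, the re-based route mentioned at the start of this answer can be sketched as follows. This is only a minimal illustration, not a replacement for the find-based versions: the names PATTERN, find_with_re and scan_file are made up for this example, and the file is opened read-only, which is an assumption on my part (the scripts above use "r+b"). The only essential change compared with the question's script is that the pattern is a bytes object.

```python
import mmap
import re

# Same marker as in the question: 202 bytes of 0xff, then 00 00 23 27.
# The b'' prefix makes this a bytes pattern, which is what re requires
# when the subject is a bytes-like object such as an mmap.
PATTERN = b'\xff' * 202 + b'\x00\x00\x23\x27'


def find_with_re(buffer, pattern=PATTERN):
    """Return the start offset of every match of `pattern` in `buffer`.

    `buffer` may be bytes, bytearray or an mmap object.
    """
    # re.escape makes sure no byte in the marker is read as regex syntax.
    regex = re.compile(re.escape(pattern))
    return [match.start() for match in regex.finditer(buffer)]


def scan_file(path):
    """Hypothetical helper: map `path` read-only and return all offsets."""
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return find_with_re(mm)
```

Calling scan_file("dump.bin") would then return the list of byte offsets, which can be printed with f"{offset:#x}" as in the iterator version. re.finditer reports non-overlapping matches only, but since this marker begins with 0xff and ends with 0x27, two matches can never overlap, so the result should agree with the find-based loop.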