How do I grab a random section from the middle of a large file?

Answer 1

$ dd if=big_file.bin skip=1750 ibs=1MB count=10 of=big_file.bin.part

It is worth taking some time to read up on dd and understand it.
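
To make the arithmetic behind that command explicit, here is a minimal Python sketch of the same idea. With GNU dd, 1MB means 1,000,000 bytes, so skip=1750 starts reading at byte offset 1,750,000,000 and count=10 copies 10,000,000 bytes. The helper name copy_blocks and its parameters are hypothetical, chosen only for illustration.

```python
def copy_blocks(src, dst, skip, count, block_size=1_000_000):
    """Copy `count` blocks of `block_size` bytes from src, after skipping `skip` blocks.

    Hypothetical helper that mimics `dd skip=... count=... ibs=...`.
    Returns the number of bytes actually written.
    """
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fin.seek(skip * block_size)      # jump straight to the start offset
        for _ in range(count):
            block = fin.read(block_size)
            if not block:                # reached end of file early
                break
            fout.write(block)
            written += len(block)
    return written
```

Under these assumptions, copy_blocks("big_file.bin", "big_file.bin.part", skip=1750, count=10) should produce the same part file as the dd command.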

Answer 2

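You can use tail, but give it a byte offset.

tail -c +"$START_BYTE" "$file" | head -c "$LENGTH" > newfile

This way tail can jump straight to the starting point (without counting newlines), and head stops as soon as it has output the requested length.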
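
The same pipeline can be sketched in Python. Note that tail -c +N counts from 1, so a start byte of 1 means the first byte of the file; the sketch converts that to a zero-based offset before seeking. The name extract_bytes and the chunk size are assumptions made for illustration.

```python
def extract_bytes(path, out_path, start_byte, length, chunk_size=64 * 1024):
    """Write `length` bytes of `path`, starting at 1-based `start_byte`, to `out_path`.

    Hypothetical equivalent of: tail -c +START_BYTE file | head -c LENGTH > newfile
    Returns the number of bytes written (less than `length` if the file ends first).
    """
    remaining = length
    with open(path, "rb") as fin, open(out_path, "wb") as fout:
        fin.seek(max(start_byte, 1) - 1)          # tail -c +N is 1-based
        while remaining > 0:
            chunk = fin.read(min(chunk_size, remaining))
            if not chunk:                          # end of file
                break
            fout.write(chunk)
            remaining -= len(chunk)
    return length - remaining
```

To pick the section at random, one option is to choose start_byte with random.randint(1, os.path.getsize(path) - length + 1), which assumes the file is at least length bytes long.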

Answer 3

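You can simply write a small program that seeks to a random position and reads a certain number of lines.

An example in Python (it reads a single line, but you can modify it):

import os
import random

def get_random_line(path="/some/file.txt"):
    """Return a randomly selected line from a file."""
    size = os.path.getsize(path)  # file objects have no .size attribute
    if size == 0:
        return ""
    # Binary mode: text-mode files in Python 3 do not allow nonzero relative seeks.
    with open(path, "rb") as fo:
        pos = random.randrange(size)
        # Step back until the previous byte is a newline or we reach the start.
        while pos > 0:
            fo.seek(pos - 1)
            if fo.read(1) == b"\n":
                break
            pos -= 1
        fo.seek(pos)
        line = fo.readline().strip()
    return line.decode()  # assumes the file is UTF-8 text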
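
As the answer suggests, the example can be adapted to return more than one line. The following is a minimal sketch, assuming a hypothetical get_random_lines(path, count) helper: it seeks to a random byte, discards the line the seek landed in, then reads the next count lines. Near the end of the file it may return fewer than count lines, and because the starting point is a random byte rather than a random line, the selection is not uniform over lines.

```python
import os
import random


def get_random_lines(path, count, rng=random):
    """Return up to `count` consecutive lines starting near a random byte offset."""
    size = os.path.getsize(path)
    if size == 0:
        return []
    with open(path, "rb") as fo:          # binary mode allows arbitrary seeks
        point = rng.randrange(size)
        fo.seek(point)
        if point > 0:
            fo.readline()                  # skip the (possibly partial) line we landed in
        lines = []
        for _ in range(count):
            raw = fo.readline()
            if not raw:                    # end of file
                break
            lines.append(raw.decode("utf-8", errors="replace").rstrip("\r\n"))
    return lines
```

The rng parameter is only there so that a seeded random.Random instance can be passed in for repeatable results.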
