Skip the first 3 bytes of a file

Answer 1

Old school: you can use dd:

dd if=A_FILE bs=1 skip=3

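The input file is A_FILE, the block size is 1 character (byte), and the first 3 "blocks" (bytes) are skipped. (With some variants of dd, such as GNU dd, you could use bs=1c here, and alternatives such as bs=1k to read in blocks of 1 kilobyte in other circumstances. The dd on AIX does not seem to support this; the BSD (macOS Sierra) variant does not support c but does support k, m, g, etc.)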
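
To make the block arithmetic concrete, here is a minimal Python sketch of what bs and skip amount to: the copy starts after skip × bs bytes of input. The name dd_skip is hypothetical, and the sketch models only the offset calculation, not how dd actually performs its I/O.

```python
import io

def dd_skip(stream, bs=1, skip=0):
    """Return everything after the first skip * bs bytes of a binary stream.

    Hypothetical helper: models only the offset arithmetic of
    `dd bs=... skip=...`, not dd's block-by-block copying.
    """
    stream.seek(bs * skip)   # skip is counted in blocks of bs bytes
    return stream.read()

print(dd_skip(io.BytesIO(b"abcdefgh"), bs=1, skip=3))   # b'defgh'
print(dd_skip(io.BytesIO(b"abcdefgh"), bs=3, skip=1))   # b'defgh'
```

Both calls skip the same 3 bytes; only the way the offset is expressed differs.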

There are other ways to achieve the same result:

sed '1s/^...//' A_FILE

This works if the first line has 3 or more characters.
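
One caveat worth illustrating: sed's . matches a character, and in a multibyte locale such as UTF-8 a character can span more than one byte, so removing three characters is not always the same as removing three bytes. The following Python sketch shows the difference; the sample string is made up for illustration.

```python
text = "h\u00e9llo\n"            # the second character, e-acute, takes 2 bytes in UTF-8
raw = text.encode("utf-8")

# Roughly what sed '1s/^...//' does in a UTF-8 locale: drop 3 characters
minus_3_chars = text[3:].encode("utf-8")
# What tail -c +4 or dd bs=1 skip=3 does: drop 3 bytes
minus_3_bytes = raw[3:]

print(minus_3_chars)   # b'lo\n'
print(minus_3_bytes)   # b'llo\n'
```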

tail -c +4 A_FILE

You can use Perl, Python and so on too.

Answer 2

Instead of using cat, you can use tail like this:

tail -c +4 FILE

This prints the whole file except for the first 3 bytes. Consult man tail for more information.
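
The +4 can look off by one at first glance: with a leading +, the number names the byte at which output starts, counting from 1, so +4 skips 4 − 1 = 3 bytes. A minimal Python sketch of that rule follows; tail_from_byte is a hypothetical name, not part of any library.

```python
def tail_from_byte(data: bytes, k: int) -> bytes:
    """Mimic `tail -c +K`: output starts at byte number K, counting from 1."""
    return data[max(k - 1, 0):]

print(tail_from_byte(b"abcdefg", 4))   # b'defg'
print(tail_from_byte(b"abcdefg", 1))   # b'abcdefg'
```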

Answer 3

If Python is available on the system, you can use a small Python script that takes advantage of the seek() function to start reading at the nth byte, like so:

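#!/usr/bin/env python3
import shutil
import sys

# Usage: skip_bytes.py FILE OFFSET
with open(sys.argv[1], 'rb') as fd:
    # Move the read position past the first OFFSET bytes
    fd.seek(int(sys.argv[2]))
    # Copy the rest to stdout unchanged (no decoding, no whitespace stripping)
    shutil.copyfileobj(fd, sys.stdout.buffer)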

Usage is as follows:

$ ./skip_bytes.py input.txt 3

Note that byte counting starts at 0 (so the first byte is actually index 0); by specifying 3 we effectively position the read to start at the 3+1=4th byte.
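
The zero-based offset can be checked without a real file. This is a small sketch that uses an in-memory io.BytesIO object as a stand-in for a file opened in 'rb' mode; the sample bytes are made up.

```python
import io

fd = io.BytesIO(b"abcdefg")   # stands in for a file opened with 'rb'
fd.seek(3)                    # offsets are zero-based: 0='a', 1='b', 2='c', 3='d'
position = fd.tell()
rest = fd.read()

print(position)               # 3
print(rest)                   # b'defg'
```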

Answer 4

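I recently needed to do something similar. I was helping with a field support issue and needed to let a technician see real-time plots as they made changes. The data is in a binary log that grows throughout the day. I have software that can parse and plot the data from the logs, but it is currently not real time. What I did was capture the size of the log before I started processing the data, then go into a loop that processes the data; each pass creates a new file containing the bytes of the log that have not yet been processed.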

#!/usr/bin/env bash

# I named this little script hackjob.sh
# The purpose of this is to process an input file and load the results into
# a database. The file is constantly being updated, so this runs in a loop
# and every pass it creates a new temp file with bytes that have not yet been
# processed.  It runs about 15 seconds behind real time so it's
# pseudo real time.  This will eventually be replaced by a real time
# queue based version, but this does work and surprisingly well actually.

set -x

# Current date in YYYYMMDD format
DATE=`date +%Y%m%d`

INPUT_PATH=/path/to/my/data
IFILE1=${INPUT_PATH}/${DATE}_my_input_file.dat

OUTPUT_PATH=/tmp
OFILE1=${OUTPUT_PATH}/${DATE}_my_input_file.dat

# Capture the size of the original file
SIZE1=`ls -l ${IFILE1} | awk '{print $5}'`

# Copy the original file to /tmp
cp ${IFILE1} ${OFILE1}

while :
do
    sleep 5

    # process_my_data.py ${OFILE1}
    rm ${OFILE1}
    # Copy IFILE1 to OFILE1, skipping the data already processed
    dd skip=${SIZE1} bs=1 if=${IFILE1} of=${OFILE1}
    # Update the size of the input file
    SIZE1=`ls -l ${IFILE1} | awk '{print $5}'`

    echo

    DATE=`date +%Y%m%d`

done
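
The same idea, remembering how many bytes have been handled and reading only what lies beyond that point, can be sketched in Python. This is not the script above translated line for line: read_new_bytes is a hypothetical helper, the temporary file merely stands in for the growing log, and the sketch assumes the log is only ever appended to.

```python
import os
import tempfile

def read_new_bytes(path, offset):
    """Return (new_data, new_offset) for bytes appended to path since offset."""
    with open(path, "rb") as fd:
        fd.seek(offset)
        data = fd.read()
    # Advance by what was actually read, so later appends are picked up next time
    return data, offset + len(data)

# Small demonstration with a temporary file standing in for the growing log
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"first")
    path = tmp.name

chunk, offset = read_new_bytes(path, 0)
print(chunk, offset)            # b'first' 5

with open(path, "ab") as fd:    # simulate the logger appending more data
    fd.write(b"second")

chunk, offset = read_new_bytes(path, offset)
print(chunk, offset)            # b'second' 11

os.remove(path)
```

Deriving the new offset from the number of bytes actually read, rather than from a separate size check, avoids missing data that arrives between the copy and the size update.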
