How to read null-terminated strings from a binary file

Answer 1

Solution 1: Direct variable assignment

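If the NUL bytes are your only concern, you should be able to read the file's data straight into a variable with whatever standard method you prefer. Most POSIX-style shells cannot store NUL bytes in a variable, so they are simply dropped and the rest of the data is kept. Here is an example using `cat` and command substitution: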

$ data="$(cat eeprom)"
$ echo "${data}"
MAC_ADDRESS=12:34:56:78:90,PCB_MAIN_ID=m/SF-1V/MAIN/0.0,PCB_PIGGY1_ID=n/SF-1V/PS/0.0,CSL_HW_VARIANT=D

This worked for me in a BusyBox Docker container.
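A caveat on this approach: because the shell drops the NULs during command substitution, bash 4.4 and later also print a warning such as "ignored null byte in input", and any trailing newlines are stripped as well. If you would rather remove the NULs explicitly before the data reaches the shell, a minimal sketch is to delete them with `tr -d`. The function name `read_without_nuls` is hypothetical:

```shell
#!/bin/sh
# read_without_nuls FILE: hypothetical helper that prints FILE with
# every NUL byte removed. POSIX tr accepts octal escapes such as '\000',
# so this avoids relying on the shell to discard the NULs.
read_without_nuls() {
    tr -d '\000' < "$1"
}
```

Usage: `data="$(read_without_nuls eeprom)"`. Like the `cat` version, this simply concatenates strings that were separated only by NULs, which is fine for a single comma-separated record like the one above.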

Solution 2: Using `xxd` and a `for` loop

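If you want more control, you can use `xxd` to convert the bytes to hexadecimal strings and iterate over them. While iterating, you can apply whatever logic you like; for example, you can explicitly skip the leading NULs and print the remaining data until some stop condition is reached.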
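With `xxd -p -c 1`, each byte arrives as two hex digits on its own line (for example `41` for `A`), so the core of any such logic is a per-byte test. A minimal sketch of that test as a POSIX function, assuming "valid" means printable ASCII (32, space, through 126, `~`); `is_printable_hex` is a hypothetical name:

```shell
#!/bin/sh
# is_printable_hex HH: hypothetical helper; succeeds (exit status 0) if
# the byte written as two hex digits is printable ASCII (32..126).
is_printable_hex() {
    d=$((0x$1))   # POSIX arithmetic accepts hex constants such as 0x41
    [ "$d" -ge 32 ] && [ "$d" -le 126 ]
}

# Classify a few sample bytes: NUL, 'A', '~', DEL and 0xFF
for h in 00 41 7e 7f ff; do
    if is_printable_hex "$h"; then
        echo "$h printable"
    else
        echo "$h skipped"
    fi
done
```

This prints `00 skipped`, `41 printable`, `7e printable`, `7f skipped` and `ff skipped`. Note that 127 (DEL, `7f`) is a control character, which is why the range stops at 126.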

The script below defines a "whitelist" of valid characters (printable ASCII, 32 to 126), treats any run of other bytes as a delimiter, and extracts every valid substring:

#!/bin/sh
# get_hex_substrings.sh

# Get the path to the data file as a command-line argument
datafile="$1"
if [ -z "${datafile}" ]; then
    echo "usage: $0 FILE" >&2
    exit 1
fi

# Print one byte given its decimal value. POSIX printf only guarantees
# octal escapes (\ddd), not '\xHH', so convert the value to octal first.
print_byte() {
    printf "\\$(printf '%03o' "$1")"
}

# Keep track of state using shell variables
inside_padding_block="true"
inside_bad_block="false"

# NOTE: The '-p' flag is for "plain" output (no additional formatting)
# and the '-c 1' option prints the representation of each byte
# on a separate line
for h in $(xxd -p -c 1 "${datafile}"); do

    # Convert the two-digit hex byte to decimal
    d="$((0x${h}))"

    # Case where we're still inside the initial padding block
    if [ "${inside_padding_block}" = "true" ]; then
        if [ "${d}" -ge 32 ] && [ "${d}" -le 126 ]; then
            inside_padding_block="false"
            print_byte "${d}"
        fi

    # Case where we're past the initial padding, but inside another
    # block of non-printable characters
    elif [ "${inside_bad_block}" = "true" ]; then
        if [ "${d}" -ge 32 ] && [ "${d}" -le 126 ]; then
            inside_bad_block="false"
            print_byte "${d}"
        fi

    # Case where we're inside a substring that we want to extract
    else
        if [ "${d}" -ge 32 ] && [ "${d}" -le 126 ]; then
            print_byte "${d}"
        else
            inside_bad_block="true"
            echo
        fi
    fi
done

# Terminate the last substring if the file did not end with a delimiter
# (and print nothing at all if the file contained only padding)
if [ "${inside_padding_block}" = "false" ] && [ "${inside_bad_block}" = "false" ]; then
    echo
fi

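Now we can test this by creating a sample file in which the substrings are separated by runs of NUL (0x00) and 0xFF bytes. Octal escapes are used because not every `printf` understands `\x`:

printf '\000\000\000string1\377\377\377string2\000\000\000string3\000\000\000' > data.hex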

Here is the output from running the script:

$ sh get_hex_substrings.sh data.hex
string1
string2
string3
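If you don't need byte-by-byte control, the same extraction can be sketched as a single `tr` pipeline instead of the `xxd` loop. This swaps in a different technique: `tr -c` translates every byte outside the printable range (octal 040 to 176) to a newline, and `-s` squeezes each run into one. `print_strings` is a hypothetical name, and `LC_ALL=C` is set on the assumption that you want byte-wise ranges:

```shell
#!/bin/sh
# print_strings FILE: hypothetical helper that prints each run of
# printable ASCII bytes (32..126) in FILE on its own line.
print_strings() {
    # -c: act on every byte NOT in \040-\176
    # -s: squeeze each run of those bytes into a single newline
    LC_ALL=C tr -cs '\040-\176' '\n' < "$1" |
        sed '/^$/d'    # drop the empty line left by leading padding
}
```

Run against the `data.hex` file above, `print_strings data.hex` prints `string1`, `string2` and `string3` on separate lines. It also avoids `xxd`, which is not part of POSIX and may be missing from minimal images.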

Solution 3: Using the `tr` and `cut` commands

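You can also try the `tr` and `cut` commands to handle the NUL bytes. The following example extracts the first null-terminated string from a list of null-terminated strings by squeezing each run of adjacent NULs into a single newline. Field 2 is selected because the leading NUL padding becomes an empty first line. Note that the `$'\n'` quoting is a bash/ksh/zsh extension, not POSIX `sh` syntax: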

$ printf '\000\000\000string1\000\000\000string2\000\000\000string3\000\000\000' > file.dat
$ tr -s '\000' '\n' < file.dat | cut -d$'\n' -f2
string1
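To pick a string by position without the `$'\n'` quoting, a sketch using `sed` in place of `cut` might look like this. `nth_cstring` is a hypothetical name; it deletes empty lines first, so `N` counts only non-empty strings regardless of how much NUL padding precedes them:

```shell
#!/bin/sh
# nth_cstring FILE N: hypothetical helper that prints the N-th non-empty
# NUL-terminated string in FILE (N starts at 1).
nth_cstring() {
    tr -s '\000' '\n' < "$1" |  # each run of NULs -> one newline
        sed '/^$/d' |           # discard the empty line from leading NULs
        sed -n "${2}p"          # print only line N
}
```

For the `file.dat` above, `nth_cstring file.dat 1` prints `string1` and `nth_cstring file.dat 3` prints `string3`.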
