How do I test whether a string contains ASCII whitespace characters?

Question 1

In POSIX sh syntax:

case $string in
  (*[[:blank:]]*) echo "string contains at least one character classified as blank";;
  (*[[:space:]]*) echo "string contains at least one character classified as whitespace (but not blank)";;
  (*) echo no character classified as whitespace;;
esac

[:blank:] is required to be a subset of [:space:]. [:blank:] is guaranteed to include at least space and TAB, and [:space:] at least space, TAB, NL, CR, FF and VT.
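
As an illustration of that subset relationship, here is a minimal sketch that wraps the same case construct in a function. The name classify_ws is hypothetical, and the outputs shown assume a locale where newline is not classified as blank (true of the POSIX locale).

```shell
# classify_ws is a hypothetical helper name, not part of any standard.
# It prints "blank", "space" or "none" for its first argument.
classify_ws() {
  case $1 in
    (*[[:blank:]]*) echo blank ;;   # at least one blank (e.g. space or TAB)
    (*[[:space:]]*) echo space ;;   # whitespace, but no blank (e.g. NL)
    (*) echo none ;;
  esac
}

classify_ws 'a b'                   # blank
classify_ws "$(printf 'a\tb')"      # blank
classify_ws "$(printf 'a\nb')"      # space
classify_ws 'abc'                   # none
```

Because the [:blank:] branch comes first, the [:space:] branch is only reached by strings whose whitespace is all non-blank.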

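This depends on the encoding in use and on how the locale classifies characters. On most systems, every locale uses a charset that is ASCII or a superset of ASCII. (One exception is MS-Kanji, found in some Japanese locales on some BSDs, where 0x5c is ¥ instead of \ and there is no \ character at all; it is otherwise a superset of ASCII.)

If you want to check that $string contains at least one ASCII whitespace character as encoded in ASCII, even on EBCDIC-based systems, you need to either specify the set of byte values or use iconv to convert the set from the current charset to ASCII: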

ascii_whitespace=$(printf ' \t\n\r\f\v' | iconv -t ASCII)
# or
ascii_whitespace=$(printf '\40\11\12\13\14\15')
case $string in
  (*["$ascii_whitespace"]*) echo contains at least one ASCII whitespace;;
esac

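(Hoping that \15 does not happen to be the newline character on that system, since command substitution strips trailing newline characters.)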
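
The byte-value variant can also be wrapped in a function so that it is usable directly in an if statement. This is only a sketch: has_ascii_ws and ascii_ws are hypothetical names, and the pattern has * on both sides so that it matches whitespace anywhere in the string.

```shell
# Space, TAB, NL, VT, FF, CR given by their ASCII octal values.
# ascii_ws and has_ascii_ws are hypothetical names used for illustration.
ascii_ws=$(printf '\40\11\12\13\14\15')

# Succeeds (exit status 0) if $1 contains at least one of those bytes.
has_ascii_ws() {
  case $1 in
    (*["$ascii_ws"]*) return 0 ;;
    (*) return 1 ;;
  esac
}

if has_ascii_ws "$(printf 'a\tb')"; then
  echo "contains ASCII whitespace"
else
  echo "no ASCII whitespace"
fi
```

Returning an exit status rather than printing keeps the check composable with if, && and ||.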

Question 2

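Assume your string is stored in a shell variable $string. In that case, since you specified bash as the shell, you can use the built-in regular expression matching of the [[ ... ]] test construct: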

if [[ "$string" =~ [[:space:]] ]]; then echo "Contains whitespace"; else echo "Doesn't contain whitespace"; fi

The same works in a shell script.

Some usage examples:

~$ string=" hello "
~$ if [[ "$string" =~ [[:space:]] ]]; then echo "Contains whitespace"; else echo "Doesn't contain whitespace"; fi
Contains whitespace

~$ string=$'\thello'
~$ if [[ "$string" =~ [[:space:]] ]]; then echo "Contains whitespace"; else echo "Doesn't contain whitespace"; fi
Contains whitespace

~$ string="hello"
~$ if [[ "$string" =~ [[:space:]] ]]; then echo "Contains whitespace"; else echo "Doesn't contain whitespace"; fi
Doesn't contain whitespace

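Note: this uses the POSIX character class [:space:]. There are subtleties in the difference between [:space:] and [:blank:]. If you only want to consider characters that create space within the same line (i.e. <space> and \t), you should switch to [:blank:] (but note that in some locales, [:blank:] will also contain vertical space characters).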
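
To make the difference concrete, here is a small bash sketch. The names has_blank and has_space are hypothetical, and the results in the comments assume a locale where newline is not classified as blank.

```bash
#!/usr/bin/env bash
# has_blank and has_space are hypothetical helper names.
has_blank() { [[ $1 =~ [[:blank:]] ]]; }
has_space() { [[ $1 =~ [[:space:]] ]]; }

multiline=$'line1\nline2'   # contains a newline, but no space or TAB

if has_space "$multiline"; then echo "space: yes"; else echo "space: no"; fi   # space: yes
if has_blank "$multiline"; then echo "blank: yes"; else echo "blank: no"; fi   # blank: no
```

A string whose only whitespace is a line break therefore passes the [:space:] test but not the [:blank:] test.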

Question 3

Using Raku (formerly known as Perl 6)

~$ echo "abc " | raku -ne '.contains(/ \s /).say'
True
~$ echo "abc" | raku -ne '.contains(/ \s /).say'
False

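The Raku code above uses the awk-like -ne command-line flags to run line by line over the input. Raku's contains method returns a Boolean. The leading . dot on contains calls the method on the topic variable $_, which -n sets to each line read from the files named on the command line or, failing that, from standard input.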

~$ echo "abc " | raku -ne 'say .contains(/ \s /) ?? True !! False;'
True
~$ echo "abc" | raku -ne 'say .contains(/ \s /) ?? True !! False;'
False

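The version above is slightly more complicated because it uses Raku's ternary operator: Test ?? True !! False. Raku has a logical True and a logical False, so the return values above do not need quoting. The advantage here is that you can simply replace True and False with double-quoted return values of your choice, such as "Yes" and "No".

Presumably the OP's question concerns horizontal whitespace. In that regard, Raku can distinguish \h horizontal whitespace from \v vertical whitespace: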

~$ raku -e 'put "abc\t";' | raku -ne 'say .contains(/ \h /);'
True
~$ raku -e 'put "abc\t";' | raku -ne 'say .contains(/ \v /);'
False

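The OP has not said whether multi-line input strings must be handled. These will always be "positive" for whitespace, but may be "negative" for horizontal whitespace (consider a column of numbers as input). Either way, in Raku you can read the input line by line as above (which autochomps by default), or read it all at once (preserving the \n end-of-line newlines) with the oddly named but memorable slurp.

Reading line by line (autochomps):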

~$ raku -e 'put "1\n2\n3";' | raku -ne 'say .contains(/ \h /);'
False
False
False
~$ raku -e 'put "1\n2\n3";' | raku -e 'for lines() {say .contains(/ \h /)};'
False
False
False

Reading everything at once (no autochomping):

~$ raku -e 'put "1\n2\n3";' | raku -e 'say slurp.contains(/ \v /);'
True
~$ raku -e 'put "1\n2\n3";' | raku -e 'put slurp.contains(/ \h /);'
False

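Addendum: I am interpreting the OP's statement, "I don't have to worry about anything outside ASCII...", as "I don't care whether Unicode is handled". If only ASCII whitespace is to be handled (and all other whitespace rejected), that is something Raku can manage, but it is not addressed above. Note that Raku is Unicode-ready, so \s (short for <space>) and \h (short for <blank>) as well as \v all accept Unicode by default.

If you want to reject non-ASCII (horizontal) whitespace, you can try something like the following custom character class: <:ASCII> & <blank>.

Example: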

~$ raku -e 'put "\xA0";' | raku -ne 'put .contains(/ <blank> / );'
True
~$ raku -e 'put "\xA0";' | raku -ne 'put .contains(/ <:ASCII> & <blank> / );'
False
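
For comparison, a similar ASCII-only restriction can be sketched in plain POSIX sh by forcing the C locale, where [:blank:] is defined as just space and TAB. The name has_ascii_blank is hypothetical, and the sketch assumes the input is UTF-8, so U+00A0 arrives as the bytes 0xC2 0xA0.

```shell
# has_ascii_blank is a hypothetical helper name.
# The function body is a subshell, so the LC_ALL change does not leak out.
has_ascii_blank() (
  LC_ALL=C   # POSIX locale: [:blank:] is only space and TAB
  case $1 in
    (*[[:blank:]]*) exit 0 ;;
    (*) exit 1 ;;
  esac
)

nbsp=$(printf '\302\240')   # UTF-8 encoding of U+00A0 NO-BREAK SPACE

if has_ascii_blank "a${nbsp}b"; then echo yes; else echo no; fi   # no
if has_ascii_blank 'a b'; then echo yes; else echo no; fi         # yes
```

This only narrows the character classes; it does not convert the string between encodings.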

https://docs.raku.org/language/operators#index-entry-operator_ternary
https://docs.raku.org/language/regexes#\h_and_\H
https://docs.raku.org/routine/contains
https://raku.org

Answer