Padding a Unicode string with bash's printf

Question 1

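The character ä is encoded with 2 bytes in UTF-8, so printf counts it as 2 toward the padding width.

wc can count both the characters (-m) and the bytes (-c) of a string. The number to give printf is then [intended pad]+[bytes]-[chars]. So I put together this pad.sh script: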

#!/bin/sh
bytes=$(printf '%s' "$2" | wc -c)
chars=$(printf '%s' "$2" | wc -m)
n=$(($1+bytes-chars))
printf "%${n}s" "$2"

In the sample runs below, I manually added a newline after each output for clarity.

$ sh pad.sh 10 abcdef
    abcdef
$ sh pad.sh 10 äéßôçÈ
    äéßôçÈ
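To make the arithmetic concrete, here is a small sketch that only prints the width handed to printf. The name pad_width is made up for illustration. In a UTF-8 locale the string äéßôçÈ is 12 bytes and 6 characters, so a requested width of 10 becomes 10+12-6 = 16.

```shell
# pad_width WIDTH STRING (hypothetical helper): print the field width
# that printf needs so that STRING ends up padded to WIDTH characters.
pad_width() {
    bytes=$(printf '%s' "$2" | wc -c)   # length in bytes
    chars=$(printf '%s' "$2" | wc -m)   # length in characters (locale-dependent)
    # printf counts bytes, so widen the field by the surplus bytes.
    echo "$(( $1 + bytes - chars ))"
}

pad_width 10 abcdef   # ASCII only: bytes == chars, so the width stays 10
```

Note that wc -m only counts multibyte characters correctly when the locale is set to a UTF-8 one. In the C locale it falls back to counting bytes and no correction happens.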

Answer

Question 2

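How can I pad a Unicode string in bash?

That is far beyond what bash can do. If you restrict "Unicode string" to ascii++ (no double-width characters, no bidi, no non-spacing marks, etc.), you can rig up something like this: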

% pad(){ printf '%*s%s\n' "$(($1-${#2}))" "" "$2"; }
% pad 2 €
 €
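One caveat with the function above: if the string is longer than the requested width, the computed width goes negative, and printf treats a negative * width as a request for left-justification, so spaces are still emitted. A minimal sketch that clamps the width at zero follows. The name pad_clamped is made up, and the same ascii++ restriction applies.

```shell
# pad_clamped WIDTH STRING (hypothetical helper): right-align STRING in
# WIDTH characters, printing it unpadded when it is already too long.
pad_clamped() {
    # ${#2} is the length in characters in bash under a UTF-8 locale.
    n=$(( $1 - ${#2} ))
    if [ "$n" -lt 0 ]; then
        n=0    # never pass a negative width to printf
    fi
    printf '%*s%s\n' "$n" "" "$2"
}

pad_clamped 5 abc
pad_clamped 2 abcdef
```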

Answer

Question 3

bash behaves correctly, and the C program

#include <stdio.h>
int main(void)
{
        char foo[] = "ä";

        /* %2s pads to 2 bytes; "ä" is already 2 bytes in UTF-8 */
        printf("%2s\n", foo);
        return 0;
}

behaves the same way.

This is because %s refers to a byte-oriented string, and "ä" takes 2 bytes in UTF-8.
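The byte count is easy to check from the shell. This is only a sketch: the variable name a_umlaut is made up, and the character is written as its two UTF-8 bytes in octal so the example does not depend on how this text is encoded.

```shell
# "ä" as its two UTF-8 bytes, 0xC3 0xA4, given as octal escapes.
a_umlaut=$(printf '\303\244')

printf '%s' "$a_umlaut" | wc -c   # counts 2 bytes, whatever the locale
printf '%s' "$a_umlaut" | wc -m   # character count, depends on the locale
```

Since the string already fills 2 bytes, a byte-oriented %2s has nothing left to pad.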

As far as I have tested, none of the other shells behaves incorrectly either.

The result you expect could be obtained with:

printf '%2S\n' ä

but none of the shells I tested supports this.

Answer