Unexpected GnuWin32 / sed behavior in PowerShell

Answer 1

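This is Unicode. What sed outputs is Unicode (UTF-16LE) without the 2-byte prefix that PowerShell uses to tell Unicode from ASCII. So PowerShell takes it for ASCII and keeps the \0 bytes (the high byte of each 2-byte Unicode character, which is zero for ASCII-range text), and these are displayed as blanks. And since PowerShell handles text as Unicode internally, it actually expands every original byte into a 2-byte Unicode character. I know of no way to force PowerShell to accept the stream as Unicode. The possible workarounds are: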

Is Unicode coming in as the input to sed? Unlikely, but I suppose it is possible. Check that.

Make sed's output start with the Unicode indicator (the byte order mark), \uFEFF. This is probably what was missed in the sed source code:

```
_setmode(_fileno(stdout), _O_WTEXT); // probably present and makes it send Unicode
wprintf(L"\uFEFF"); // probably missing
```

You can insert the mark from within the sed command itself, something like

```
sed "1s/^/\xFF\xFE/;..." # won't work if sed produces Unicode, but would work if sed passes Unicode through from its input
sed "1s/^/\uFEFF/;..." # use if sed produces Unicode itself, hopefully sed supports \u
```
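
The difference between the two sed variants above can be seen at the byte level. The following is only a sketch in Python, which is not part of the original answer: it assumes "Unicode" means UTF-16LE, and it models a tool that "produces Unicode itself" as one that widens every character to two bytes. The helper name `widen` is hypothetical.

```python
import codecs

def widen(text: str) -> bytes:
    """Model (assumption) of a tool that emits every character as a UTF-16LE unit."""
    return text.encode("utf-16-le")

# BOM-less payload, like the output described in the answer.
payload = widen("abc")

# Case 1: the tool passes bytes through unchanged, so the raw bytes FF FE
# placed in front of the payload form a valid byte order mark.
passed_through = b"\xff\xfe" + payload

# Case 2: the tool widens characters itself, so the two characters \xFF and
# \xFE come out as FF 00 FE 00, which is not a byte order mark ...
widened_bytes = widen("\xff\xfe")

# ... whereas the single character U+FEFF is widened to exactly FF FE.
widened_bom = widen("\ufeff")

# A BOM-aware decoder recognizes the mark, picks little-endian, and drops it.
decoded = passed_through.decode("utf-16")

print(widened_bytes, widened_bom, decoded)
```

Under these assumptions, `widened_bom` equals `codecs.BOM_UTF16_LE`, which is why the `\uFEFF` form is the one to try when sed does the widening itself.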

Write the output of sed to a file, then read it with Get-Content -Encoding Unicode. Note that the redirection to the file must happen inside a command run by cmd.exe, for example:
```
cmd /c "sed ... >file"
```
If you just let PowerShell handle >file, the output gets mangled in the same way.
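
The file workaround rests on one idea: bytes saved verbatim can still be decoded correctly, even without a mark, as long as the reader is told the encoding explicitly. A rough Python analogy follows (not PowerShell, and not from the original answer). The function name `roundtrip_via_file` is hypothetical; the binary write stands in for the cmd.exe redirection, and the explicit `encoding` argument stands in for `-Encoding Unicode`.

```python
import os
import tempfile

def roundtrip_via_file(text: str) -> str:
    """Save BOM-less UTF-16LE bytes untouched, then read them back with an explicit encoding."""
    raw = text.encode("utf-16-le")  # what the tool is assumed to emit
    fd, path = tempfile.mkstemp()
    try:
        # Binary write: nothing reinterprets the bytes on their way to disk.
        with os.fdopen(fd, "wb") as handle:
            handle.write(raw)
        # The reader is told the encoding, so no byte order mark is needed.
        with open(path, "r", encoding="utf-16-le", newline="") as handle:
            return handle.read()
    finally:
        os.remove(path)

print(roundtrip_via_file("sed output: h\u00e9llo"))
```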
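
Drop the \0 characters from the resulting text in PowerShell. This does not work for international characters whose Unicode bytes include the code 0xA or 0xD: you end up with line breaks in their place.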
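
Both the misreading described at the top of this answer and the limitation of stripping \0 can be reproduced in a few lines. This is a sketch in Python, not PowerShell code. It assumes the stream is UTF-16LE, uses Latin-1 as a stand-in for whatever single-byte code page the reader applies, and assumes lines are split on raw 0x0A / 0x0D bytes before the \0 characters are removed. The function name `misread_and_strip` is hypothetical.

```python
import re
from typing import List

def misread_and_strip(raw: bytes) -> List[str]:
    """Decode one byte per character, split into lines, then drop the NUL characters."""
    as_single_bytes = raw.decode("latin-1")           # every byte becomes one character
    lines = re.split(r"\r\n|\r|\n", as_single_bytes)  # splitting sees the raw 0x0D / 0x0A bytes
    return [line.replace("\x00", "") for line in lines]

# ASCII-range text: every high byte is 0x00, so stripping NULs restores the text.
ascii_result = misread_and_strip("one\ntwo".encode("utf-16-le"))

# U+010D (c with caron) is the byte pair 0D 01 in UTF-16LE, so its low byte
# is taken for a carriage return and the line is split in the middle.
czech_result = misread_and_strip("K\u010d".encode("utf-16-le"))

print(ascii_result)  # ['one', 'two']
print(czech_result)  # ['K', '\x01']
```

The second result shows the character replaced by a line break plus a stray 0x01 byte, which matches the failure described above.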
