The CLI program "less" treats a file as binary even though it is UTF-8 text

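I have a file that contains non-ASCII UTF-8 characters. When I view it with less, I get the warning "may be a binary file. See it anyway?", but the file is clearly not binary. When I open it anyway, the characters are not rendered correctly. What makes less believe the file is binary? Note that the real files contain many more lines of plain ASCII text, which I removed for brevity. Here is a semi-minimal example that reproduces the behavior.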

More context:

$ cat broken.log
⋮ =✓)
$ head broken.log
⋮ =✓)
$ less broken.log
"broken.log" may be a binary file.  See it anyway?

<E2><8B><AE>
<E2><8B><AE> =<E2><9C><93>)
broken.log (END)

$ file broken.log
broken.log: UTF-8 Unicode text
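To confirm that the file really is well-formed UTF-8, and that the <E2><8B><AE> and <E2><9C><93> sequences less prints are simply the raw bytes of ⋮ (U+22EE) and ✓ (U+2713), the example can be recreated and dumped byte by byte. This is a minimal sketch, assuming a POSIX shell with od and iconv available; the two lines are taken from the less output above, and the path /tmp/broken.log is hypothetical.

```shell
#!/bin/sh
# Recreate the two lines that less displayed in a scratch file
# (/tmp/broken.log is a hypothetical path). Octal escapes keep printf portable:
#   \342\213\256 = E2 8B AE = U+22EE "⋮"
#   \342\234\223 = E2 9C 93 = U+2713 "✓"
printf '\342\213\256\n\342\213\256 =\342\234\223)\n' > /tmp/broken.log

# Dump the raw bytes; they should match the <E2>...<93> sequences less printed.
od -An -tx1 /tmp/broken.log

# iconv exits non-zero if the input is not well-formed UTF-8.
if iconv -f UTF-8 -t UTF-8 /tmp/broken.log >/dev/null 2>&1; then
  echo "valid UTF-8"
else
  echo "not valid UTF-8"
fi
```

If the bytes match and iconv accepts the file, the content itself is fine and the problem is in how less decides to interpret those bytes. As a stopgap, less also reads the LESSCHARSET environment variable, so LESSCHARSET=utf-8 less broken.log should display the file correctly even while the locale is broken.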

Operating system:

$ cat /etc/os-release  
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

less: I'm fairly sure its version is 487-0.1.

Environment:

$ env | grep LANG
LANG=en_US.UTF-8
$ locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ which less
/usr/bin/less
$ ls -la $(which less)
lrwxrwxrwx 1 root root 9 Jul 20 15:49 /usr/bin/less -> /bin/less
$ ls -la /bin/less
-rwxr-xr-x 1 root root 166664 May  7  2018 /bin/less
$ type -a less
less is /usr/bin/less
less is /bin/less

Answer 1

Solved here: https://stackoverflow.com/questions/43708896/unable-to-locate-package-language-pack-en

... together with the following in the Dockerfile:

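# locales-all ships precompiled locale data, so en_US.UTF-8 exists without running locale-gen
RUN apt-get update && apt-get install -y locales locales-all
ENV LC_ALL=en_US.UTF-8
ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US.UTF-8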
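The "locale: Cannot set LC_CTYPE to default locale" warnings in the question are the key symptom: LANG names en_US.UTF-8, but that locale is not installed, so setlocale() fails and programs stay in the C locale. A rough check that can be run inside the container after applying the fix is sketched below. The helper name has_locale is hypothetical, and the sketch assumes glibc's locale -a, which lists UTF-8 locales in a normalized form such as en_US.utf8.

```shell
#!/bin/sh
# has_locale NAME (hypothetical helper): succeed if NAME is listed by `locale -a`.
# glibc prints UTF-8 locales in normalized form (e.g. en_US.utf8), so both the
# requested name and the listed names are lowercased and stripped of hyphens
# before a fixed-string, whole-line comparison.
has_locale() {
  want=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | sed 's/-//g')
  locale -a 2>/dev/null | tr 'A-Z' 'a-z' | sed 's/-//g' | grep -qxF "$want"
}

if has_locale en_US.UTF-8; then
  echo "en_US.UTF-8 is installed"
else
  # Without it, programs such as less run in the C locale, which most likely
  # makes less pick an ASCII character set and flag UTF-8 bytes as binary.
  echo "en_US.UTF-8 is missing: install locales-all or generate it with locale-gen" >&2
fi
```

On a Debian system outside Docker, the usual alternative to locales-all is to uncomment the en_US.UTF-8 line in /etc/locale.gen and run locale-gen (or dpkg-reconfigure locales).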
