wc command counts extra characters

cat > file
Amy looked at her watch. He was late. The sun was setting but Jake didn’t care.

wc file
1      16      82 file

Can someone explain why the wc command reports 3 extra characters in this case?

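Answer 1

wc reports a larger number than you expect because your sample file contains a typographic Unicode apostrophe (’, U+2019), most likely because you copied the text from a browser or a text editor. In UTF-8 that apostrophe takes 3 bytes instead of 1: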

$ cat file
Amy looked at her watch. He was late. The sun was setting but Jake didn’t care.
$ wc file
1      16      82 file

With a plain ASCII apostrophe ('):

$ cat file2
Amy looked at her watch. He was late. The sun was setting but Jake didn't care.
$ wc file2
1      16      80 file2

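By default, wc reports a byte count, not a character count. From the manual:

newline, word, and byte counts for each file

To count characters, use the -m option: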

$ cat file
Amy looked at her watch. He was late. The sun was setting but Jake didn’t care.
$ wc -m file
      80 file
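
To reproduce the difference without copying text from a browser, here is a minimal sketch. It assumes a POSIX shell, `mktemp -d`, and an installed `C.UTF-8` locale (substitute any UTF-8 locale you have); the file names `curly` and `ascii` are made up for this illustration. The typographic apostrophe is written as its three UTF-8 bytes in octal.

```shell
#!/bin/sh
# Build two one-line files: one with a typographic apostrophe (U+2019,
# written as the UTF-8 bytes e2 80 99 in octal) and one with a plain ASCII '.
tmpdir=$(mktemp -d)
printf 'Amy looked at her watch. He was late. The sun was setting but Jake didn\342\200\231t care.\n' > "$tmpdir/curly"
printf "Amy looked at her watch. He was late. The sun was setting but Jake didn't care.\n" > "$tmpdir/ascii"

# wc -c counts bytes and does not depend on the locale.
# tr -d ' ' strips the padding that some wc implementations print.
bytes_curly=$(wc -c < "$tmpdir/curly" | tr -d ' ')
bytes_ascii=$(wc -c < "$tmpdir/ascii" | tr -d ' ')

# wc -m counts characters as defined by the current locale, so a UTF-8
# locale is requested here (assumption: C.UTF-8 is available).
chars_curly=$(LC_ALL=C.UTF-8 wc -m < "$tmpdir/curly" | tr -d ' ')

echo "curly: $bytes_curly bytes, $chars_curly characters"
echo "ascii: $bytes_ascii bytes"
rm -r "$tmpdir"
```

If the requested locale is missing, `wc -m` may fall back to counting bytes, so the character count is only meaningful under a UTF-8 locale.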

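Answer 2

Pipe the file through xxd to see a hex dump side by side with the ASCII rendering. This reveals any extra characters that are invisible or unprintable.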

$ cat file
one‏ and ‏two

$ cat file | wc
      1       3      18

$ cat file | xxd
00000000: 6f6e 65e2 808f 2061 6e64 20e2 808f 7477  one... and ...tw
00000010: 6f0a                                     o.
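
If you only want to know how many bytes fall outside ASCII, rather than read the dump by eye, something along these lines should work. It is a sketch that assumes a POSIX `tr` and `mktemp`; the sample is rebuilt with `printf`, using the bytes e2 80 8f from the dump above (U+200F, RIGHT-TO-LEFT MARK) written in octal.

```shell
#!/bin/sh
# Recreate the sample: "one", U+200F, " and ", U+200F, "two", newline.
# U+200F is invisible and encodes as e2 80 8f in UTF-8.
sample=$(mktemp)
printf 'one\342\200\217 and \342\200\217two\n' > "$sample"

total=$(wc -c < "$sample" | tr -d ' ')

# In the C locale tr works byte by byte; deleting the ASCII range
# \000-\177 leaves only the non-ASCII bytes, which wc -c then counts.
non_ascii=$(LC_ALL=C tr -d '\000-\177' < "$sample" | wc -c | tr -d ' ')

echo "$total bytes in total, $non_ascii of them outside ASCII"
rm "$sample"
```

A non-zero second number tells you the file contains multi-byte or otherwise non-ASCII data, and xxd then shows where it is.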

Answer 3

wc counts bytes, not characters. If you want to count characters, use the -m option:

cat > file
Amy looked at her watch. He was late. The sun was setting but Jake didn’t care.

wc -l -w -m file
1      16      80 file

The remaining "extra character" is the newline at the end of the file.
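
So the numbers add up as follows: 79 visible characters plus 1 newline give 80 characters, and the 3-byte apostrophe adds 2 more bytes, for 82 bytes in total. The newline part is easy to check on a short string. This is a small sketch that assumes only a POSIX shell; the string `care.` is just an illustration.

```shell
#!/bin/sh
# printf '%s' writes the text exactly as given; adding '\n' appends the
# newline that cat > file also stores when you press Enter before Ctrl-D.
without_newline=$(printf '%s' 'care.' | wc -c | tr -d ' ')
with_newline=$(printf '%s\n' 'care.' | wc -c | tr -d ' ')

echo "without newline: $without_newline bytes"
echo "with newline:    $with_newline bytes"
```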
