Closely inspecting every character in a text file, visible or invisible

Question 1

A good hex editor is probably your best bet. Try FrHed (http://frhed.sourceforge.net/en/) if you are on Windows, or Bless (http://home.gna.org/bless/) on Linux.
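If no hex editor is at hand, a few lines of script give a similar view. The following is only a minimal sketch in Python, not something FrHed or Bless provide; the function name `hex_dump` and the sample bytes are made up for illustration. It prints the offset, the hex value of every byte, and the printable ASCII next to it, so a BOM or a CR/LF pair cannot hide.

```python
def hex_dump(data: bytes, width: int = 16) -> list:
    """Format bytes as offset, hex values, and printable ASCII, one row per `width` bytes."""
    rows = []
    pad = width * 3 - 1  # width of the hex column, so the text column lines up
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and every other byte as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return rows


if __name__ == "__main__":
    # Hypothetical sample: UTF-8 BOM, "a", no-break space, "b", CR LF.
    sample = b"\xef\xbb\xbfa\xc2\xa0b\r\n"
    for row in hex_dump(sample):
        print(row)
```

For a real file, read it in binary mode (`open(path, "rb").read()`) so that nothing is decoded or translated before you look at it.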

Answer


Question 2

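The BabelPad editor is great: when you place the cursor after a character, it shows that character's Unicode code point and Unicode name. It has a built-in Unicode information viewer that shows many of a character's Unicode properties. Unfortunately, it processes the BOM rather than displaying it, and it also interprets line breaks instead of showing them. There may be a way to change that; its documentation is... well, not its strongest point. But it does show invisible control characters such as LRM, and it distinguishes a space from a no-break space, and so on.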
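For the two things an editor tends to swallow, the BOM and the line breaks, a small script can fill the gap. This is a rough sketch in Python under my own assumptions, not a description of how BabelPad works; `detect_bom` and `invisible_chars` are hypothetical helper names. It checks the raw bytes for a BOM and lists control, format, and separator characters by name.

```python
import codecs
import unicodedata

# Known byte order marks. The UTF-32 entries come first because the
# UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_bom(data: bytes):
    """Return the encoding name implied by a leading BOM, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def invisible_chars(text: str):
    """Yield (index, code point, name) for control, format, and separator characters."""
    for index, ch in enumerate(text):
        category = unicodedata.category(ch)
        # Cc = control (CR, LF), Cf = format (LRM), Z* = separators (space, NBSP)
        if category in ("Cc", "Cf") or category.startswith("Z"):
            # Control characters such as CR and LF have no Unicode name.
            name = unicodedata.name(ch, "<unnamed control>")
            yield index, f"U+{ord(ch):04X}", name


if __name__ == "__main__":
    print(detect_bom(b"\xef\xbb\xbfhello"))
    for entry in invisible_chars("a\u200e b\u00a0c\r\n"):
        print(entry)
```

Note that an ordinary space is reported too, which is what makes it easy to tell apart from a no-break space. When feeding it a file, read the bytes first and decode them yourself, so the line endings are not translated on the way in.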

Answer


Question 3

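Maybe this helps, although the answer would be a better fit for Stack Overflow. I built a small parser in Perl that does what you want. Too bad there is no syntax highlighting here.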

#!/usr/bin/perl
use strict; use warnings;
use feature qw(say);
use Data::Dumper;
use Unicode::String;
use utf8;

my $line_no = 1;
# Read stuff from the __DATA__ section as if it were a file,
# one line at a time
while (my $line = <DATA>) {
  # Create a Unicode::String object
  my $us = Unicode::String->new($line);

  # Iterate over the length of the string
  for (my $i = 0; $i < $us->length; $i++) {
    # Get the next char
    my $char = $us->substr($i, 1);
    # Output a description, one line per character
    printf "Line %i, column %i, 0x%x '%s' (%s)\n",
      $line_no,         # line number
      $i,               # column number (0-based)
      $char->ord,       # the ordinal of the char, in hex
      $char->as_string, # the stringified char (as in the input)
      $char->name;      # the glyph's name
  }
  # increment line number
  $line_no++;
}

# Below is the DATA section, which can be used as a file handle
__DATA__
This is some very strange unicode stuff right here:
٩(-̮̮̃-̃)۶ ٩(●̮̮̃•̃)۶ ٩(͡๏̯͡๏)۶ ٩(-̮̮̃•̃).

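Let's look at what it does:

Reads from a file handle (the DATA section can be used like one), line by line.
Creates an object that represents the line as a Unicode string.
Iterates over the characters in that string.
Prints each character's name, code point, and related information.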
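The same steps carry over to other languages that ship a character-name table. As a rough sketch only, here is a Python port that relies on the standard `unicodedata` module; the function name `describe` and the sample string are my own, and it is not a line-for-line translation of the Perl above.

```python
import unicodedata


def describe(text: str):
    """Yield one description per character: line, column, code point, char, name."""
    line_no, column = 1, 0  # 1-based lines, 0-based columns, as in the Perl script
    for ch in text:
        # Characters without a Unicode name (e.g. LF) get a placeholder.
        name = unicodedata.name(ch, "<no name>")
        yield f"Line {line_no}, column {column}, 0x{ord(ch):x} {ch!r} ({name})"
        if ch == "\n":
            line_no += 1
            column = 0
        else:
            column += 1


if __name__ == "__main__":
    # Hypothetical sample, written with escapes so the combining marks stay visible in the source.
    sample = "\u0669(\u2022\u032e\u0303)\u06f6\n"
    for row in describe(sample):
        print(row)
```

Like the Perl version, it counts code points rather than what you see on screen, so a base character and its combining marks show up as separate rows.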

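It really is very simple. Perhaps you can adapt it to PHP, although I don't know whether there is a handy library for the character names.

Hope that helps.

I lifted the smileys from here: What Unicode characters is an emoticon like ٩(•̮̮̃•̃)۶ made up of?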

Answer