How do I find non-ASCII characters in a text file?

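Answer 1

Well, it's still here an hour later, so I may as well answer it. Here is a simple filter that prints only the non-ASCII characters in its input. It exits with status 0 if there were none and 1 if there were any. It reads from standard input only.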

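#include <stdio.h>
#include <ctype.h>   /* isascii() is a POSIX extension, not ISO C */

int main(void)
{
    int c, flag = 0;

    /* Echo every byte outside the 7-bit ASCII range (0-127). */
    while ((c = getchar()) != EOF)
        if (!isascii(c)) {
            putchar(c);
            flag = 1;   /* remember that at least one was seen */
        }

    return flag;   /* 0: input was pure ASCII, 1: non-ASCII found */
}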
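
The filter prints the offending bytes but not where they are. As a minimal sketch of the same byte test applied to a buffer, the helper below returns the offset of the first byte above 0x7F. The name `first_non_ascii` is hypothetical and not part of any library. It compares against 0x7F directly rather than calling `isascii`, on the assumption that you would rather not depend on a non-ISO function.

```c
#include <stddef.h>

/* Hypothetical helper: offset of the first byte outside 0x00-0x7F,
 * or -1 if every byte in buf[0..len) is 7-bit ASCII. */
long first_non_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] > 0x7F)
            return (long)i;
    return -1;
}
```

Calling it on each line you read would let you report a line number and column instead of just echoing the bytes.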

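Answer 2

Just run $JDK_HOME/bin/native2ascii on the text file and search the output file for "\u". I assume you want to find these characters so that you can escape them, and this saves you a step. ;)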
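
If native2ascii is not available, the idea can be approximated by hand. The sketch below is a rough illustration, not a reimplementation of the tool: the hypothetical function `escape_non_ascii` assumes the input is well-formed UTF-8 and writes every non-ASCII code point as a `\uXXXX` escape, using a surrogate pair above U+FFFF. The lowercase hex digits are my choice, not something taken from the tool's documentation.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of any library): copies the UTF-8 string
 * `in` to `out`, writing each non-ASCII code point as a \uXXXX escape.
 * Returns the number of code points escaped, or -1 if `out` (outsz bytes)
 * is too small or the input is not well-formed UTF-8. */
int escape_non_ascii(const char *in, char *out, size_t outsz)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t used = 0;
    int escaped = 0;

    if (outsz == 0)
        return -1;
    out[0] = '\0';

    while (*p) {
        unsigned long cp;
        int extra;                               /* continuation bytes expected */
        int n;

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return -1;                          /* invalid lead byte */
        p++;

        for (; extra > 0; extra--, p++) {
            if ((*p & 0xC0) != 0x80)
                return -1;                       /* truncated sequence */
            cp = (cp << 6) | (*p & 0x3F);
        }

        if (cp < 0x80) {
            n = snprintf(out + used, outsz - used, "%c", (int)cp);
        } else if (cp <= 0xFFFF) {
            n = snprintf(out + used, outsz - used, "\\u%04lx", cp);
            escaped++;
        } else {
            cp -= 0x10000;                       /* split into a surrogate pair */
            n = snprintf(out + used, outsz - used, "\\u%04lx\\u%04lx",
                         0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF));
            escaped++;
        }
        if (n < 0 || (size_t)n >= outsz - used)
            return -1;                           /* output buffer too small */
        used += (size_t)n;
    }
    return escaped;
}
```

A return value greater than zero tells you the text contained non-ASCII characters, and searching the output for "\u" locates them, just as with the tool.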

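Answer 3

I don't know whether this is legitimate: cast each character to an int and use a catch to identify the ones that fail. I was also too lazy to write it in Java, so here it is in Groovy: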

def chars = ['Ã', 'a', 'Â', 'ç', 'x', 'o', 'Ð']

chars.each {
    // Print each character for which the cast to int throws.
    try { def asciiInt = (int) it }
    catch (Exception e) { print it + " " }
}

==> Ã Â ç Ð

Answer 4

A simple Groovy example:

def str = [ "this doesn't have any unicode", "this one does ±ÁÎ˜Â·€ÔÅ" ]

str.each {
    if( it ==~ /[\x00-\x7F]*/ ) {
        println "all ascii: $it"
    } else {
        println "NOT ASCII: $it"
    }
}

It's as simple as: it ==~ /[\x00-\x7F]*/

Edit: I forgot to include the file version. Oops:

def text = new File(args[0]).text
if( text ==~ /[\x00-\x7F]*/ ) {
    println "${args[0]} is only ASCII"
    System.exit(0)
} else {
    println "${args[0]} contains non-ASCII characters"
    System.exit(-1)
}

This version can be used as a command-line script, and it sets an exit status so that it can be chained with other commands.
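
For comparison, the same file check can be sketched in C without loading the whole file into memory. The function name `file_is_ascii` and its return convention are assumptions made for this illustration. It stops at the first byte above 0x7F instead of matching the entire text against a pattern.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the file at `path` contains only
 * 7-bit ASCII bytes, 0 if it contains at least one byte above 0x7F,
 * and -1 if the file cannot be opened or read. */
int file_is_ascii(const char *path)
{
    FILE *fp = fopen(path, "rb");
    int c, result = 1;

    if (fp == NULL)
        return -1;
    while ((c = getc(fp)) != EOF)
        if (c > 0x7F) {
            result = 0;
            break;              /* no need to read past the first offender */
        }
    if (ferror(fp))
        result = -1;
    fclose(fp);
    return result;
}
```

A `main` that returns 0 when the result is 1 and a non-zero value otherwise would give the same chainable exit status as the Groovy script.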
