Best way to make sure a MySQL database is entirely UTF8

After some problems with UTF8 and non-UTF8 strings, we are standardising on UTF8. One thing I need to do is check that everything in our MySQL database is UTF8. What do I need to check?

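  • The server default character set
  • The default character set of each database
  • Does each text column have a character set of its own? How do I check that?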

I am thinking of putting all of this into a nagios plugin that checks that everything is UTF8. Any suggestions?

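Answer 1

Rory, first of all, you are right to want to monitor what gets created in your database. We all take steps to prevent mistakes, but you cannot assume that none will creep in. I do something very similar, because most of our infrastructure requires UTF8.

The following queries are useful for gathering statistics: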

SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME, COUNT(0) AS COUNT FROM information_schema.SCHEMATA GROUP BY DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME;
# to filter schema use SCHEMA_NAME in the where clause
SELECT TABLE_COLLATION, COUNT(0) AS COUNT FROM information_schema.TABLES WHERE TABLE_COLLATION IS NOT NULL GROUP BY TABLE_COLLATION;
# to filter schema use TABLE_SCHEMA in the where clause
SELECT CHARACTER_SET_NAME, COLLATION_NAME, COUNT(0) AS COUNT FROM information_schema.COLUMNS WHERE CHARACTER_SET_NAME IS NOT NULL AND COLLATION_NAME IS NOT NULL GROUP BY CHARACTER_SET_NAME, COLLATION_NAME;
# to filter schema use TABLE_SCHEMA in the where clause
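The question also lists the server default character set, which the information_schema queries here do not cover. As a rough sketch only, assuming the mysql client can log in non-interactively (for example through ~/.my.cnf), the tab-separated output of SHOW VARIABLES could be filtered as below. The helper name non_utf8_server_vars is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads "name<TAB>value" lines (the output of
# mysql -N -B -e "SHOW VARIABLES") on stdin and prints the server and
# current-database charset/collation variables that do not mention utf8.
non_utf8_server_vars() {
    awk -F'\t' '
        $1 ~ /^(character_set_(server|database)|collation_(server|database))$/ &&
        $2 !~ /utf8/ { print $1 "=" $2 }
    '
}

# Example wiring (assumes working client credentials):
# mysql -N -B -e "SHOW VARIABLES" | non_utf8_server_vars
```

Variables such as character_set_filesystem are deliberately left out, since they are normally not utf8 even on a correctly configured server.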

The following queries are good candidates to run from cron, so that you get an email whenever they return any rows:

SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM information_schema.SCHEMATA WHERE DEFAULT_CHARACTER_SET_NAME NOT LIKE '%utf8%' OR DEFAULT_COLLATION_NAME NOT LIKE '%utf8%';
# to filter schema use SCHEMA_NAME in the where clause
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_COLLATION IS NOT NULL AND TABLE_COLLATION NOT LIKE '%utf8%';
# to filter schema use TABLE_SCHEMA in the where clause
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_SET_NAME, COLLATION_NAME FROM information_schema.COLUMNS WHERE CHARACTER_SET_NAME NOT LIKE '%utf8%' OR COLLATION_NAME NOT LIKE '%utf8%';
# to filter schema use TABLE_SCHEMA in the where clause

These queries work on MySQL >= 5.0. Let me know if you need help writing the cron job.
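Since the question mentions a nagios plugin, here is a minimal sketch of how the rows returned by the "NOT LIKE '%utf8%'" queries could be turned into a Nagios-style result. It relies on the usual plugin exit codes (0 for OK, 2 for CRITICAL). The function name report_non_utf8 and the wiring in the comments are assumptions for illustration, not part of any existing plugin.

```shell
#!/bin/sh
# Hypothetical helper: reads offending rows (one per line) on stdin and
# maps them to Nagios status text and return codes.
report_non_utf8() {
    offenders=$(cat)
    if [ -z "$offenders" ]; then
        echo "OK - no non-utf8 schemas, tables or columns found"
        return 0    # Nagios OK
    fi
    count=$(printf '%s\n' "$offenders" | wc -l | tr -d ' ')
    echo "CRITICAL - $count non-utf8 object(s) found"
    printf '%s\n' "$offenders"    # details follow the status line
    return 2        # Nagios CRITICAL
}

# Example wiring (assumes the mysql client can log in, e.g. via ~/.my.cnf):
# mysql -N -B -e "SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_COLLATION
#   FROM information_schema.TABLES
#   WHERE TABLE_COLLATION IS NOT NULL
#     AND TABLE_COLLATION NOT LIKE '%utf8%'" | report_non_utf8
# exit $?
```

The same function could be fed from cron instead; cron mails whatever the job prints, so printing only in the CRITICAL case would give the "email on any result" behaviour.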

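I also have a problem with my developers (mainly the Ruby on Rails team) creating InnoDB tables where they are not needed. To keep tabs on this, I monitor the staging database with a cron job that runs: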

SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE FROM information_schema.TABLES WHERE ENGINE = 'InnoDB';

Answer 2

I would do something like mysqldump --all-databases --no-data | grep -i character

Answer 3

Jure1873 suggested grep, and I commented that awk would be a better fit. Here is what I put together.

mysqldump --all-databases --no-data | awk '{gsub("[`;]","");};/^USE /{db=$2;};/CREATE TABLE /{table=$3;};/^\) ENGINE/{sub(".*CHARSET=","");if(!match($1,"utf8")){print db ":" table ":" $1;}}'

Here it is again, laid out for readability:

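mysqldump --all-databases --no-data | \
awk '
    # Strip backticks and semicolons so the names come out clean.
    {
        gsub("[`;]","");
    }
    # Remember the current database.
    /^USE /{
        db=$2;
    }
    # Remember the current table.
    /CREATE TABLE /{
        table=$3;
    }
    # On the table options line, report any charset that is not utf8.
    /^\) ENGINE/{
        sub(".*CHARSET=","");
        if(!match($1,"utf8")){
            print db ":" table ":" $1;
        }
    }
'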
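To illustrate what the awk filter above reports, here is a small sketch that wraps it in a function and runs it on a made-up sample resembling mysqldump --no-data output. The function name list_non_utf8_tables and the shop, orders and users names are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk filter: reads a schema dump on stdin
# and prints "database:table:charset" for every non-utf8 table.
list_non_utf8_tables() {
    awk '
        { gsub("[`;]", "") }                 # drop backticks and semicolons
        /^USE /         { db = $2 }          # current database
        /CREATE TABLE / { table = $3 }       # current table
        /^\) ENGINE/ {                       # table options line
            sub(".*CHARSET=", "")
            if (!match($1, "utf8")) print db ":" table ":" $1
        }
    '
}

# Made-up sample input; only the latin1 table should be reported.
list_non_utf8_tables <<'EOF'
USE `shop`;
CREATE TABLE `orders` (
  `id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `users` (
  `id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
EOF
# prints: shop:orders:latin1
```

Against a live server the function would be fed from mysqldump --all-databases --no-data instead of the here-document.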

Answer 4

dbs=$(echo 'show databases;' | mysql | grep -v '^Database$' | tr "\n" " ")

for d in ${dbs}; do
 tables=$(echo 'show tables' | mysql $d | grep -v '^Tables' | tr "\n" " ")
 for t in ${tables}; do
  echo "**************** DB ${d} TABLE ${t}"
  mysqldump --no-data ${d} ${t} | grep  COLLATE | grep -v utf && echo "DB ${d} TABLE ${t} !!!!"
  mysqldump --no-data ${d} ${t} | grep  CHARSET | grep -v utf && echo "DB ${d} TABLE ${t} !!!!"
 done
done
