MySQL 判断中文字符

2015-01-16 Robin Wen 更多博文 » 博客 » GitHub »

数据库 Database MySQL

原文链接 http://dbarobin.com/2015/01/16/distinguish-chinese-characters-in-mysql/
注:以下为加速网络访问所做的原文缓存,经过重新格式化,可能存在格式方面的问题,或偶有遗漏信息,请以原文为准。


目录

  • Table of Contents {:toc}

文/Robin


本站推广

币安是全球领先的数字货币交易平台,提供比特币、以太坊、BNB 以及 USDT 交易。

币安注册: https://accounts.binancezh.pro/cn/register/?ref=11190872 邀请码: 11190872


一 引子

在生产环境中,经常会有这样的场景:获得中文数据。那问题就来了,怎么才能匹配出中文字符呢?

本文提供两种方法。

二 演示

2.1 环境

mysql> SHOW VARIABLES LIKE "%version%";
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| protocol_version        | 10                           |
| version                 | 5.1.73                       |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | i386                         |
| version_compile_os      | apple-darwin10.3.0           |
+-------------------------+------------------------------+
5 rows in set (0.00 sec)

2.2 创建测试表和插入测试数据

mysql -S /tmp/mysql_5173.sock -uroot -proot

创建测试表和插入测试数据。

mysql> USE test;
Database changed

mysql> CREATE TABLE user
    -> (name VARCHAR(20)
    ->  ) DEFAULT CHARSET = utf8 ENGINE = INNODB;
Query OK, 0 rows affected (0.10 sec)

mysql> SHOW TABLE STATUS LIKE 'user' \G;
*************************** 1. row ***************************
           Name: user
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 2
 Avg_row_length: 8192
    Data_length: 16384
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2015-01-16 18:01:36
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.00 sec)

ERROR: 
No query specified

mysql> INSERT INTO user VALUES('robin');
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO user VALUES('robin');
Query OK, 1 row affected (0.00 sec)

三 实现

3.1 方法一 正则表达式

mysql> SELECT * FROM user \G;
*************************** 1. row ***************************
name: robin
*************************** 2. row ***************************
name: robin
2 rows in set (0.00 sec)

mysql> SELECT name,
    ->     CASE name REGEXP "[\u0391-\uFFE5]"
    ->         WHEN 1 THEN "不是中文字符"
    ->         ELSE "是中文字符"
    ->     END AS "判断是否是中文字符"
    -> FROM user;
+-----------+-----------------------------+
| name      | 判断是否是中文字符 |
+-----------+-----------------------------+
| robin     | 不是中文字符          |
| robin | 是中文字符             |
+-----------+-----------------------------+
2 rows in set (0.00 sec)

mysql> SELECT name FROM user WHERE NOT (name REGEXP "[\u0391-\uFFE5]");
+-----------+
| name      |
+-----------+
| robin |
+-----------+
1 row in set (0.00 sec)

3.2 方法二 length() 和 char_length()

mysql> SELECT name, length(name), char_length(name) FROM user;
+-----------+--------------+-------------------+
| name      | length(name) | char_length(name) |
+-----------+--------------+-------------------+
| robin     |            5 |                 5 |
| robin |           20 |                 9 |
+-----------+--------------+-------------------+
2 rows in set (0.00 sec)

mysql> SELECT name FROM user WHERE length(name) <> char_length(name);
+-----------+
| name      |
+-----------+
| robin |
+-----------+
1 row in set (0.00 sec)

四 总结

方法一中,[\u0391-\uFFE5] 匹配中文以外的字符。

方法二中,当字符集为UTF-8,并且字符为中文时,length()char_length() 两个方法返回的结果不相同。


本站推广

币安是全球领先的数字货币交易平台,提供比特币、以太坊、BNB 以及 USDT 交易。

币安注册: https://accounts.binancezh.pro/cn/register/?ref=11190872 邀请码: 11190872


参考官方文档:

LENGTH() Return the length of a string in bytes Returns the length of the string str, measured in bytes. A multibyte character counts as multiple bytes. This means that for a string containing five 2-byte characters, LENGTH() returns 10, whereas CHAR_LENGTH() returns 5.

CHAR_LENGTH() Return number of characters in argument Returns the length of the string str, measured in characters. A multibyte character counts as a single character. This means that for a string containing five 2-byte characters, LENGTH() returns 10, whereas CHAR_LENGTH() returns 5.

五 Ref

  • 12.5 String Functions

–EOF–

版权声明:自由转载-非商用-非衍生-保持署名(创意共享4.0许可证)