Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default collation changes in case sensitivity between TiDB and MySQL #29350

Open
dveeden opened this issue Nov 2, 2021 · 6 comments
Open

Default collation changes in case sensitivity between TiDB and MySQL #29350

dveeden opened this issue Nov 2, 2021 · 6 comments
Labels
sig/sql-infra SIG: SQL Infra type/compatibility type/enhancement The issue or PR belongs to an enhancement.

Comments

@dveeden
Copy link
Contributor

dveeden commented Nov 2, 2021

Bug Report

1. Minimal reproduce step (Required)

TiDB:

TiDB> SELECT VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME='new_collation_enabled';
+----------------+
| VARIABLE_VALUE |
+----------------+
| True           |
+----------------+
1 row in set (0.0014 sec)

TiDB> SHOW COLLATION WHERE Charset='utf8mb4' AND `Default`='Yes';
+-------------+---------+----+---------+----------+---------+
| Collation   | Charset | Id | Default | Compiled | Sortlen |
+-------------+---------+----+---------+----------+---------+
| utf8mb4_bin | utf8mb4 | 46 | Yes     | Yes      |       1 |
+-------------+---------+----+---------+----------+---------+
1 row in set (0.0004 sec)

TiDB> SHOW VARIABLES LIKE 'collation\_%';SHOW VARIABLES LIKE 'character\_set\_%';
+----------------------+-------------+
| Variable_name        | Value       |
+----------------------+-------------+
| collation_connection | utf8mb4_bin |
| collation_database   | utf8mb4_bin |
| collation_server     | utf8mb4_bin |
+----------------------+-------------+
3 rows in set (0.0058 sec)
+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| character_set_client     | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database   | utf8mb4 |
| character_set_filesystem | binary  |
| character_set_results    | utf8mb4 |
| character_set_server     | utf8mb4 |
| character_set_system     | utf8    |
+--------------------------+---------+
7 rows in set (0.0029 sec)

TiDB> SELECT 'A' = 'a';
+-----------+
| 'A' = 'a' |
+-----------+
|         0 |
+-----------+
1 row in set (0.0006 sec)

TiDB> SELECT tidb_version()\G
*************************** 1. row ***************************
tidb_version(): Release Version: v5.2.2
Edition: Community
Git Commit Hash: da1c21fd45a4ea5900ac16d2f4a248143f378d18
Git Branch: heads/refs/tags/v5.2.2
UTC Build Time: 2021-10-20 06:08:33
GoVersion: go1.16.4
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
1 row in set (0.0005 sec)

TiDB> 

MySQL 8.0

mysql> SHOW COLLATION WHERE Charset='utf8mb4' AND `Default`='Yes';
+--------------------+---------+-----+---------+----------+---------+---------------+
| Collation          | Charset | Id  | Default | Compiled | Sortlen | Pad_attribute |
+--------------------+---------+-----+---------+----------+---------+---------------+
| utf8mb4_0900_ai_ci | utf8mb4 | 255 | Yes     | Yes      |       0 | NO PAD        |
+--------------------+---------+-----+---------+----------+---------+---------------+
1 row in set (0.0015 sec)

mysql> SHOW VARIABLES LIKE 'collation\_%';SHOW VARIABLES LIKE 'character\_set\_%';
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_0900_ai_ci |
| collation_database   | utf8mb4_0900_ai_ci |
| collation_server     | utf8mb4_0900_ai_ci |
+----------------------+--------------------+
3 rows in set (0.0023 sec)
+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| character_set_client     | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database   | utf8mb4 |
| character_set_filesystem | binary  |
| character_set_results    | utf8mb4 |
| character_set_server     | utf8mb4 |
| character_set_system     | utf8mb3 |
+--------------------------+---------+
7 rows in set (0.0016 sec)

mysql> SELECT 'A' = 'a';
+-----------+
| 'A' = 'a' |
+-----------+
|         1 |
+-----------+
1 row in set (0.0004 sec)

mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.27    |
+-----------+
1 row in set (0.0003 sec)

MySQL 5.7:

mysql> SHOW COLLATION WHERE Charset='utf8mb4' AND `Default`='Yes';
+--------------------+---------+----+---------+----------+---------+
| Collation          | Charset | Id | Default | Compiled | Sortlen |
+--------------------+---------+----+---------+----------+---------+
| utf8mb4_general_ci | utf8mb4 | 45 | Yes     | Yes      |       1 |
+--------------------+---------+----+---------+----------+---------+
1 row in set (0.0007 sec)

mysql> SHOW VARIABLES LIKE 'collation\_%';SHOW VARIABLES LIKE 'character\_set\_%';
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database   | latin1_swedish_ci  |
| collation_server     | latin1_swedish_ci  |
+----------------------+--------------------+
3 rows in set (0.0029 sec)
+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| character_set_client     | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database   | latin1  |
| character_set_filesystem | binary  |
| character_set_results    | utf8mb4 |
| character_set_server     | latin1  |
| character_set_system     | utf8    |
+--------------------------+---------+
7 rows in set (0.0013 sec)

mysql> SELECT 'A' = 'a';
+-----------+
| 'A' = 'a' |
+-----------+
|         1 |
+-----------+
1 row in set (0.0004 sec)

mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 5.7.31    |
+-----------+
1 row in set (0.0004 sec)

2. What did you expect to see? (Required)

Similar behavior between TiDB and MySQL.

3. What did you see instead (Required)

The default collation behaves case sensitive (binary) for TiDB.

Workaround

Using SET NAMES and/or charset/collation settings in the connection properties when connecting.

TiDB> SET NAMES utf8mb4 COLLATE utf8mb4_general_ci;
Query OK, 0 rows affected (0.0004 sec)

TiDB> SELECT 'A' = 'a';
+-----------+
| 'A' = 'a' |
+-----------+
|         1 |
+-----------+
1 row in set (0.0003 sec)
@dveeden dveeden added the type/bug The issue is confirmed as a bug. label Nov 2, 2021
@dveeden
Copy link
Contributor Author

dveeden commented Nov 2, 2021

/type compatibility

@dveeden
Copy link
Contributor Author

dveeden commented Nov 2, 2021

Different client versions behave differently. This might be because some of them might issue a SET NAMES utf8mb4.

$ for m in ./opt/mysql/*/bin/mysql; do $m --version; $m -u root -h 127.0.0.1 -P 4000 -BNe "SELECT @@session.collation_connection"; done
./opt/mysql/5.1.73/bin/mysql  Ver 14.14 Distrib 5.1.73, for unknown-linux-gnu (x86_64) using readline 5.1
utf8mb4_bin
./opt/mysql/5.7.31/bin/mysql  Ver 14.14 Distrib 5.7.31, for linux-glibc2.12 (x86_64) using  EditLine wrapper
utf8_general_ci
./opt/mysql/8.0.22/bin/mysql  Ver 8.0.22 for Linux on x86_64 (MySQL Community Server - GPL)
utf8mb4_bin
./opt/mysql/8.0.26/bin/mysql  Ver 8.0.26 for Linux on x86_64 (MySQL Community Server - GPL)
utf8mb4_bin
./opt/mysql/8.0.27/bin/mysql  Ver 8.0.27 for Linux on x86_64 (MySQL Community Server - GPL)
utf8mb4_bin

@wjhuang2016
Copy link
Member

@dveeden
Copy link
Contributor Author

dveeden commented Nov 3, 2021

It indeed is documented, but as this is causing problems for applications it would be good if we can change this to be more compatible.

@wjhuang2016
Copy link
Member

But it will break backward compatibility, I think you can talk to our PM. It's not a technical problem, it's a product problem.

@bb7133
Copy link
Member

bb7133 commented Nov 8, 2021

IMHO in order to archive 'default case-insensitive collation', we need to set new collation enabled by default firstly. @wjhuang2016 Are you interested in this work?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/sql-infra SIG: SQL Infra type/compatibility type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

No branches or pull requests

6 participants