Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compatible with mysql support change default collation for utf8mb4 #45455

Open
King-Dylan opened this issue Jul 19, 2023 · 3 comments
Open

Compatible with mysql support change default collation for utf8mb4 #45455

King-Dylan opened this issue Jul 19, 2023 · 3 comments
Labels
compatibility-mysql80 This is a compatibility issue with MySQL 8.0(but NOT 5.7) sig/sql-infra SIG: SQL Infra type/compatibility type/feature-request Categorizes issue or PR as related to a new feature.

Comments

@King-Dylan
Copy link
Contributor

King-Dylan commented Jul 19, 2023

Feature Request

Is your feature request related to a problem? Please describe:

Mysql 8.0 support change default collation for utf8mb4.If we create table with DEFAULT CHARSET=utf8mb4,but did not specify collation, so the value of collation is inherited from default_collation_for_utf8mb4.But tidb doesn't support it yet.

Describe the feature you'd like:

TiDB should support default_collation_for_utf8mb4 variable let user can change the default collation for utf8mb4.

mysql> SELECT @@GLOBAL.collation_database,@@SESSION.collation_database;
+-----------------------------+------------------------------+
| @@GLOBAL.collation_database | @@SESSION.collation_database |
+-----------------------------+------------------------------+
| utf8mb4_general_ci          | utf8mb4_general_ci           |
+-----------------------------+------------------------------+
1 row in set (0.00 sec)

mysql> create database sbtest1 /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci */;
Query OK, 1 row affected (0.09 sec)

mysql> use sbtest1;
Database changed
mysql> create table t3 (id int)  DEFAULT CHARSET=utf8mb4 ;
Query OK, 0 rows affected (0.02 sec)

mysql> show create table t3;
+-------+--------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                       |
+-------+--------------------------------------------------------------------------------------------------------------------+
| t3    | CREATE TABLE `t3` (
  `id` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+--------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> set session default_collation_for_utf8mb4='utf8mb4_general_ci';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show variables like '%collation%';
+-------------------------------+--------------------+
| Variable_name                 | Value              |
+-------------------------------+--------------------+
| collation_connection          | utf8_general_ci    |
| collation_database            | utf8mb4_0900_ai_ci |
| collation_server              | utf8mb4_general_ci |
| default_collation_for_utf8mb4 | utf8mb4_general_ci |
+-------------------------------+--------------------+
4 rows in set (0.00 sec)

mysql> create table t4 (id int)  DEFAULT CHARSET=utf8mb4 ;
Query OK, 0 rows affected (0.03 sec)

mysql> show create table t4;
+-------+--------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                       |
+-------+--------------------------------------------------------------------------------------------------------------------+
| t4    | CREATE TABLE `t4` (
  `id` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci |
+-------+--------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

@King-Dylan King-Dylan added the type/feature-request Categorizes issue or PR as related to a new feature. label Jul 19, 2023
@King-Dylan King-Dylan changed the title support change default collation for utf8mb4 Compatible with mysql support change default collation for utf8mb4 Jul 19, 2023
@King-Dylan
Copy link
Contributor Author

Mysql will discard this variable in the future, so this is not a good feature request,I think we should have the table collation inherit from the default collation of the database.

mysql> set session default_collation_for_utf8mb4='utf8mb4_general_ci';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+--------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                |
+---------+------+--------------------------------------------------------------------------------------------------------+
| Warning | 1681 | Updating 'default_collation_for_utf8mb4' is deprecated. It will be made read-only in a future release. |
+---------+------+--------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

@YangKeao YangKeao added the sig/sql-infra SIG: SQL Infra label Jul 20, 2023
@YangKeao
Copy link
Member

I think we should have the table collation inherit from the default collation of the database.

Isn't it just the current behavior?

create database test1 COLLATE utf8mb4_general_ci;
use test1;
create table t2 (id int);
show create table t2;

You'll see the collation of the table is utf8mb4_general_ci

@dveeden
Copy link
Contributor

dveeden commented Aug 2, 2023

Looks like this already works:

sql> CREATE SCHEMA s1 CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
Query OK, 0 rows affected (0.0612 sec)

sql> CREATE SCHEMA s2 CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
Query OK, 0 rows affected (0.0322 sec)

sql> CREATE SCHEMA s3 COLLATE utf8mb4_unicode_ci;
Query OK, 0 rows affected (0.0338 sec)

sql> CREATE SCHEMA s4 CHARACTER SET ascii;
Query OK, 0 rows affected (0.0307 sec)

sql> CREATE TABLE s1.t1 (id INT PRIMARY KEY);
Query OK, 0 rows affected (0.0209 sec)

sql> CREATE TABLE s2.t1 (id INT PRIMARY KEY);
Query OK, 0 rows affected (0.0345 sec)

sql> CREATE TABLE s3.t1 (id INT PRIMARY KEY);
Query OK, 0 rows affected (0.0374 sec)

sql> CREATE TABLE s4.t1 (id INT PRIMARY KEY);
Query OK, 0 rows affected (0.0202 sec)

sql> SHOW CREATE TABLE s1.t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.0009 sec)

sql> SHOW CREATE TABLE s2.t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci
1 row in set (0.0009 sec)

sql> SHOW CREATE TABLE s3.t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.0010 sec)

sql> SHOW CREATE TABLE s4.t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=ascii COLLATE=ascii_bin
1 row in set (0.0009 sec)

sql> SELECT TIDB_VERSION()\G
*************************** 1. row ***************************
TIDB_VERSION(): Release Version: v7.3.0-alpha-309-g8b8ba8b648
Edition: Community
Git Commit Hash: 8b8ba8b648f24fabbd4c0e2fcf9d6e5bdde97a4b
Git Branch: navicat
UTC Build Time: 2023-08-02 06:44:38
GoVersion: go1.20.6
Race Enabled: false
Check Table Before Drop: false
Store: unistore
1 row in set (0.0003 sec)

So:

  • New schema's get
    • a default of utf8mb4_bin in TiDB
    • a default of utf8mb4_0900_ci_ai in MySQL 8.0 and 8.1 (can be changed by default_collation_for_utf8mb4, which is deprecated)
  • New tables follow the schema charset and collation
  • New columns follow the table charset and collation

Once #37566 is done we could consider changing the default collation of TiDB to match MySQL or make it configurable by supporting default_collation_for_utf8mb4.

This is related to #29350

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compatibility-mysql80 This is a compatibility issue with MySQL 8.0(but NOT 5.7) sig/sql-infra SIG: SQL Infra type/compatibility type/feature-request Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

No branches or pull requests

3 participants