Two-table comparison of char(20) between binary(20) and binary(20) is different from MySQL
#34823
I'm elevating the severity to major, as this looks like a typical wrong-result bug.
We have a bug in type deduction for … . To match MySQL's behavior, we should not add … .
For the single-table query above, the output is as desired. However, the execution plan shows that the TiKV coprocessor is the one that actually executes the … :
+---------------------------+---------+-----------+---------------+------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+---------------------------+---------+-----------+---------------+------------------------------------------------------------------------------------+
| Projection_4 | 2.40 | root | | test.t.a |
| └─TableReader_7 | 2.40 | root | | data:Selection_6 |
| └─Selection_6 | 2.40 | cop[tikv] | | ge(cast(test.t.a, binary(20)), test.t.b), le(cast(test.t.a, binary(20)), test.t.c) |
| └─TableFullScan_5 | 3.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+---------------------------+---------+-----------+---------------+------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)
After forbidding pushdown, the plan becomes:
+---------------------------+---------+-----------+---------------+------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+---------------------------+---------+-----------+---------------+------------------------------------------------------------------------------------+
| Projection_4 | 2.40 | root | | test.t.a |
| └─Selection_7 | 2.40 | root | | ge(cast(test.t.a, binary(20)), test.t.b), le(cast(test.t.a, binary(20)), test.t.c) |
| └─TableReader_6 | 3.00 | root | | data:TableFullScan_5 |
| └─TableFullScan_5 | 3.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+---------------------------+---------+-----------+---------------+------------------------------------------------------------------------------------+
4 rows in set, 2 warnings (0.00 sec)
And now the result is undesirable again.
mysql> select a from t where a between b and c;
+------+
| a |
+------+
| -1 |
| -1 |
+------+
2 rows in set (0.01 sec)
Therefore, operator execution on the TiKV side and on the TiDB side is inconsistent.
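Both plans rewrite `a between b and c` into `ge(cast(test.t.a, binary(20)), test.t.b)` and `le(cast(test.t.a, binary(20)), test.t.c)`. As a rough illustration of what that rewritten predicate computes, here is a minimal Go sketch, assuming that `cast(..., binary(20))` right-pads with `0x00` bytes to 20 bytes and that both sides are then compared byte by byte. The helpers `castBinaryN` and `between` are hypothetical and are not TiDB or TiKV code; the row values (`-1`, `0x2D31…`, `0x67…`, `0x73…`) are taken from the experiment output later in this thread.

```go
package main

import (
	"bytes"
	"fmt"
)

// castBinaryN is a hypothetical model of CAST(x AS BINARY(n)):
// the result is exactly n bytes, right-padded with 0x00 (or truncated).
func castBinaryN(s []byte, n int) []byte {
	out := make([]byte, n) // zero-filled
	copy(out, s)           // copies at most n bytes of s
	return out
}

// between evaluates ge(cast(a, binary(20)), b) AND le(cast(a, binary(20)), c)
// using a plain byte-wise comparison (an assumption of this sketch).
func between(a, b, c []byte) bool {
	x := castBinaryN(a, 20)
	return bytes.Compare(x, b) >= 0 && bytes.Compare(x, c) <= 0
}

func main() {
	a := []byte("-1")                  // char(20) value of column a
	b := castBinaryN([]byte("-1"), 20) // 0x2D31 followed by 18 zero bytes
	c1 := castBinaryN([]byte("g"), 20) // 0x67 followed by 19 zero bytes
	c2 := castBinaryN([]byte("s"), 20) // 0x73 followed by 19 zero bytes

	fmt.Println(between(a, b, c1), between(a, b, c2)) // true true

	// Without the padding cast, "-1" is a strict prefix of b and sorts before it.
	fmt.Println(bytes.Compare(a, b) >= 0) // false
}
```

The point of the sketch is that whether the padding cast is applied decides the outcome of `ge`, so two evaluators that disagree on the deduced type or collation of `a` can return different rows for the same predicate.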
On the other hand, by opening a client using …, the collation of the result column can be inspected.
In MySQL:
mysql> select a from t where (a collate utf8mb4_bin) between b and c;
Field 1: `a`
Catalog: `def`
Database: `test`
Table: `t`
Org_table: `t`
Type: STRING
Collation: utf8mb4_general_ci (45)
Length: 80
Max_length: 2
Decimals: 0
Flags: BINARY
+------+
| a |
+------+
| -1 |
| -1 |
+------+
2 rows in set (0.00 sec)
In TiDB (pushdown forbidden):
mysql> select a from t where (a collate utf8mb4_general_ci) between b and c;
Field 1: `a`
Catalog: `def`
Database: `test`
Table: `t`
Org_table: `t`
Type: STRING
Collation: utf8mb4_bin (46)
Length: 80
Max_length: 0
Decimals: 0
Flags:
0 rows in set (0.04 sec)
Besides, if the user sets the default collation to be …
Thanks @Willendless for the deep investigation. Could you summarize your findings in 2 or 3 sentences?
Results are different when …
Why:
In a word, we deduce the wrong collation info for … .
Sure, basically I have three findings.
// tidb
mysql> SELECT @@collation_server, @@collation_database;
+--------------------+----------------------+
| @@collation_server | @@collation_database |
+--------------------+----------------------+
| utf8mb4_bin | utf8mb4_bin |
+--------------------+----------------------+
1 row in set (0.00 sec)
// mysql
mysql> SELECT @@collation_server, @@collation_database;
+--------------------+----------------------+
| @@collation_server | @@collation_database |
+--------------------+----------------------+
| utf8mb4_0900_ai_ci | utf8mb4_0900_ai_ci |
+--------------------+----------------------+
1 row in set (0.01 sec)
I have some concerns about your bullet 2. I think the default collation is configurable in the TiDB server. Could you investigate how to configure the server's default collation and redo your experiments?
Summary
Experiments
I did the following experiments in MySQL to support my arguments above. I also use … .
For point 1:
mysql> select concat(a, b) from t;
+------------------------------------------------+
| concat(a, b) |
+------------------------------------------------+
| 0x2D312D31000000000000000000000000000000000000 |
| 0x2D312D31000000000000000000000000000000000000 |
+------------------------------------------------+
2 rows in set (0.01 sec)
mysql> select concat(cast(a as binary(20)), b) from t;
+------------------------------------------------------------------------------------+
| concat(cast(a as binary(20)), b) |
+------------------------------------------------------------------------------------+
| 0x2D310000000000000000000000000000000000002D31000000000000000000000000000000000000 |
| 0x2D310000000000000000000000000000000000002D31000000000000000000000000000000000000 |
+------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
mysql> select concat(cast(a as binary), b) from t;
+------------------------------------------------------------+
| concat(cast(a as binary), b) |
+------------------------------------------------------------+
| 0x2D312D31000000000000000000000000000000000000 |
| 0x2D312D31000000000000000000000000000000000000 |
+------------------------------------------------------------+
2 rows in set (0.00 sec)
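The three `concat` outputs above show that `cast(a as binary(20))` pads `-1` with `0x00` bytes up to 20 bytes, while `cast(a as binary)` leaves it as 2 bytes. The Go sketch below models both casts and shows why only the padded value equals `b` byte for byte, which is what the following `=` queries check. The helpers `castBinaryN`, `castBinary`, and `concat` are hypothetical models, not TiDB code.

```go
package main

import (
	"bytes"
	"fmt"
)

// castBinaryN is a hypothetical model of CAST(x AS BINARY(n)):
// exactly n bytes, right-padded with 0x00 (or truncated).
func castBinaryN(s []byte, n int) []byte {
	out := make([]byte, n)
	copy(out, s)
	return out
}

// castBinary is a hypothetical model of CAST(x AS BINARY) without a length:
// the bytes are returned unchanged (as a copy).
func castBinary(s []byte) []byte {
	return append([]byte(nil), s...)
}

// concat joins byte strings end to end.
func concat(parts ...[]byte) []byte {
	var out []byte
	for _, p := range parts {
		out = append(out, p...)
	}
	return out
}

func main() {
	a := []byte("-1")                  // column a, char(20)
	b := castBinaryN([]byte("-1"), 20) // column b, binary(20)

	fmt.Println(len(concat(a, b)))                  // 22
	fmt.Println(len(concat(castBinaryN(a, 20), b))) // 40
	fmt.Println(len(concat(castBinary(a), b)))      // 22
	fmt.Printf("%X\n", concat(castBinaryN(a, 20), b))

	// Mirrors the three '=' queries: only the padded cast matches b.
	fmt.Println(bytes.Equal(a, b), bytes.Equal(castBinaryN(a, 20), b), bytes.Equal(castBinary(a), b)) // false true false
}
```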
mysql> select * from t where a = b;
0 rows in set (0.00 sec)
mysql> select * from t where cast(a as binary(20)) = b;
+------+--------------------------------------------+--------------------------------------------+
| a | b | c |
+------+--------------------------------------------+--------------------------------------------+
| -1 | 0x2D31000000000000000000000000000000000000 | 0x6700000000000000000000000000000000000000 |
| -1 | 0x2D31000000000000000000000000000000000000 | 0x7300000000000000000000000000000000000000 |
+------+--------------------------------------------+--------------------------------------------+
2 rows in set (0.00 sec)
mysql> select * from t where cast(a as binary) = b;
0 rows in set (0.00 sec)
Note that I use the '=' operator in the last three queries; I will explain why in the final section.
For point 2:
mysql> select concat((a collate utf8mb4_bin), b), length(concat((a collate utf8mb4_bin), b)) from t;
+------------------------------------+--------------------------------------------+
| concat((a collate utf8mb4_bin), b) | length(concat((a collate utf8mb4_bin), b)) |
+------------------------------------+--------------------------------------------+
| -1-1 | 22 |
| -1-1 | 22 |
+------------------------------------+--------------------------------------------+
2 rows in set (0.00 sec)
mysql> select concat((a collate utf8mb4_general_ci), b), length(concat((a collate utf8mb4_general_ci), b)) from t;
+-------------------------------------------+---------------------------------------------------+
| concat((a collate utf8mb4_general_ci), b) | length(concat((a collate utf8mb4_general_ci), b)) |
+-------------------------------------------+---------------------------------------------------+
| -1-1 | 22 |
| -1-1 | 22 |
+-------------------------------------------+---------------------------------------------------+
2 rows in set (0.00 sec)
mysql> select concat(a, cast(b as char)), length(concat(a, cast(b as char))) from t;
+----------------------------+------------------------------------+
| concat(a, cast(b as char)) | length(concat(a, cast(b as char))) |
+----------------------------+------------------------------------+
| -1-1 | 22 |
| -1-1 | 22 |
+----------------------------+------------------------------------+
2 rows in set (0.01 sec)
mysql> select concat(a, cast(b as char(20))), length(concat(a, cast(b as char(20)))) from t;
+--------------------------------+----------------------------------------+
| concat(a, cast(b as char(20))) | length(concat(a, cast(b as char(20)))) |
+--------------------------------+----------------------------------------+
| -1-1 | 22 |
| -1-1 | 22 |
+--------------------------------+----------------------------------------+
2 rows in set (0.00 sec)
mysql> select * from t where (a collate utf8mb4_bin) >= b;
+------+--------------------------------------------+--------------------------------------------+
| a | b | c |
+------+--------------------------------------------+--------------------------------------------+
| -1 | 0x2D31000000000000000000000000000000000000 | 0x6700000000000000000000000000000000000000 |
| -1 | 0x2D31000000000000000000000000000000000000 | 0x7300000000000000000000000000000000000000 |
+------+--------------------------------------------+--------------------------------------------+
2 rows in set (0.00 sec)
mysql> select * from t where a >= cast(b as char);
+------+--------------------------------------------+--------------------------------------------+
| a | b | c |
+------+--------------------------------------------+--------------------------------------------+
| -1 | 0x2D31000000000000000000000000000000000000 | 0x6700000000000000000000000000000000000000 |
| -1 | 0x2D31000000000000000000000000000000000000 | 0x7300000000000000000000000000000000000000 |
+------+--------------------------------------------+--------------------------------------------+
2 rows in set (0.01 sec)
mysql> select * from t where a >= cast(b as char(20));
+------+--------------------------------------------+--------------------------------------------+
| a | b | c |
+------+--------------------------------------------+--------------------------------------------+
| -1 | 0x2D31000000000000000000000000000000000000 | 0x6700000000000000000000000000000000000000 |
| -1 | 0x2D31000000000000000000000000000000000000 | 0x7300000000000000000000000000000000000000 |
+------+--------------------------------------------+--------------------------------------------+
2 rows in set (0.00 sec)
Explanation
Several questions can be answered based on my theory above.
Because by casting the string in column a to …
According to the MySQL documentation, section 10.8.4 "Collation Coercibility in Expressions", by using …:
mysql> select * from t where concat(substring(a, 1, 2), '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0') = cast(b as char(20));
+------+--------------------------------------------+--------------------------------------------+
| a | b | c |
+------+--------------------------------------------+--------------------------------------------+
| -1 | 0x2D31000000000000000000000000000000000000 | 0x6700000000000000000000000000000000000000 |
| -1 | 0x2D31000000000000000000000000000000000000 | 0x7300000000000000000000000000000000000000 |
+------+--------------------------------------------+--------------------------------------------+
2 rows in set (0.01 sec)
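The coercibility rules from section 10.8.4 can be sketched as a small resolution function. This is a simplified, hypothetical Go model (`operand` and `deriveCollation` are illustrative names, not TiDB's API): the lower coercibility wins; at equal coercibility, mixing a binary and a nonbinary string resolves to `binary`, and mixing a `_bin` and a `_ci`/`_cs` collation of the same character set resolves to the `_bin` one. It deliberately omits other rules, such as the Unicode vs. non-Unicode tie-break. The collation of column `a` is assumed to be `utf8mb4_general_ci` here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Coercibility values listed in the MySQL manual, section 10.8.4.
// A lower value takes precedence.
const (
	Explicit  = 0 // COLLATE clause
	NoColl    = 1 // concatenation of strings with different collations
	Implicit  = 2 // column value, stored routine parameter, local variable
	SysConst  = 3 // system constant, e.g. USER()
	Coercible = 4 // literal string
	Numeric   = 5 // numeric or temporal value
	Ignorable = 6 // NULL or an expression derived from NULL
)

// operand is a hypothetical pairing of a collation name and its coercibility.
type operand struct {
	collation    string
	coercibility int
}

var errIllegalMix = errors.New("illegal mix of collations")

// charsetOf extracts the character set prefix, e.g. "utf8mb4" from "utf8mb4_bin".
func charsetOf(collation string) string {
	if i := strings.IndexByte(collation, '_'); i >= 0 {
		return collation[:i]
	}
	return collation
}

// deriveCollation is a simplified model of the resolution rules; it omits
// the Unicode vs. non-Unicode tie-break and other special cases.
func deriveCollation(x, y operand) (string, error) {
	switch {
	case x.coercibility < y.coercibility:
		return x.collation, nil
	case y.coercibility < x.coercibility:
		return y.collation, nil
	case x.collation == y.collation:
		return x.collation, nil
	case x.collation == "binary" || y.collation == "binary":
		// Mixing binary and nonbinary strings: compare as binary strings.
		return "binary", nil
	case charsetOf(x.collation) == charsetOf(y.collation):
		// Same character set, _bin mixed with _ci or _cs: _bin is used.
		if strings.HasSuffix(x.collation, "_bin") {
			return x.collation, nil
		}
		if strings.HasSuffix(y.collation, "_bin") {
			return y.collation, nil
		}
	}
	return "", errIllegalMix
}

func main() {
	colA := operand{"utf8mb4_general_ci", Implicit} // column a (collation assumed)
	colB := operand{"binary", Implicit}             // column b, binary(20)
	explicitA := operand{"utf8mb4_bin", Explicit}   // (a collate utf8mb4_bin)

	c1, _ := deriveCollation(colA, colB)      // a >= b
	c2, _ := deriveCollation(explicitA, colB) // (a collate utf8mb4_bin) >= b
	fmt.Println(c1, c2)                       // binary utf8mb4_bin
}
```

Under this model, `a = b` is compared as binary, so `-1` does not match the zero-padded `b`, while `(a collate utf8mb4_bin) >= b` takes the explicit collation of the left operand.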
Bug Report
1. Minimal reproduce step (Required)
2. What did you expect to see? (Required)
3. What did you see instead (Required)
Tip: If it is a single-table query, the result is the same as MySQL.
4. What is your TiDB version? (Required)
Release Version: v6.1.0-nightly
Edition: Community
Git Commit Hash: 828a255
Git Branch: heads/refs/tags/v6.1.0-nightly
UTC Build Time: 2022-05-17 23:02:16
GoVersion: go1.18.2
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false