-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Built-in string functions implementation: ASCII, BIN, BIT_LENGTH #9793
Conversation
a3d7776
to
9f5e850
Compare
Signed-off-by: wanghaoyang1995 <wanghy1995@qq.com>
9f5e850
to
1679a72
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty good effort @wanghaoyang1995. I've left some review feedback. Thanks for your first contribution to Vitess!
go/vt/vtgate/evalengine/string.go
Outdated
tobin.makeUnsignedIntegral() | ||
str = strconv.FormatUint(tobin.uint64(), 2) | ||
case sqltypes.IsText(t): | ||
toint64, _ := strconv.ParseInt(tobin.string(), 10, 64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This conversion is not quite right: I think makeUnsignedLiteral
would work better here. You can test this in the integration suite by adding some malformed string/numbers, such as "123foo"
.
go/vt/vtgate/evalengine/string.go
Outdated
switch t := tobin.typeof(); { | ||
case sqltypes.IsNumber(t): | ||
tobin.makeUnsignedIntegral() | ||
str = strconv.FormatUint(tobin.uint64(), 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of FormatUint
, you can use AppendUint
to create directly a []byte
slice instead of a string. You'll have to call result.setRaw
but that'll save one allocation.
}) | ||
} | ||
|
||
func TestBuiltinBitLength(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job with these integration tests. I think the thing you're missing are strings with Unicode characters: it's important to test the behavior of these functions with Unicode strings. What happens if you call ASCII()
on an emoji? What's the BIT_COUNT
of a string that contains emoji or uses a multi-byte encoding?
Signed-off-by: wanghaoyang1995 <wanghy1995@qq.com>
@vmg Hello,vmg. I have fixed some problems as suggested. For some case not defined in MySQL docs, there are different outputs between MySQL 5 and MySQL 8. Which one should I refer to? |
"中文测试", | ||
"日本語テスト", | ||
"한국어 시험", | ||
"😊😂🤢", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are not testing the right thing -- you forgot to enclose these Unicode strings with quotes, so what the integration test is testing is that MySQL and Vitess both give a parsing error when parsing the expression, which they both do. :)
go/vt/vtgate/evalengine/string.go
Outdated
} | ||
|
||
toascii.makeBinary() | ||
if len(toascii.bytes()) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not call bytes()
twice, we can store the result.
go/vt/vtgate/evalengine/string.go
Outdated
input := tobin.string() | ||
index := len(input) | ||
if index == 0 { | ||
break | ||
} | ||
for i, ch := range input { | ||
if !unicode.IsDigit(ch) { | ||
index = i | ||
break | ||
} | ||
} | ||
intstr := input[:index] | ||
toint64, _ := strconv.ParseInt(intstr, 10, 64) | ||
raw = strconv.AppendUint(raw, uint64(toint64), 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look right. We already have a function that tries to parse numbers into float
and handles this logic properly. I would try using that one, and then converting the float64
to uint64
. I have a hunch that's what MySQL is doing behind the scenes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review!
In my consideration, the above code is mainly to deal with strings containing both letters and numbers, not only for float. Such as "123abc".
I tried to write a piece of code, firstly convert to float and then convert to uint64. Like this:
func (builtinBin) call(env *ExpressionEnv, args []EvalResult, result *EvalResult) {
tobin := &args[0]
if tobin.isNull() {
result.setNull()
return
}
var raw []byte
switch t := tobin.typeof(); {
case sqltypes.IsNumber(t):
tobin.makeUnsignedIntegral()
raw = strconv.AppendUint(raw, tobin.uint64(), 2)
case sqltypes.IsText(t):
if len(tobin.bytes()) == 0 {
return
}
tobin.makeFloat()
float, _ := tobin.coerceToFloat()
raw = strconv.AppendUint(raw, uint64(float), 2)
default:
throwEvalError(vterrors.Errorf(vtrpcpb.Code_UNIMPLEMENTED, "Unsupported BIN argument: %s", t.String()))
}
result.setRaw(sqltypes.VarChar, raw, collations.TypedCollation{
Collation: env.DefaultCollation,
Coercibility: collations.CoerceCoercible,
Repertoire: collations.RepertoireASCII,
})
}
It was well behaved when inputted ”123abc“ or "123.456".
But for strings like "2e2", it has a difference with MySQL's output.
mysql> select bin("2e2");
+------------+
| bin(:vtg1) |
+------------+
| 11001000 |
+------------+
1 row in set (1.42 sec)
While the MySQL5.7 returned 10
.
mysql> select bin("2e2");
+------------+
| bin("2e2") |
+------------+
| 10 |
+------------+
1 row in set (0.00 sec)
Is it possible to bypass string processing?
I'm not very familiar with go code and would like some advice from you, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try this:
func (builtinBin) call(env *ExpressionEnv, args []EvalResult, result *EvalResult) {
tobin := &args[0]
if tobin.isNull() {
result.setNull()
return
}
tobin.makeUnsignedIntegral()
raw := strconv.AppendUint(raw, tobin.uint64(), 2)
result.setRaw(sqltypes.VarChar, raw, collations.TypedCollation{
Collation: env.DefaultCollation,
Coercibility: collations.CoerceCoercible,
Repertoire: collations.RepertoireASCII,
})
}
I don't think you have to switch based on the type. makeUnsignedIntegral
should handle all types gracefully.
go/vt/vtgate/evalengine/string.go
Outdated
case sqltypes.IsQuoted(t): | ||
result.setInt64(int64(8 * len(arg1.bytes()))) | ||
default: | ||
result.setInt64(int64(8 * len(arg1.toRawBytes()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since toRawBytes()
already handles the case where IsQuoted(t)
, you can call it directly. You don't this switch at all.
That's great! We always target MySQL 8's behavior. It would be even better if you added a test for this specific mismatch with a comment that says it's not backwards compatible with MySQL 5. |
Signed-off-by: wanghaoyang1995 <wanghy1995@qq.com>
This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:
If no action is taken within 7 days, this PR will be closed. |
This PR was closed because it has been stale for 7 days with no activity. |
Description
Add MySQL built-in functions
ascii
,bin
,bit_length
to evalengine.Related Issue(s)
#9647
Checklist
Deployment Notes
N/A