Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Built-in string functions implementation: ASCII, BIN, BIT_LENGTH #9793

Closed
wants to merge 7 commits into from

Conversation

wanghaoyang1995
Copy link

Description

Add MySQL built-in functions ascii, bin, bit_length to evalengine.

Related Issue(s)

#9647

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

N/A

@wanghaoyang1995 wanghaoyang1995 force-pushed the main branch 3 times, most recently from a3d7776 to 9f5e850 Compare February 27, 2022 14:11
@frouioui frouioui requested a review from vmg February 27, 2022 14:57
@frouioui frouioui added Component: Query Serving release notes Type: Enhancement Logical improvement (somewhere between a bug and feature) labels Feb 27, 2022
Signed-off-by: wanghaoyang1995 <wanghy1995@qq.com>
Copy link
Collaborator

@vmg vmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty good effort @wanghaoyang1995. I've left some review feedback. Thanks for your first contribution to Vitess!

tobin.makeUnsignedIntegral()
str = strconv.FormatUint(tobin.uint64(), 2)
case sqltypes.IsText(t):
toint64, _ := strconv.ParseInt(tobin.string(), 10, 64)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conversion is not quite right: I think makeUnsignedLiteral would work better here. You can test this in the integration suite by adding some malformed string/numbers, such as "123foo".

switch t := tobin.typeof(); {
case sqltypes.IsNumber(t):
tobin.makeUnsignedIntegral()
str = strconv.FormatUint(tobin.uint64(), 2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of FormatUint, you can use AppendUint to create directly a []byte slice instead of a string. You'll have to call result.setRaw but that'll save one allocation.

})
}

func TestBuiltinBitLength(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job with these integration tests. I think the thing you're missing are strings with Unicode characters: it's important to test the behavior of these functions with Unicode strings. What happens if you call ASCII() on an emoji? What's the BIT_COUNT of a string that contains emoji or uses a multi-byte encoding?

@vmg vmg added Component: Evalengine changes to the evaluation engine and removed Component: Query Serving labels Feb 28, 2022
@vmg vmg self-assigned this Feb 28, 2022
@wanghaoyang1995
Copy link
Author

@vmg Hello,vmg. I have fixed some problems as suggested. For some case not defined in MySQL docs, there are different outputs between MySQL 5 and MySQL 8. Which one should I refer to?

Comment on lines 34 to 37
"中文测试",
"日本語テスト",
"한국어 시험",
"😊😂🤢",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are not testing the right thing -- you forgot to enclose these Unicode strings with quotes, so what the integration test is testing is that MySQL and Vitess both give a parsing error when parsing the expression, which they both do. :)

}

toascii.makeBinary()
if len(toascii.bytes()) > 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not call bytes() twice, we can store the result.

Comment on lines 71 to 84
input := tobin.string()
index := len(input)
if index == 0 {
break
}
for i, ch := range input {
if !unicode.IsDigit(ch) {
index = i
break
}
}
intstr := input[:index]
toint64, _ := strconv.ParseInt(intstr, 10, 64)
raw = strconv.AppendUint(raw, uint64(toint64), 2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look right. We already have a function that tries to parse numbers into float and handles this logic properly. I would try using that one, and then converting the float64 to uint64. I have a hunch that's what MySQL is doing behind the scenes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review!

In my consideration, the above code is mainly to deal with strings containing both letters and numbers, not only for float. Such as "123abc".

I tried to write a piece of code, firstly convert to float and then convert to uint64. Like this:

func (builtinBin) call(env *ExpressionEnv, args []EvalResult, result *EvalResult) {
	tobin := &args[0]
	if tobin.isNull() {
		result.setNull()
		return
	}

	var raw []byte
	switch t := tobin.typeof(); {
	case sqltypes.IsNumber(t):
		tobin.makeUnsignedIntegral()
		raw = strconv.AppendUint(raw, tobin.uint64(), 2)
	case sqltypes.IsText(t):
		if len(tobin.bytes()) == 0 {
			return
		}
		tobin.makeFloat()
		float, _ := tobin.coerceToFloat()
		raw = strconv.AppendUint(raw, uint64(float), 2)
	default:
		throwEvalError(vterrors.Errorf(vtrpcpb.Code_UNIMPLEMENTED, "Unsupported BIN argument: %s", t.String()))
	}

	result.setRaw(sqltypes.VarChar, raw, collations.TypedCollation{
		Collation:    env.DefaultCollation,
		Coercibility: collations.CoerceCoercible,
		Repertoire:   collations.RepertoireASCII,
	})
}

It was well behaved when inputted ”123abc“ or "123.456".
But for strings like "2e2", it has a difference with MySQL's output.

mysql> select bin("2e2");
+------------+
| bin(:vtg1) |
+------------+
| 11001000   |
+------------+
1 row in set (1.42 sec)

While the MySQL5.7 returned 10.

mysql> select bin("2e2");
+------------+
| bin("2e2") |
+------------+
| 10         |
+------------+
1 row in set (0.00 sec)

Is it possible to bypass string processing?
I'm not very familiar with go code and would like some advice from you, thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try this:

func (builtinBin) call(env *ExpressionEnv, args []EvalResult, result *EvalResult) {
	tobin := &args[0]
	if tobin.isNull() {
		result.setNull()
		return
	}

	tobin.makeUnsignedIntegral()
	raw := strconv.AppendUint(raw, tobin.uint64(), 2)
	result.setRaw(sqltypes.VarChar, raw, collations.TypedCollation{
		Collation:    env.DefaultCollation,
		Coercibility: collations.CoerceCoercible,
		Repertoire:   collations.RepertoireASCII,
	})
}

I don't think you have to switch based on the type. makeUnsignedIntegral should handle all types gracefully.

case sqltypes.IsQuoted(t):
result.setInt64(int64(8 * len(arg1.bytes())))
default:
result.setInt64(int64(8 * len(arg1.toRawBytes())))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since toRawBytes() already handles the case where IsQuoted(t), you can call it directly. You don't this switch at all.

@vmg
Copy link
Collaborator

vmg commented Mar 16, 2022

For some case not defined in MySQL docs, there are different outputs between MySQL 5 and MySQL 8. Which one should I refer to?

That's great! We always target MySQL 8's behavior. It would be even better if you added a test for this specific mismatch with a comment that says it's not backwards compatible with MySQL 5.

Signed-off-by: wanghaoyang1995 <wanghy1995@qq.com>
@github-actions
Copy link
Contributor

github-actions bot commented Aug 8, 2022

This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:

  • Push additional commits to the associated branch.
  • Remove the stale label.
  • Add a comment indicating why it is not stale.

If no action is taken within 7 days, this PR will be closed.

@github-actions github-actions bot added the Stale Marks PRs as stale after a period of inactivity, which are then closed after a grace period. label Aug 8, 2022
@github-actions
Copy link
Contributor

This PR was closed because it has been stale for 7 days with no activity.

@github-actions github-actions bot closed this Aug 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Evalengine changes to the evaluation engine Stale Marks PRs as stale after a period of inactivity, which are then closed after a grace period. Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants