[SPARK-21007][SQL]Add SQL function - RIGHT && LEFT #18228
128d3e5
to
e227528
Compare
ok to test |
jenkins add to whitelist |
Test build #77824 has finished for PR 18228 at commit
|
Are these ANSI SQL functions? If it is just some esoteric MySQL function I don't think we should add them.
Both MySQL and SQL Server support these two functions; Oracle does not.
141a42f
to
2136c1b
Compare
Test build #78090 has finished for PR 18228 at commit
|
@@ -342,6 +342,8 @@ object FunctionRegistry {
expression[StringSplit]("split"),
expression[Substring]("substr"),
expression[Substring]("substring"),
left("left"), |
It can be implemented using RuntimeReplaceable. See NullIf for an example.
Test build #78422 has finished for PR 18228 at commit
|
Test build #78437 has finished for PR 18228 at commit
|
As we already have |
|
str.dataType match {
  case StringType => string.asInstanceOf[UTF8String]
    .substringSQL(if (len.asInstanceOf[Int] <= 0) Integer.MAX_VALUE else -len.asInstanceOf[Int],
      Integer.MAX_VALUE)
Please change it to something like
val pos = xyz
val len = xyz
string.asInstanceOf[UTF8String].substringSQL(pos, len)
OK, thanks
defineCodeGen(ctx, ev, (string, len) => {
  str.dataType match {
    case StringType => s"$string.substringSQL(($len) <= 0 ? Integer.MAX_VALUE : -($len)," +
      s" Integer.MAX_VALUE)"
Do not split the code in the middle of an expression.
checkEvaluation(
  Left(s, Literal(-3)), "", row)
checkEvaluation(
  Left(s, Literal(0)), "", row)
Could you post the outputs of MySQL when the length is negative?
@gatorsmile Thanks
mysql> select right("sparksql",null);
mysql> select left("sparksql",null);
Test build #79432 has finished for PR 18228 at commit
|
SQL
""")
case class Right(str: Expression, len: Expression)
  extends BinaryExpression with ImplicitCastInputTypes with NullIntolerant {
Shall we use RuntimeReplaceable? I think both left and right can be implemented with substring.
OK, I will do that, thanks.
Left has been implemented with substring, but implementing Right with substring may not work very well:
case class Right(str: Expression, len: Expression, child: Expression)
extends RuntimeReplaceable {
def this(str: Expression, len: Expression) = {
this(str, len, Substring(str, If(LessThanOrEqual(len, Literal(0)),
Literal(Integer.MAX_VALUE), UnaryMinus(len)), len))
}
override def flatArguments: Iterator[Any] = Iterator(str, len)
override def sql: String = s"$prettyName(${str.sql}, ${len.sql})"
}
Substring supports a negative position, so we can implement Right as Substring(str, UnaryMinus(len)).
For example:
select right("sparksql",-2);
In this case we expect "". If we implement Right as Substring(str, UnaryMinus(len)), the result will be parksql.
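To make the problem concrete, here is a minimal sketch of the position convention involved. It is a simplified model over plain Scala strings, not Spark's UTF8String code: the names SubstringPositionSketch, substringSQL (as defined here) and naiveRight are illustrative, and the model assumes positions are 1-based when positive, count back from the end when negative, and treat 0 as the first character.

```scala
object SubstringPositionSketch {
  // Simplified model of SQL substring positions over Scala Strings
  // (UTF-16 chars rather than UTF-8 code points; an assumption for brevity).
  def substringSQL(s: String, pos: Int, length: Int): String = {
    val n = s.length
    // pos > 0: 1-based index; pos < 0: offset from the end; pos == 0: first char.
    val start: Long = if (pos > 0) pos - 1L else if (pos < 0) n.toLong + pos else 0L
    val end: Long = start + length // Long arithmetic avoids Int overflow
    // Clamp both bounds into [0, n] so out-of-range requests yield a shorter string.
    val from = math.max(0L, math.min(start, n.toLong)).toInt
    val until = math.max(from.toLong, math.min(end, n.toLong)).toInt
    s.substring(from, until)
  }

  // Hypothetical "naive" RIGHT: negate len and use it as the start position.
  def naiveRight(s: String, len: Int): String =
    substringSQL(s, -len, Int.MaxValue)

  def main(args: Array[String]): Unit = {
    println(naiveRight("sparksql", 3))  // last three characters
    println(naiveRight("sparksql", -2)) // starts at position 2 instead of returning ""
  }
}
```

With a negative len the negated value becomes a positive start position, which is why the naive form returns most of the string rather than an empty one.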
OK, so we should do: If(LessThanOrEqual(len, Literal(0)), Literal(UTF8String.EMPTY_UTF8), Substring(str, UnaryMinus(len))). A complex expression is OK; after codegen, it should be almost the same as a custom implementation.
OK, thank you very much.
Test build #79456 has finished for PR 18228 at commit
|
fa81e44
to
1cb0448
Compare
 * Returns the rightmost n characters from the string.
 */
@ExpressionDescription(
  usage = "_FUNC_(str, len) - Returns the rightmost `len` characters from the string `str`.",
Shall we also explain the behavior if len is less than or equal to 0?
Also mention that len can be of string type. BTW, is it common in other databases to support a string-typed len?
Yes, MySQL supports a string-typed len, too.
 * Returns the leftmost n characters from the string.
 */
@ExpressionDescription(
  usage = "_FUNC_(str, len) - Returns the leftmost `len` characters from the string `str`.",
ditto
-- !query 9 schema
struct<left('abcd', -2):string,left('abcd', 0):string,left('abcd', 'a'):string>
-- !query 9 output
NULL
Is this the correct answer?
There may be a problem with this test case: left("abcd", 'a')
In MySQL:
mysql> select left("abcd", -2), left("abcd", 0), left("abcd", 'a');
+------------------+-----------------+-------------------+
| left("abcd", -2) | left("abcd", 0) | left("abcd", 'a') |
+------------------+-----------------+-------------------+
| | | |
+------------------+-----------------+-------------------+
mysql> select right("abcd", -2), right("abcd", 0), right("abcd", 'a');
+-------------------+------------------+--------------------+
| right("abcd", -2) | right("abcd", 0) | right("abcd", 'a') |
+-------------------+------------------+--------------------+
| | | |
+-------------------+------------------+--------------------+
Substring behaves the same way as Left.
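One possible reading of the difference between the NULL in the Spark output and the empty string in MySQL is sketched below. It is a plain-Scala model, not Spark code: LeftSketch, castToInt and left are hypothetical names, None stands for SQL NULL, and the model assumes that the implicit cast of a non-numeric string such as 'a' to an integer yields NULL, which then propagates through the function.

```scala
import scala.util.Try

object LeftSketch {
  // Hypothetical stand-in for an implicit string-to-int cast that yields
  // None (SQL NULL) when the text is not a valid integer.
  def castToInt(text: String): Option[Int] = Try(text.trim.toInt).toOption

  // LEFT modelled as "first len characters"; take() returns "" when len <= 0.
  // A NULL in either argument makes the whole result NULL.
  def left(str: Option[String], len: Option[Int]): Option[String] =
    for (s <- str; n <- len) yield s.take(n)

  def main(args: Array[String]): Unit = {
    println(left(Some("abcd"), Some(-2)))        // negative length
    println(left(Some("abcd"), Some(0)))         // zero length
    println(left(Some("abcd"), castToInt("a")))  // non-numeric length
  }
}
```

Under this assumption the first two columns are empty strings, as in MySQL, and only the string-typed len column differs.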
Test build #79510 has finished for PR 18228 at commit
|
Test build #79511 has finished for PR 18228 at commit
|
Test build #79518 has finished for PR 18228 at commit
|
 */
// scalastyle:off line.size.limit
@ExpressionDescription(
  usage = "_FUNC_(str, len) - Returns the rightmost `len`(`len` can be string type) characters from the string `str`,if `len` is less or equal than 0 the result is ``.",
nit: ... the result is an empty string
ok, thanks
case class Right(str: Expression, len: Expression, child: Expression) extends RuntimeReplaceable {
  def this(str: Expression, len: Expression) = {
    this(str, len, Substring(str, If(LessThanOrEqual(len, Literal(0)),
      Literal(Integer.MAX_VALUE), UnaryMinus(len)), len))
I prefer If(LessThanOrEqual(len, Literal(0)), Literal(UTF8String.EMPTY_UTF8), Substring(str, UnaryMinus(len))).
The reason is that your expression will end up calling UTF8String.substringSQL(Int.Max, ...), which goes through all bytes in this UTF8String and is a waste of performance.
I agree with you, but there is a problem with this test case:
right(null, -10)
We expect null, but the result is an empty string.
// scalastyle:on line.size.limit
case class Right(str: Expression, len: Expression, child: Expression) extends RuntimeReplaceable {
  def this(str: Expression, len: Expression) = {
    this(str, len, If(LessThanOrEqual(len, Literal(0)), If(IsNull(str), Literal(null, StringType),
We can do the null check first, e.g.
If(
IsNull(str),
Literal(null, StringType),
If(
LessThanOrEqual(len, Literal(0)),
Literal(UTF8String.EMPTY_UTF8, StringType),
new Substring(str, UnaryMinus(len))
)
)
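The effect of the branch order can be sketched in plain Scala. This is only a model of the nested If expression above, not Catalyst code: RightSketch, right and rightWithoutNullCheck are hypothetical names, None stands for SQL NULL, len is assumed to be non-null, and takeRight stands in for Substring(str, UnaryMinus(len)).

```scala
object RightSketch {
  // Null check first, then the length test, then the substring.
  def right(str: Option[String], len: Int): Option[String] =
    str match {
      case None                => None                  // IsNull(str) branch
      case Some(_) if len <= 0 => Some("")              // LessThanOrEqual(len, 0) branch
      case Some(s)             => Some(s.takeRight(len)) // stands in for Substring(str, -len)
    }

  // The earlier suggestion: the length test comes first, with no null check.
  def rightWithoutNullCheck(str: Option[String], len: Int): Option[String] =
    if (len <= 0) Some("") else str.map(_.takeRight(len))

  def main(args: Array[String]): Unit = {
    println(right(None, -10))                 // NULL input stays NULL
    println(rightWithoutNullCheck(None, -10)) // NULL input becomes an empty string
    println(right(Some("sparksql"), 3))
  }
}
```

The two versions agree for every non-null str; they differ only for right(null, len) with len <= 0, which is the case raised in the review.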
Test build #79548 has finished for PR 18228 at commit
|
Test build #79550 has finished for PR 18228 at commit
|
retest this please |
Test build #79551 has finished for PR 18228 at commit
|
LGTM, merging to master! |
What changes were proposed in this pull request?
Add SQL function - RIGHT && LEFT, same as MySQL:
https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_left
https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_right
How was this patch tested?
unit test