Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-21007][SQL]Add SQL function - RIGHT && LEFT #18228

Closed
wants to merge 1 commit into from

Conversation

10110346
Copy link
Contributor

@10110346 10110346 commented Jun 7, 2017

What changes were proposed in this pull request?

Add SQL function - RIGHT && LEFT, same as MySQL:
https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_left
https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_right

How was this patch tested?

unit test

@10110346 10110346 force-pushed the lx-wip-0607 branch 7 times, most recently from 128d3e5 to e227528 Compare June 8, 2017 04:30
@sameeragarwal
Copy link
Member

ok to test

@sameeragarwal
Copy link
Member

jenkins add to whitelist

@SparkQA
Copy link

SparkQA commented Jun 9, 2017

Test build #77824 has finished for PR 18228 at commit e227528.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Right(str: Expression, len: Expression)

@rxin
Copy link
Contributor

rxin commented Jun 10, 2017

Are these ANSI SQL functions? If it is just some esoteric MySQL function I don't think we should add them.

@10110346
Copy link
Contributor Author

Both of mysql and SQL server support these two functions, oracle don't support these functions.

@10110346 10110346 force-pushed the lx-wip-0607 branch 2 times, most recently from 141a42f to 2136c1b Compare June 15, 2017 08:04
@SparkQA
Copy link

SparkQA commented Jun 15, 2017

Test build #78090 has finished for PR 18228 at commit 2136c1b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Right(str: Expression, len: Expression)

@@ -342,6 +342,8 @@ object FunctionRegistry {
expression[StringSplit]("split"),
expression[Substring]("substr"),
expression[Substring]("substring"),
left("left"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be implemented using RuntimeReplaceable. For example, NullIf

@SparkQA
Copy link

SparkQA commented Jun 22, 2017

Test build #78422 has finished for PR 18228 at commit fca984d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Right(str: Expression, len: Expression)
  • case class Left(str: Expression, len: Expression)

@SparkQA
Copy link

SparkQA commented Jun 22, 2017

Test build #78437 has finished for PR 18228 at commit 9b85478.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Right(str: Expression, len: Expression)
  • case class Left(str: Expression, len: Expression)

@viirya
Copy link
Member

viirya commented Jul 4, 2017

As we already have substring, do we still need them?

@10110346
Copy link
Contributor Author

10110346 commented Jul 4, 2017

Substring is the most commonly used as left and right, and i think using these form is more friendly for users.
Also mysql and SQL server support these two functions and Substring .Thanks @viirya

str.dataType match {
case StringType => string.asInstanceOf[UTF8String]
.substringSQL(if (len.asInstanceOf[Int] <= 0) Integer.MAX_VALUE else -len.asInstanceOf[Int],
Integer.MAX_VALUE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change it to something like

val pos = xyz
val len = xyz
string.asInstanceOf[UTF8String].substringSQL(pos, len)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks

defineCodeGen(ctx, ev, (string, len) => {
str.dataType match {
case StringType => s"$string.substringSQL(($len) <= 0 ? Integer.MAX_VALUE : -($len)," +
s" Integer.MAX_VALUE)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not split the codes in the middle.

checkEvaluation(
Left(s, Literal(-3)), "", row)
checkEvaluation(
Left(s, Literal(0)), "", row)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you post the outputs of MySQL when the length is negative?

@10110346
Copy link
Contributor Author

@gatorsmile Thanks
I have tested in mysql:

mysql> select right("sparksql",null);
+------------------------+
| right("sparksql",null) |
+------------------------+
| NULL |
+------------------------+
mysql> select right("sparksql",0);
+---------------------+
| right("sparksql",0) |
+---------------------+
| |
+---------------------+
mysql> select right("sparksql",-2);
+----------------------+
| right("sparksql",-2) |
+----------------------+
| |
+----------------------+
mysql> select right("sparksql",2);
+---------------------+
| right("sparksql",2) |
+---------------------+
| ql |
+---------------------+

mysql> select left("sparksql",null);
+-----------------------+
| left("sparksql",null) |
+-----------------------+
| NULL |
+-----------------------+
mysql> select left("sparksql",0);
+--------------------+
| left("sparksql",0) |
+--------------------+
| |
+--------------------+
mysql> select left("sparksql",-2);
+---------------------+
| left("sparksql",-2) |
+---------------------+
| |
+---------------------+
mysql> select left("sparksql",2);
+--------------------+
| left("sparksql",2) |
+--------------------+
| sp |
+--------------------+

@SparkQA
Copy link

SparkQA commented Jul 10, 2017

Test build #79432 has finished for PR 18228 at commit 2eb4d0f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Right(str: Expression, len: Expression)
  • case class Left(str: Expression, len: Expression)

SQL
""")
case class Right(str: Expression, len: Expression)
extends BinaryExpression with ImplicitCastInputTypes with NullIntolerant {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we use RuntimeReplaceable? I think both left and right can be implemented by substring

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok,i will do,thanks

Copy link
Contributor Author

@10110346 10110346 Jul 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left has been implemented by substring, but Right implemented by substring may be not very good:

case class Right(str: Expression, len: Expression, child: Expression)
  extends RuntimeReplaceable {
  def this(str: Expression, len: Expression) = {
    this(str, len, Substring(str, If(LessThanOrEqual(len, Literal(0)),
      Literal(Integer.MAX_VALUE), UnaryMinus(len)), len))
  }

  override def flatArguments: Iterator[Any] = Iterator(str, len)

  override def sql: String = s"$prettyName(${str.sql}, ${len.sql})"
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Substring supports negative position, we can implement Right as Substring(str, UnaryMinus(len)).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example:
select right("sparksql",-2);
for this case,we expected is "",
if we implement Right as Substring(str, UnaryMinus(len)). this result will be parksql

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so we should do: If(LessThanOrEqual(len, Literal(0), Literal(UTF8String.EMPTY_UTF8), Substring(str, UnaryMinus(len))). Complex expression is OK, after codegen, it should be almost same as a customize implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK,thank you very much.

@SparkQA
Copy link

SparkQA commented Jul 10, 2017

Test build #79456 has finished for PR 18228 at commit 4a7941e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Right(str: Expression, len: Expression)
  • case class Left(str: Expression, len: Expression, child: Expression)

@10110346 10110346 force-pushed the lx-wip-0607 branch 2 times, most recently from fa81e44 to 1cb0448 Compare July 11, 2017 07:21
* Returns the rightmost n characters from the string.
*/
@ExpressionDescription(
usage = "_FUNC_(str, len) - Returns the rightmost `len` characters from the string `str`.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we also explain the behavior if len is less or equal than 0?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also mention that len can be string type. BTW is this common in other databases to support string type len?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, MYSQL support string type len, too

* Returns the leftmost n characters from the string.
*/
@ExpressionDescription(
usage = "_FUNC_(str, len) - Returns the leftmost `len` characters from the string `str`.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

-- !query 9 schema
struct<left('abcd', -2):string,left('abcd', 0):string,left('abcd', 'a'):string>
-- !query 9 output
NULL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the corrected answer?

Copy link
Contributor Author

@10110346 10110346 Jul 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this has a question with this test case: left("abcd", 'a')
In Mysql:
mysql> select left("abcd", -2), left("abcd", 0), left("abcd", 'a');
+------------------+-----------------+-------------------+
| left("abcd", -2) | left("abcd", 0) | left("abcd", 'a') |
+------------------+-----------------+-------------------+
| | | |
+------------------+-----------------+-------------------+
mysql> select right("abcd", -2), right("abcd", 0), right("abcd", 'a');
+-------------------+------------------+--------------------+
| right("abcd", -2) | right("abcd", 0) | right("abcd", 'a') |
+-------------------+------------------+--------------------+
| | | |
+-------------------+------------------+--------------------+
Substring is same as Left

@SparkQA
Copy link

SparkQA commented Jul 11, 2017

Test build #79510 has finished for PR 18228 at commit fa81e44.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Right(str: Expression, len: Expression, child: Expression)
  • case class Left(str: Expression, len: Expression, child: Expression)

@SparkQA
Copy link

SparkQA commented Jul 11, 2017

Test build #79511 has finished for PR 18228 at commit 1cb0448.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Right(str: Expression, len: Expression, child: Expression) extends RuntimeReplaceable
  • case class Left(str: Expression, len: Expression, child: Expression) extends RuntimeReplaceable

@SparkQA
Copy link

SparkQA commented Jul 11, 2017

Test build #79518 has finished for PR 18228 at commit 529417f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Right(str: Expression, len: Expression, child: Expression) extends RuntimeReplaceable
  • case class Left(str: Expression, len: Expression, child: Expression) extends RuntimeReplaceable

*/
// scalastyle:off line.size.limit
@ExpressionDescription(
usage = "_FUNC_(str, len) - Returns the rightmost `len`(`len` can be string type) characters from the string `str`,if `len` is less or equal than 0 the result is ``.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int: ... the result is an empty string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thanks

case class Right(str: Expression, len: Expression, child: Expression) extends RuntimeReplaceable {
def this(str: Expression, len: Expression) = {
this(str, len, Substring(str, If(LessThanOrEqual(len, Literal(0)),
Literal(Integer.MAX_VALUE), UnaryMinus(len)), len))
Copy link
Contributor

@cloud-fan cloud-fan Jul 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer If(LessThanOrEqual(len, Literal(0), Literal(UTF8String.EMPTY_UTF8), Substring(str, UnaryMinus(len))).

The reason is that, your expression will end up calling UTF8String.substringSQL(Int.Max, ...), which goes through all bytes in this UTF8String and is a performance waste.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right(null, -10)
I agree with you, but , for this test case, there is a problem:
Which we expected is null,but it is an empty string

// scalastyle:on line.size.limit
case class Right(str: Expression, len: Expression, child: Expression) extends RuntimeReplaceable {
def this(str: Expression, len: Expression) = {
this(str, len, If(LessThanOrEqual(len, Literal(0)), If(IsNull(str), Literal(null, StringType),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can do the null check first, e.g.

If(
  IsNull(str),
  Literal(null, StringType),
  If(
    LessThanOrEqual(len, Literal(0)),
    Literal(UTF8String.EMPTY_UTF8, StringType),
    new Substring(str, UnaryMinus(len))
  )
)

@SparkQA
Copy link

SparkQA commented Jul 12, 2017

Test build #79548 has finished for PR 18228 at commit b75b6ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Right(str: Expression, len: Expression, child: Expression) extends RuntimeReplaceable
  • case class Left(str: Expression, len: Expression, child: Expression) extends RuntimeReplaceable

@SparkQA
Copy link

SparkQA commented Jul 12, 2017

Test build #79550 has finished for PR 18228 at commit 991bf99.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Right(str: Expression, len: Expression, child: Expression) extends RuntimeReplaceable
  • case class Left(str: Expression, len: Expression, child: Expression) extends RuntimeReplaceable

@10110346
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jul 12, 2017

Test build #79551 has finished for PR 18228 at commit 991bf99.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Right(str: Expression, len: Expression, child: Expression) extends RuntimeReplaceable
  • case class Left(str: Expression, len: Expression, child: Expression) extends RuntimeReplaceable

@cloud-fan
Copy link
Contributor

LGTM, merging to master!

@asfgit asfgit closed this in aaad34d Jul 12, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants