
[SPARK-30341][SQL] Overflow check for interval arithmetic operations #26995

Closed
wants to merge 20 commits into from

Conversation

yaooqinn
Member

@yaooqinn yaooqinn commented Dec 24, 2019

What changes were proposed in this pull request?

  1. For the interval arithmetic functions, e.g. add/subtract/negative/multiply/divide, enable overflow checking when ANSI is on.

  2. For multiply/divide, throw an exception when an overflow happens, regardless of whether ANSI is on or off.

  3. add/subtract/negative stay the same for backward compatibility.

  4. divide by 0 throws ArithmeticException whether ANSI is on or not, the same as for numeric types.

  5. These behaviors fully match the numeric type operations when ANSI is on.

  6. These behaviors fully match the numeric type operations when ANSI is off, except for 2 and 4.
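The proposal above can be sketched in plain Scala. This is a hypothetical simplification for illustration only — `Interval` and `addIntervals` are made-up names, not Spark's actual `CalendarInterval` code:

```scala
// Hypothetical sketch of the proposed ANSI-aware interval addition.
// Math.addExact throws java.lang.ArithmeticException on overflow, while
// the plain + operator silently wraps around (java-style overflow).
case class Interval(months: Int, days: Int, microseconds: Long)

def addIntervals(a: Interval, b: Interval, ansiEnabled: Boolean): Interval =
  if (ansiEnabled) {
    // ANSI on: overflow in any field raises ArithmeticException
    Interval(
      Math.addExact(a.months, b.months),
      Math.addExact(a.days, b.days),
      Math.addExact(a.microseconds, b.microseconds))
  } else {
    // ANSI off: keep the legacy wrap-around behavior for compatibility
    Interval(
      a.months + b.months,
      a.days + b.days,
      a.microseconds + b.microseconds)
  }
```

For example, adding one month to an interval whose `months` field is already `Int.MaxValue` wraps to `Int.MinValue` with ANSI off, but throws with ANSI on.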

Why are the changes needed?

  1. bug fix
  2. ANSI support

Does this PR introduce any user-facing change?

When ANSI is on, interval add/subtract/negative/multiply/divide will fail with an overflow exception if any field overflows

How was this patch tested?

add unit tests

@yaooqinn
Member Author

cc: @cloud-fan @maropu @HyukjinKwon, thanks very much for reviewing

@SparkQA

SparkQA commented Dec 24, 2019

Test build #115685 has finished for PR 26995 at commit 829cfe7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Dec 24, 2019

Why did you target interval only in this PR? If we change the overflow behaviour for arithmetic operations, I think we need to handle it carefully because it has impacts on the existing behaviours.

@yaooqinn
Member Author

yaooqinn commented Dec 24, 2019

Why did you target interval only in this PR? If we change the overflow behaviour for arithmetic operations, I think we need to handle it carefully because it has impacts on the existing behaviours.

Because, based on my check, the numeric types are handled properly in arithmetic in both ANSI and non-ANSI modes. This PR tries to fix intervals to follow what the numeric types do. Thus, the behavior impact is limited to the interval operations, as the PR description says.

@SparkQA

SparkQA commented Dec 24, 2019

Test build #115690 has finished for PR 26995 at commit 516080c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Dec 24, 2019

Can't we check it outside BinaryArithmetic, like the decimal CheckOverflow? It seems this PR repeats the same try-catch patterns.

@yaooqinn
Member Author

CheckOverflow is for decimal only, not all numerics

@SparkQA

SparkQA commented Dec 24, 2019

Test build #115692 has finished for PR 26995 at commit 67767c0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -37,6 +37,11 @@ case class UnaryMinus(child: Expression) extends UnaryExpression
with ExpectsInputTypes with NullIntolerant {
private val checkOverflow = SQLConf.get.ansiEnabled

override def nullable: Boolean = dataType match {
case CalendarIntervalType if !checkOverflow => true
case _ => super.nullable
Contributor

nit: child.nullable is clearer here.

@@ -218,6 +252,11 @@ object BinaryArithmetic {
""")
case class Add(left: Expression, right: Expression) extends BinaryArithmetic {

override def nullable: Boolean = dataType match {
case CalendarIntervalType if !checkOverflow => true
Contributor

shall we make the overflow behavior consistent? e.g. other numeric types follow the java overflow behavior and interval returns null for overflow, which is inconsistent.

Member Author

Yes, the current behavior of master separates a) decimal (which returns null on overflow) from b) other types. This PR (so far) just adds intervals to group a). We may need to reach an agreement on which way to follow first.

Contributor

In Spark 2.4, do we have interval arithmetic operations? The non-ANSI behavior should follow the old behavior.

Member Author

Yes, we have.
We now have + / - / unary - with java overflow behavior (since 2.4 or maybe earlier),
and we have * and / with null-on-overflow behavior (new in 3.0).

Contributor

then let's be consistent and follow java overflow behavior when ansi is false.

Member Author

spark-sql> select cast('128' as tinyint);
NULL
Time taken: 0.029 seconds, Fetched 1 row(s)
spark-sql> select cast(128 as tinyint);
-128

Not quite related to this PR, but the cast logic seems inconsistent too.
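For illustration, the two cast paths compared above differ because a numeric downcast truncates bits while a string parse is range-checked. A rough Scala analogy (the `Try`-based parse is my approximation of the NULL result, not Spark's actual cast code):

```scala
import scala.util.Try

// A numeric downcast wraps via truncation: 128 does not fit in a Byte,
// so only the low 8 bits are kept, giving -128 (mirrors cast(128 as tinyint)).
val wrapped: Byte = 128.toByte

// A string parse is range-checked: Byte.parseByte rejects "128" with a
// NumberFormatException, and Spark's string-to-tinyint cast yields NULL
// in that case (approximated here with Option).
val parsed: Option[Byte] = Try("128".toByte).toOption
```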

Contributor

cast from string and cast from int are different and don't need to be consistent. But interval +- and interval */ should be consistent.

Member Author

changed behavior to a) ANSI on: exception, b) ANSI off: java style overflow

@SparkQA

SparkQA commented Dec 24, 2019

Test build #115703 has finished for PR 26995 at commit 7293377.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 24, 2019

Test build #115735 has finished for PR 26995 at commit f100d88.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class IntervalNumOperation(interval: Expression, num: Expression)

@SparkQA

SparkQA commented Dec 24, 2019

Test build #115737 has finished for PR 26995 at commit ba44c5a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class DivideInterval(

@SparkQA

SparkQA commented Dec 24, 2019

Test build #115740 has finished for PR 26995 at commit 67645d4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class MultiplyInterval(interval: Expression, num: Expression)
  • case class DivideInterval(interval: Expression, num: Expression)

@SparkQA

SparkQA commented Dec 24, 2019

Test build #115746 has finished for PR 26995 at commit 7671d83.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 27, 2019

Test build #115847 has finished for PR 26995 at commit b679381.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Member Author

retest this please

@SparkQA

SparkQA commented Jan 2, 2020

Test build #116013 has finished for PR 26995 at commit aba10b2.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class MultiplyInterval(interval: Expression, num: Expression)
  • case class DivideInterval(interval: Expression, num: Expression)

@@ -81,7 +80,8 @@ case class Average(child: Expression) extends DeclarativeAggregate with Implicit
case _: DecimalType =>
DecimalPrecision.decimalAndDecimal(sum / count.cast(DecimalType.LongDecimal)).cast(resultType)
case CalendarIntervalType =>
DivideInterval(sum.cast(resultType), count.cast(DoubleType))
val newCount = If(EqualTo(count, Literal(0L)), Literal(null, LongType), count)
Contributor

avg(interval) is also new in 3.0 right? We can also fail here if this is the SQL standard.

Contributor

I checked pgsql; avg on an empty table returns null. So this is correct.
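The null-count guard shown in the diff above can be sketched as follows. This is an illustrative simplification (`avgMicros` is a made-up helper, not Spark's Average expression):

```scala
// Sketch of the guarded interval average: when the group is empty
// (count == 0), return None (Spark NULL) instead of dividing by zero,
// matching PostgreSQL's avg over an empty table.
def avgMicros(sumMicros: Long, count: Long): Option[Double] =
  if (count == 0L) None                    // empty group -> NULL
  else Some(sumMicros.toDouble / count)    // normal average
```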

MultiplyInterval(Literal(stringToInterval(interval)), Literal(num)), expected)
} else {
checkEvaluation(MultiplyInterval(Literal(stringToInterval(interval)), Literal(num)),
if (expected == null) null else stringToInterval(expected))
Contributor

can expected be null?

withSQLConf(SQLConf.ANSI_ENABLED.key -> v) {
if (checkException) {
checkExceptionInExpression[ArithmeticException](
MultiplyInterval(Literal(stringToInterval(interval)), Literal(num)), expected)
Contributor

we can add a val expr = MultiplyInterval(Literal(stringToInterval(interval)), Literal(num)) at the beginning, to save code duplication.

Member Author

yes, we can also use safeStringToInterval(expected) == null to choose which way to check

withSQLConf(SQLConf.ANSI_ENABLED.key -> v) {
if (checkException) {
checkExceptionInExpression[ArithmeticException](
DivideInterval(Literal(stringToInterval(interval)), Literal(num)), expected)
Contributor

ditto

@SparkQA

SparkQA commented Jan 2, 2020

Test build #116021 has finished for PR 26995 at commit 988b51c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 2, 2020

Test build #116023 has finished for PR 26995 at commit f80f0f3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Member Author

yaooqinn commented Jan 2, 2020

retest this please

@cloud-fan
Contributor

cloud-fan commented Jan 2, 2020

seems multiple PRs start to fail org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite. cc @HeartSaVioR

@HeartSaVioR
Contributor

@cloud-fan Thanks for pinging me. I'll take a look at these failures.

@SparkQA

SparkQA commented Jan 2, 2020

Test build #116038 has finished for PR 26995 at commit f80f0f3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in e04309c Jan 2, 2020
@HeartSaVioR
Contributor

Just FYI, I spent a bit of time trying to deal with org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite, but no luck. I'd like to ask @gaborgsomogyi to help resolve the issue, as he authored the test and has investigated a similar issue before.

Hi @gaborgsomogyi, welcome back! Would you mind trying to deal with this flaky test? Thanks in advance!

@gaborgsomogyi
Contributor

@HeartSaVioR Thanks for pinging. I've had a quick look and it seems like it may be caused by the kafka-clients 2.4 upgrade. Instead of ZkUtils, KafkaZkClient is now used to call getAllBrokersInCluster, which is causing the issue:

Caused by: sbt.ForkMain$ForkError: org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /brokers/ids
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:130)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
	at kafka.zookeeper.AsyncResponse.resultException(ZooKeeperClient.scala:554)
	at kafka.zk.KafkaZkClient.getChildren(KafkaZkClient.scala:719)
	at kafka.zk.KafkaZkClient.getSortedBrokerList(KafkaZkClient.scala:455)
	at kafka.zk.KafkaZkClient.getAllBrokersInCluster(KafkaZkClient.scala:404)
	at org.apache.spark.sql.kafka010.KafkaTestUtils.$anonfun$setup$3(KafkaTestUtils.scala:293)
	at org.scalatest.concurrent.Eventually.makeAValiantAttempt$1(Eventually.scala:395)
	at org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:409)
	... 20 more

My first guess is that the new API behaves somewhat differently and/or it may contain some bugs.

@gaborgsomogyi
Contributor

Going to ask the Kafka guys what they think about this...

@yaooqinn
Member Author

yaooqinn commented Jan 8, 2020

Found ticket for zkclient/localhost@EXAMPLE.COM to go to krbtgt/EXAMPLE.COM@EXAMPLE.COM expiring on Fri Jan 03 04:30:09 PST 2020
Service ticket not found in the subject
KrbException: Server not found in Kerberos database (7) - Server not found in Kerberos database
	at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
	at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
	at sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
	at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
	at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
	at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
	at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:693)
	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
	at org.apache.zookeeper.client.ZooKeeperSaslClient$1.run(ZooKeeperSaslClient.java:323)
	at org.apache.zookeeper.client.ZooKeeperSaslClient$1.run(ZooKeeperSaslClient.java:320)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslToken(ZooKeeperSaslClient.java:320)
	at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslToken(ZooKeeperSaslClient.java:305)
	at org.apache.zookeeper.client.ZooKeeperSaslClient.sendSaslPacket(ZooKeeperSaslClient.java:377)
	at org.apache.zookeeper.client.ZooKeeperSaslClient.initialize(ZooKeeperSaslClient.java:415)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1149)
Caused by: KrbException: Identifier doesn't match expected value (906)
	at sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
	at sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
	at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
	at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
	... 18 more

Hi @gaborgsomogyi, you might need to look at this exception, which is thrown right before the AuthFailedException

@gaborgsomogyi
Contributor

gaborgsomogyi commented Jan 8, 2020

@yaooqinn thanks for sharing. Yep, maybe the kerberos principal is wrong again. Not sure how this can be flaky though, since it's hardcoded...

The mentioned AuthFailed is just the end result of the KrbException.

@yaooqinn
Member Author

yaooqinn commented Jan 8, 2020

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116249/console here is another flaky failure, maybe related to the Jenkins env, e.g. DNS; I'm not sure

@gatorsmile
Member

When ANSI is on, interval add/subtract/negative/multiply/divide will fail with an overflow exception if any field overflows. However, when ANSI is off and an overflow happens, why do multiply/divide throw an exception while add/subtract/negative do not? This looks inconsistent.

struct<>
-- !query 117 output
java.lang.ArithmeticException
integer overflow
Member

Can we change it and make them consistent like what you discussed above #26995 (comment)?

Member Author

@yaooqinn yaooqinn Jan 24, 2020

I am OK with your suggestion, but first let us also cc @cloud-fan

Member

Please create a 3.0 blocker JIRA.

Contributor

IIRC a problem is: the new operators in 3.0 (interval * double and interval / double) do not have a reasonable java style overflow behavior.

The actual operation is (int * double).toInt, and Double.PositiveInfinity.toInt returns Int.Max, which can be confusing.
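The saturating conversion described here is easy to reproduce in plain Scala, independent of Spark:

```scala
// Double-to-Int conversion on the JVM saturates instead of wrapping:
// out-of-range finite values and +Infinity both clamp to Int.MaxValue,
// which is why (int * double).toInt has no natural java-style overflow.
val big: Double = Int.MaxValue.toDouble * 2.0  // finite, but > Int.MaxValue
val clampedBig: Int = big.toInt                // clamps to Int.MaxValue
val inf: Double = Double.PositiveInfinity      // e.g. from x / 0.0
val clampedInf: Int = inf.toInt                // also clamps to Int.MaxValue
```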

Since ansi mode is off by default, we are introducing a new behavior that is non-ansi, which is weird. There are 2 options:

  1. interval * double and interval / double return max or min int/long value when overflow. interval / 0 returns null.
  2. revert these 2 operators.

@gatorsmile what do you think?

Member

Option 2 sounds good to me.

@gatorsmile
Member

Compared with the previous release, any behavior change we made in this PR?

@yaooqinn
Member Author

Compared with the previous release, any behavior change we made in this PR?

When ANSI is on, interval add/subtract/negative will check overflow compared with 2.4

@gatorsmile
Member

Since ANSI mode is introduced in Spark 3.0 and is off by default, this is not a behavior change

8 participants