[SPARK-11970] [SQL] Adding JoinType into JoinWith and support Sample in Dataset API #9921
Test build #46562 has finished for PR 9921 at commit
@@ -412,7 +418,7 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
}

-case class ClassData(a: String, b: Int)
+case class ClassData(a: String, b: Integer)
Why change it to Integer?
I need to use null in the newly introduced test case. Int does not support null. Or do you want me to add a new case class and keep the existing one untouched? Thanks!
Some other tests may depend on the fact that ClassData.b is not nullable, so we'd better not break it.
You are right. Let me do a quick change. Thanks!
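The nullability point in this thread can be illustrated without Spark. The sketch below is plain Scala; the case class names mirror those in the PR, while `NullabilityDemo` and its fields are hypothetical names used only for illustration.

```scala
// Plain-Scala sketch (no Spark): why Int cannot carry null but Integer can.
// NullabilityDemo and its members are illustrative, not code from the PR.
object NullabilityDemo {
  case class ClassData(a: String, b: Int)              // b is a primitive; never null
  case class ClassNullableData(a: String, b: Integer)  // b is java.lang.Integer; may be null

  // Casting null to a primitive Int yields the default value 0.
  val primitiveFromNull: Int = null.asInstanceOf[Int]

  // Casting null to the boxed type keeps the null reference.
  val boxedFromNull: Integer = null.asInstanceOf[Integer]

  // A row with a null field is only expressible with the nullable class.
  val withNull: ClassNullableData = ClassNullableData("a", boxedFromNull)

  def main(args: Array[String]): Unit = {
    println(primitiveFromNull)      // 0
    println(boxedFromNull == null)  // true
    println(withNull.b == null)     // true
  }
}
```

Keeping `ClassData` unchanged and adding a separate nullable class, as agreed above, leaves existing tests that rely on a non-nullable `b` untouched.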
Test build #46585 has finished for PR 9921 at commit
@gatorsmile I'm going to take a look at all the functions tomorrow and come up with a list of missing things that we should add. Can you update your pull request then?
@rxin Sure. Please let me know if there is anything I can help with. Thank you!
OK filed: https://issues.apache.org/jira/browse/SPARK-11970 Can you merge the sample change into this one, and also add "show" support?
@@ -17,6 +17,8 @@

package org.apache.spark.sql

+import org.apache.spark.sql.functions._
sort the imports properly
Sure, will do it! Thank you very much!
@rxin Could you check the latest code changes? Do they resolve all your comments? I will do the merge when both PRs are OK. Thank you!
@@ -453,6 +451,22 @@ class Dataset[T] private[sql](
c5: TypedColumn[T, U5]): Dataset[(U1, U2, U3, U4, U5)] =
selectUntyped(c1, c2, c3, c4, c5).asInstanceOf[Dataset[(U1, U2, U3, U4, U5)]]
remove the extra blank line here
@gatorsmile can you update the title to remove "show"? Just keep sample and join.
test("joinWith, expression condition, outer join") {
  val nullInteger = null.asInstanceOf[Integer]
  val nullString = null.asInstanceOf[String]
  val ds1 = Seq(ClassNullableData("a", new Integer(1)),
nit: we can just pass in 1, and the compiler will auto-box it for us.
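A small plain-Scala sketch of the auto-boxing nit follows. It assumes the standard `Predef.int2Integer` implicit conversion, which is part of the Scala standard library; `BoxingDemo` and its fields are hypothetical names. The explicit form uses `Integer.valueOf(1)` rather than the `new Integer(1)` shown in the diff only to avoid a deprecation warning on newer JDKs.

```scala
// Plain-Scala sketch (no Spark): an Int literal is boxed automatically
// when the parameter type is java.lang.Integer.
// BoxingDemo and its members are illustrative, not code from the PR.
object BoxingDemo {
  case class ClassNullableData(a: String, b: Integer)

  // Explicit boxing, equivalent in effect to the form in the diff under review.
  val explicitBox: ClassNullableData = ClassNullableData("a", Integer.valueOf(1))

  // Implicit boxing: Predef.int2Integer converts the Int literal 1 to Integer.
  val implicitBox: ClassNullableData = ClassNullableData("a", 1)

  def main(args: Array[String]): Unit = {
    // Case class equality compares fields, so both forms are equal.
    println(explicitBox == implicitBox)  // true
  }
}
```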
LGTM except for some minor style comments.
@rxin @cloud-fan Just combined all the changes you mentioned in the comments. Thank you for your input! :)
Test build #46665 has finished for PR 9921 at commit
Test build #46676 has finished for PR 9921 at commit
Thanks - merging this in.
…n Dataset API

Besides inner join, other join types can also be useful when calling joinWith, so this adds a joinType parameter to the existing joinWith call in the Dataset API. It also provides another joinWith interface for cartesian-join-like functionality.

Please provide your opinions. marmbrus rxin cloud-fan Thank you!

Author: gatorsmile <gatorsmile@gmail.com>

Closes #9921 from gatorsmile/joinWith.

(cherry picked from commit 2610e06)
Signed-off-by: Reynold Xin <rxin@databricks.com>
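To make the effect of a join type on a pair-returning join concrete, here is a plain-Scala model over in-memory sequences. It is not Spark's implementation or API: `JoinWithModel`, `joinPairs`, the accepted join type strings, and the use of `Option` for a missing side are all assumptions made for illustration only.

```scala
// Plain-Scala model (no Spark) of a join that returns pairs of whole objects,
// with the join type selected by a string. All names here are hypothetical.
object JoinWithModel {
  def joinPairs[T, U](left: Seq[T], right: Seq[U], joinType: String)(
      cond: (T, U) => Boolean): Seq[(Option[T], Option[U])] = {
    // Matched pairs; on their own these form the inner join.
    val inner = for (l <- left; r <- right if cond(l, r)) yield (Option(l), Option(r))
    // Left rows that match nothing on the right.
    val leftOnly = left
      .filterNot(l => right.exists(r => cond(l, r)))
      .map(l => (Option(l), Option.empty[U]))
    // Right rows that match nothing on the left.
    val rightOnly = right
      .filterNot(r => left.exists(l => cond(l, r)))
      .map(r => (Option.empty[T], Option(r)))

    joinType match {
      case "inner"       => inner
      case "left_outer"  => inner ++ leftOnly
      case "right_outer" => inner ++ rightOnly
      case "outer"       => inner ++ leftOnly ++ rightOnly
      case other =>
        throw new IllegalArgumentException(s"Unsupported join type: $other")
    }
  }

  val ds1 = Seq(("a", 1), ("c", 3))
  val ds2 = Seq(("a", 10), ("b", 20))

  // Join on the first tuple element.
  val innerResult = joinPairs(ds1, ds2, "inner")(_._1 == _._1)
  val outerResult = joinPairs(ds1, ds2, "outer")(_._1 == _._1)

  def main(args: Array[String]): Unit = {
    innerResult.foreach(println)
    outerResult.foreach(println)
  }
}
```

With the inner type only the pair sharing key "a" is returned; with the outer type the unmatched rows from both sides also appear, each paired with an empty side.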