Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-11970] [SQL] Adding JoinType into JoinWith and support Sample in Dataset API #9921

Closed
wants to merge 6 commits into from

Conversation

gatorsmile
Copy link
Member

Except inner join, maybe the other join types are also useful when users are using the joinWith function. Thus, added the joinType into the existing joinWith call in Dataset APIs.

Also providing another joinWith interface for the cartesian-join-like functionality.

Please provide your opinions. @marmbrus @rxin @cloud-fan Thank you!

@SparkQA
Copy link

SparkQA commented Nov 24, 2015

Test build #46562 has finished for PR 9921 at commit 50f5320.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -412,7 +418,7 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
}


case class ClassData(a: String, b: Int)
case class ClassData(a: String, b: Integer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change it to Integer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to use null in the newly introduced test case. Int does not support null. Or do you want me to add a new case class and keep the existing one untouched? Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some other tests may depend on the fact that ClassData.b is not nullable, we'd better not break it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Let me do a quick change. Thanks!

@SparkQA
Copy link

SparkQA commented Nov 24, 2015

Test build #46585 has finished for PR 9921 at commit 980b153.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Nov 24, 2015

@gatorsmile I'm going to take a look at all the functions tomorrow and come up with a list of missing things that we should add. Can you update your pull request then?

@gatorsmile
Copy link
Member Author

@rxin Sure. Please let me know if anything I can help. Thank you!

@rxin
Copy link
Contributor

rxin commented Nov 25, 2015

OK filed: https://issues.apache.org/jira/browse/SPARK-11970

Can you merge the sample change into this one, and also add "show" support?

@@ -17,6 +17,8 @@

package org.apache.spark.sql

import org.apache.spark.sql.functions._
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort the imports properly

@gatorsmile
Copy link
Member Author

Sure, Will do it!

Thank you very much!

@gatorsmile
Copy link
Member Author

@rxin Could you check the latest code changes? Are they resolving all your comments?

Will do the merge when both PR are ok. Thank you!

@gatorsmile gatorsmile changed the title [SPARK-11934] [SQL] Adding joinType into joinWith [SPARK-11970] [SQL] Adding JoinType into JoinWith and support Sample and Show in Dataset API Nov 25, 2015
@@ -453,6 +451,22 @@ class Dataset[T] private[sql](
c5: TypedColumn[T, U5]): Dataset[(U1, U2, U3, U4, U5)] =
selectUntyped(c1, c2, c3, c4, c5).asInstanceOf[Dataset[(U1, U2, U3, U4, U5)]]


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the extra blank line here

@rxin
Copy link
Contributor

rxin commented Nov 25, 2015

@gatorsmile can you update the title to remove "show"? Just keep sample and join.

@gatorsmile gatorsmile changed the title [SPARK-11970] [SQL] Adding JoinType into JoinWith and support Sample and Show in Dataset API [SPARK-11970] [SQL] Adding JoinType into JoinWith and support Sample in Dataset API Nov 25, 2015
test("joinWith, expression condition, outer join") {
val nullInteger = null.asInstanceOf[Integer]
val nullString = null.asInstanceOf[String]
val ds1 = Seq(ClassNullableData("a", new Integer(1)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can just pass in 1, and compile will auto box for us.

@cloud-fan
Copy link
Contributor

LGTM except some minor style comments.

@gatorsmile
Copy link
Member Author

@rxin @cloud-fan Just combined all the changes you mentioned in the comments. Thank you for your inputs! : )

@SparkQA
Copy link

SparkQA commented Nov 25, 2015

Test build #46665 has finished for PR 9921 at commit 0d62b5e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 25, 2015

Test build #46676 has finished for PR 9921 at commit 458cf67.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Nov 25, 2015

Thanks - merging this in.

@asfgit asfgit closed this in 2610e06 Nov 25, 2015
asfgit pushed a commit that referenced this pull request Nov 25, 2015
…n Dataset API

Except inner join, maybe the other join types are also useful when users are using the joinWith function. Thus, added the joinType into the existing joinWith call in Dataset APIs.

Also providing another joinWith interface for the cartesian-join-like functionality.

Please provide your opinions. marmbrus rxin cloud-fan Thank you!

Author: gatorsmile <gatorsmile@gmail.com>

Closes #9921 from gatorsmile/joinWith.

(cherry picked from commit 2610e06)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants