[SPARK-22016][SQL] Add HiveDialect for JDBC connection to Hive #19238
Can one of the admins verify this patch?
Why not connect directly to the Hive metastore?
@gatorsmile If Hive runs on the same infrastructure as the application, then connecting to the metastore would definitely solve the issue. However, a JDBC connection is needed when the data comes from an external source that exposes it only through its Hive server. We ran into this and ended up adding the HiveDialect to solve it.
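For context, reading from such an external Hive server goes through Spark's generic JDBC data source. The following is a minimal sketch, not code from this PR. It assumes a HiveServer2 endpoint and that the Hive JDBC driver (`org.apache.hive.jdbc.HiveDriver`) is on the classpath. The object name `RemoteHiveJdbcRead` and the host, database and table names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: reads one table from an external HiveServer2 over JDBC.
// Host, port, database and table are placeholders; the Hive JDBC driver
// (org.apache.hive.jdbc.HiveDriver) is assumed to be on the classpath.
object RemoteHiveJdbcRead {

  // Builds the options for Spark's built-in "jdbc" data source.
  def hiveJdbcOptions(host: String, port: Int, database: String, table: String): Map[String, String] =
    Map(
      "url"     -> s"jdbc:hive2://$host:$port/$database",
      "dbtable" -> table,
      "driver"  -> "org.apache.hive.jdbc.HiveDriver"
    )

  // Loads the remote table as a DataFrame through the generic JDBC source.
  def read(spark: SparkSession, host: String, port: Int, database: String, table: String): DataFrame =
    spark.read
      .format("jdbc")
      .options(hiveJdbcOptions(host, port, database, table))
      .load()
}
```

Usage might look like `RemoteHiveJdbcRead.read(spark, "hive.example.com", 10000, "default", "events")`, where every argument is a placeholder. Without a Hive-specific dialect, Spark falls back to its default `JdbcDialect`. That dialect's `quoteIdentifier` wraps column names in double quotes when it builds the `SELECT` it sends to the server.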
assert(df3.collect() === Array(Row(21519, 1234)))
}
assert(df3.collect() === Array(Row(21519, 1234))
)
This ')' is wrong. Lines 1105-1107 in the original have an indentation issue.
It must have changed when I formatted the code in the IDE. The Scalastyle checks passed, but let me roll that back anyway.
@dongjoon-hyun done! Thank you!
Er, actually, I meant that the original Spark code also has wrong indentation. You can fix the indentation of the original lines 1105-1107 here. :)
@dongjoon-hyun You are right! I misread the parenthesis. I think it is correct now. Thank you for pointing that out :)
I can see the value, but in most cases it does not perform well when we use a JDBC connection. Instead of adding the extra dialect upstream, could you please add Hive as a separate data source? Thanks!
Seems logical. Then, unless someone disagrees, feel free to close this PR; we will create a new Spark package with this feature in a separate repository. Thanks!
This merge request would partly solve https://issues.apache.org/jira/browse/SPARK-21063
What changes were proposed in this pull request?
Added a HiveDialect for JDBC connection to Hive.
It overrides two methods:
How was this patch tested?
It passes the added tests, and it was also used against a real Hive instance with real data.
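The description says the dialect overrides two methods but does not name them. The following is a minimal sketch, not the patch itself. It assumes the two methods are `canHandle` and `quoteIdentifier`. `canHandle` claims `jdbc:hive2` URLs. `quoteIdentifier` replaces Spark's default double quotes with the backticks HiveQL uses for identifiers. The names `HiveDialectSketch` and `RegisterHiveDialect` are hypothetical. Registering the dialect at runtime through `JdbcDialects.registerDialect` is one way a separate package, as suggested in the review, could provide it without changing Spark.

```scala
import java.util.Locale

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Hypothetical sketch of a JDBC dialect for HiveServer2; which methods the
// actual patch overrides is an assumption, not taken from the PR.
case object HiveDialectSketch extends JdbcDialect {

  // Claim only HiveServer2 JDBC URLs, e.g. "jdbc:hive2://host:10000/db".
  override def canHandle(url: String): Boolean =
    url.toLowerCase(Locale.ROOT).startsWith("jdbc:hive2")

  // Spark's default implementation wraps identifiers in double quotes, which
  // HiveQL treats as string literals; Hive quotes identifiers with backticks.
  override def quoteIdentifier(colName: String): String =
    s"`$colName`"
}

object RegisterHiveDialect {
  // Registers the dialect so later JDBC reads against jdbc:hive2 URLs pick it
  // up; this runs from user code and needs no change to Spark itself.
  def register(): Unit = JdbcDialects.registerDialect(HiveDialectSketch)
}
```

Call `RegisterHiveDialect.register()` once, before the first JDBC read against a `jdbc:hive2` URL. Keeping the dialect outside Spark matches the reviewer's preference for a separate package over an upstream change.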