-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-3343] [SQL] Add serde support for CTAS #2570
Conversation
QA tests have started for PR 2570 at commit
|
QA tests have finished for PR 2570 at commit
|
Test PASSed. |
QA tests have started for PR 2570 at commit
|
QA tests have finished for PR 2570 at commit
|
Test PASSed. |
| STORED AS RCFile AS | ||
| SELECT key, value | ||
| FROM src | ||
| ORDER BY key, value""".stripMargin).collect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we should just check the contents of created tables to test whether we can correctly set some table properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question, I will add the describe
command to verify if the properties are correctly set.
Test FAILed. |
retest this please. |
QA tests have started for PR 2570 at commit
|
insertIntoRelation(metastoreRelation).execute | ||
// TODO ideally, we should get the output data ready first and then | ||
// update the relation, just in case of failure occurs in data | ||
// processing. Otherwise we may not able to get a consistent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we populate metastore after evaluating the query, we also need to make sure information stored in CreateTableDesc
will be correctly set to tableInfo
in the FileSinkDesc
. Also, if TableDesc org.apache.hadoop.hive.ql.plan.PlanUtils.getTableDesc(CreateTableDesc, String, String)
will be used to create the tableInfo
, the implementation of this method in hive 0.12 cannot be used because of the bug mentioned in https://issues.apache.org/jira/browse/HIVE-6083. Can you add a note at here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, thanks, I didn't know this, will do that.
QA tests have finished for PR 2570 at commit
|
Test PASSed. |
tblName: String, | ||
crtTbl: CreateTableDesc, | ||
schema: Seq[Attribute]) { | ||
// Most of code are similar with the DDLTask.createTable(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be good to mention DDLTask
is in Hive's codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we consolidate two createTable
? With your change, seems the original createTable
will be only used by createTable
in HiveContext
.
Also, can you add comments to HiveContext.createTable
to explain what table properties will be used if users call this method?
Thank you @yhuai , I will update the code accordingly. |
fcbbc61
to
d49596b
Compare
QA tests have started for PR 2570 at commit
|
QA tests have finished for PR 2570 at commit
|
Test PASSed. |
Test FAILed. |
da02ec3
to
ff2e140
Compare
QA tests have started for PR 2570 at commit
|
QA tests have finished for PR 2570 at commit
|
Test PASSed. |
override def output = child.output | ||
child: LogicalPlan, | ||
allowExisting: Boolean, | ||
extra: AnyRef = null) extends UnaryNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of making this an AnyRef
maybe we can just move it into the hive package? Either that or create a specialized hive version of this operator if we use it elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about to make the extra as generic type? CTAS
probably widely supported by different SQL dialects, creating specialized version maybe lead to duplicated code.
Test build #22283 has started for PR 2570 at commit
|
@marmbrus, I've rebase dthe code with latest master (with Hive 0.13.1 supported, but not compatible with Hive 0.12). Please let me know if you have concerns on this. |
Test build #22283 has finished for PR 2570 at commit
|
Test FAILed. |
Build failed because this PR is only compatible with Hive 0,13.1 (not 0.12 any more). |
Can you explain the reason that hive 0.12 is not supported? |
Test build #22317 has started for PR 2570 at commit
|
Test build #22317 has finished for PR 2570 at commit
|
Test FAILed. |
@yhuai Some of the methods signature changed after upgrading to Hive 0.13, this actually is my concerns for how to write the shim code. |
Is it possible to add a method in shim like the following one?
|
Support conditional compilation probably need some workaround here, that's a general problem for the Hive upgrading I think. We need another PR to solve that before merging the PRs like this one. |
@chenghao-intel there is already support for conditional compilation based on hive version. This code can go in |
Thanks @marmbrus I will update the code. |
2ab88c3
to
e011ef5
Compare
Test build #22337 has started for PR 2570 at commit
|
Test build #22337 has finished for PR 2570 at commit
|
Test PASSed. |
Thanks for working on this! I'm going to merge it in, but we should consider moving the semantic analyzer part to the analysis phase. These execution APIs are all developer / experimental so we can change them whenever . |
This is the code refactor and follow ups for #2570 Author: Cheng Hao <hao.cheng@intel.com> Closes #3336 from chenghao-intel/createtbl and squashes the following commits: 3563142 [Cheng Hao] remove the unused variable e215187 [Cheng Hao] eliminate the compiling warning 4f97f14 [Cheng Hao] fix bug in unittest 5d58812 [Cheng Hao] revert the API changes b85b620 [Cheng Hao] fix the regression of temp tabl not found in CTAS (cherry picked from commit 51b1fe1) Signed-off-by: Michael Armbrust <michael@databricks.com>
This is the code refactor and follow ups for #2570 Author: Cheng Hao <hao.cheng@intel.com> Closes #3336 from chenghao-intel/createtbl and squashes the following commits: 3563142 [Cheng Hao] remove the unused variable e215187 [Cheng Hao] eliminate the compiling warning 4f97f14 [Cheng Hao] fix bug in unittest 5d58812 [Cheng Hao] revert the API changes b85b620 [Cheng Hao] fix the regression of temp tabl not found in CTAS
Currently,
CTAS
(Create Table As Select) doesn't support specifying theSerDe
in HQL. This PR will pass down theASTNode
into the physical operatorexecution.CreateTableAsSelect
, which will extract theCreateTableDesc
object via HiveSemanticAnalyzer
. In the meantime, I also update theHiveMetastoreCatalog.createTable
to optionally support theCreateTableDesc
for table creation.