GpuInsertIntoHiveTable supports parquet format #10912

Merged
merged 9 commits into from
May 31, 2024

firestarman
Collaborator

close #9939

This PR adds Parquet support to GpuInsertIntoHiveTable, which currently supports only text writes. The feature is covered by the new tests added in this PR.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman firestarman changed the title GpuInsertIntoHiveTable supports parquet GpuInsertIntoHiveTable supports parquet format May 28, 2024
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Collaborator Author

build

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Collaborator Author

build

val storage = insertCmd.table.storage
// Configs check for Parquet write enabling/disabling

// FIXME Need to check serde and output format classes ?
Contributor

Is this comment out of date, or is there more to do here?

Collaborator Author

Nice catch. Removed

s"as an int or a long")
}

// FIXME Need a new format type for Hive Parquet write ?
Contributor

I don't think a new format type is required, but I could see it done that way if desired. Comment needs to be addressed in some way.

Collaborator Author

Thanks a lot for the info. Removed.

firestarman and others added 4 commits May 29, 2024 10:41
…HiveFileFormat.scala

Co-authored-by: Jason Lowe <jlowe@nvidia.com>
…HiveFileFormat.scala

Co-authored-by: Jason Lowe <jlowe@nvidia.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Collaborator Author

build

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Collaborator Author

build

Comment on lines 57 to 58
# ProjectExec falls back on databricks due to a new expression named "MapFromArrays".
fallback_nodes = ['ProjectExec'] if is_databricks_runtime() else []
Contributor

Is it understood why MapFromArrays is appearing here? Note that MapFromArrays is not "new" in the sense that it's been in Apache Spark since Spark 2.4. The concern is that this test is allowing a fallback when we're not testing for a fallback.

Do we have confidence this won't appear in a normal query? I suspect it's an artifact of how map generation works from Python, but then I wonder why we don't need to fall back on MapFromArrays in other tests that generate maps.

Collaborator Author

@firestarman firestarman May 30, 2024

I don't know the details of MapFromArrays yet, so I filed an issue to track it: #10948
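
A minimal sketch of the concern raised in this thread, assuming a simplified model of the fallback check. It does not use the spark-rapids test helpers: `allowed_fallback_nodes` and `unexpected_fallbacks` are hypothetical names, and the `on_databricks` flag stands in for the `is_databricks_runtime()` call in the reviewed test. It shows that once `ProjectExec` is on the allow list, any CPU fallback of `ProjectExec` passes, whatever its cause.

```python
from typing import Iterable, List, Set


def allowed_fallback_nodes(on_databricks: bool) -> List[str]:
    """Hypothetical mirror of the reviewed conditional:
    ProjectExec is allowed to run on the CPU only on Databricks."""
    return ['ProjectExec'] if on_databricks else []


def unexpected_fallbacks(cpu_nodes: Iterable[str],
                         allowed: Iterable[str]) -> Set[str]:
    """Return the plan nodes that ran on the CPU without being allow-listed."""
    return set(cpu_nodes) - set(allowed)


# On Databricks the ProjectExec fallback is tolerated, so nothing is flagged,
# even if the fallback had a cause other than MapFromArrays.
assert unexpected_fallbacks(['ProjectExec'], allowed_fallback_nodes(True)) == set()

# On other runtimes the same fallback is reported as unexpected.
assert unexpected_fallbacks(['ProjectExec'], allowed_fallback_nodes(False)) == {'ProjectExec'}
```

Under this model, the allow list is keyed on the node name only, which is why the review asks whether the `MapFromArrays` cause is understood before accepting the fallback.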

@sameerz sameerz added the feature request New feature or request label May 29, 2024
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Collaborator Author

build

@firestarman firestarman merged commit 4024ef6 into NVIDIA:branch-24.08 May 31, 2024
42 of 44 checks passed
@firestarman firestarman deleted the hive-parquet branch May 31, 2024 01:47
SurajAralihalli pushed a commit to SurajAralihalli/spark-rapids that referenced this pull request Jul 12, 2024
This is a new feature adding the parquet support for GpuInsertIntoHiveTable, who only supports text write now. And this feature is tested by the new added tests in this PR.

---------

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
Labels
feature request New feature or request
3 participants