-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support HDFS and S3 path as warehouse directory for Hive file catalog #24660
Support HDFS and S3 path as warehouse directory for Hive file catalog #24660
Conversation
6bca326
to
f49a0f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the doc update! Nits for readability suggested.
f49a0f0
to
f3b07ae
Compare
Thank you @steveburnett for the suggestion, fixed! Please take a look when available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull updated branch, new local doc build, looks good. Thanks!
@@ -1617,6 +1625,25 @@ public void testRegisterTableWithFileName() | |||
dropTable(getSession(), tableName); | |||
} | |||
|
|||
protected HdfsEnvironment getHdfsEnvironment() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Ideally this should be after all public methods, but it was not fully complying before either. We can leave it as is now but it will be nice to move it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it has been moved to the bottom.
return IcebergDistributedTestBase.getHdfsEnvironment(hiveClientConfig, metastoreClientConfig, hiveS3Config); | ||
} | ||
|
||
public static String getMetadataFileLocation(ConnectorSession session, HdfsEnvironment hdfsEnvironment, String schema, String table, String metadataLocation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it be private? It's only used above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out this, fixed! And I have moved this method to the bottom as well since it has been modified to private
. Please take a look when convenient.
f3b07ae
to
48a7b6d
Compare
Description
Hive file metastore is a file-based metastore used for testing or development purpose whose document is introduced in PR #24511. This PR enable File-based hive metastore to use HDFS/S3 locations as warehouse dir, as described in issue #19112.
An example configuration to use HDFS path includes:
Besides, by configuring the s3 properties described in https://prestodb.io/docs/current/connector/hive.html#amazon-s3-configuration, we can specify a S3 location as the warehouse dir of Hive file catalog. This way, both metadata and data
of iceberg tables will be maintained on S3 storage.
An example configuration to use S3 path includes:
Motivation and Context
Support specifying a HDFS/S3 location directly as warehouse dir for file-based hive metastore
Impact
Lake houses configured with hive file metastore can now specify a HDFS/S3 location directly as the warehouse dir
Test Plan
IcebergDistributedTestBase
,IcebergDistributedSmokeTestBase
andTestIcebergDistributedQueries
on Iceberg connector configured with Hive file catalog on local deployed MINIO object storageIcebergDistributedTestBase
,IcebergDistributedSmokeTestBase
andTestIcebergDistributedQueries
on Iceberg connector configured with Hive file catalog on local deployed HDFSContributor checklist
Release Notes