Hive insert overwrite when using S3 storage #17307

Merged Mar 31, 2022 (1 commit)

Member

@nmahadevuni nmahadevuni commented Feb 16, 2022

Cherry-pick of trinodb/trino#9234

Co-authored-by: Arkadiusz Czajkowski <arek@starburstdata.com>

This PR fixes the insert overwrite operation on S3-backed storage when the following property is set:

  • hive.insert-existing-partitions-behavior=OVERWRITE

The fix writes directly to the partition's target directory. After the new files are written successfully, the coordinator deletes every file in the partition whose name has neither a prefix nor a suffix matching the current query ID.
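The cleanup rule described above can be sketched as a simple file-name filter. This is only an illustration under stated assumptions: the class `StaleFileFilter`, its method names, and the sample file names and query ID are all hypothetical and are not taken from the PR, whose actual logic lives in `removeNonCurrentQueryFiles`.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only; class, method, and file names are hypothetical.
final class StaleFileFilter
{
    private StaleFileFilter() {}

    // A file is treated as written by the current query if its name
    // starts or ends with the query ID.
    static boolean isCurrentQueryFile(String fileName, String queryId)
    {
        return fileName.startsWith(queryId) || fileName.endsWith(queryId);
    }

    // Returns the file names that would be deleted after a successful overwrite.
    static List<String> filesToDelete(List<String> fileNames, String queryId)
    {
        return fileNames.stream()
                .filter(name -> !isCurrentQueryFile(name, queryId))
                .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        String queryId = "20220329_102300_00042_abcde"; // made-up query ID
        List<String> files = List.of(
                "20220329_102300_00042_abcde_bucket-00000",  // prefix matches
                "000000_0_20220329_102300_00042_abcde",      // suffix matches
                "20220101_000000_00001_zzzzz_bucket-00000"); // older query
        System.out.println(filesToDelete(files, queryId));
    }
}
```

Only the file from the older query is selected for deletion, so the partition directory and its path stay in place while its contents are replaced.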

Test plan - Added test containers for HMS and MinIO, which make it easy to test and debug S3-related issues from the IDE. The tests run against this local dockerized S3 data lake.

== RELEASE NOTES ==

Hive Changes

*  Add support for overwriting existing partitions through the Hive config property `hive.insert-existing-partitions-behavior`. This config supersedes the legacy config `hive.insert-overwrite-immutable-partitions-enabled` and adds the capability of overwriting existing partitions on S3.

@nmahadevuni nmahadevuni changed the title Hive insert overwrite when using S3 storage WIP: Hive insert overwrite when using S3 storage Feb 16, 2022
@nmahadevuni nmahadevuni force-pushed the s3overwritepartition branch 4 times, most recently from 2e5a6aa to be04a09 Compare February 18, 2022 11:06
@nmahadevuni
Member Author

@jainxrohit Can you please review this?

@nmahadevuni nmahadevuni changed the title WIP: Hive insert overwrite when using S3 storage Hive insert overwrite when using S3 storage Feb 18, 2022
@NikhilCollooru
Contributor

NikhilCollooru commented Mar 14, 2022

I think the problem is that S3 is not deleting the files of a partition after it's dropped. Both this PR and #17369 are trying to solve the same problems in different ways.

I feel #17369 is the better approach since it fixes the drop partition for S3.

@nmahadevuni
Member Author

> I think the problem is that S3 is not deleting the files of a partition after it's dropped. Both this PR and #17369 are trying to solve the same problems in different ways.
>
> I feel #17369 is the better approach since it fixes the drop partition for S3.

@NikhilCollooru: Sorry, these two PRs are not solving the same problem. This PR adds support for INSERT (overwrite) into an existing partition on S3. It doesn't fix anything related to drop partition.

@NikhilCollooru
Contributor

> > I think the problem is that S3 is not deleting the files of a partition after it's dropped. Both this PR and #17369 are trying to solve the same problems in different ways.
> > I feel #17369 is the better approach since it fixes the drop partition for S3.
>
> @NikhilCollooru: Sorry, these two PRs are not solving the same problem. This PR adds support for INSERT (overwrite) into an existing partition on S3. It doesn't fix anything related to drop partition.

Okay, I see it now. This PR is about retaining the directory location/path when the partition is overwritten.

Can you briefly explain what the other commits are?

@nmahadevuni
Member Author

The other commits add test infrastructure for running Docker-based tests through the Docker Java API. The Minio container serves as an S3 substitute, and the HiveHadoop container runs the Hadoop/Hive servers.

Contributor

@highker highker left a comment

@NikhilCollooru, feel free to review the PR. Once you feel it's in good shape, I can help take a look.

@nmahadevuni nmahadevuni force-pushed the s3overwritepartition branch 3 times, most recently from 5e8ccf9 to 99a028b Compare March 21, 2022 10:08
@nmahadevuni
Member Author

Thanks @NikhilCollooru for the review. I addressed the comments.

Contributor

@NikhilCollooru NikhilCollooru left a comment

LGTM

@NikhilCollooru NikhilCollooru requested a review from highker March 21, 2022 15:49
Contributor

@highker highker left a comment

first 4 commits LGTM; still reviewing

public abstract class BaseTestContainer
        implements AutoCloseable
{
    private final Logger log;
Contributor

Make this a static field with direct assignment from Logger.get(BaseTestContainer.class);

private final Optional<Network> network;
private final int startupRetryLimit;

private GenericContainer<?> container;
Contributor

final


import static java.lang.String.format;

public class Minio
Contributor

What does minio stand for? I guess it's MinIOContainer?

Member Author

This is in containers subpackage. Should we rename it?

Contributor

ya sure

import java.util.Optional;
import java.util.Set;

public class HiveHadoop
Contributor

HiveHadoopContainer?

import static java.util.Objects.requireNonNull;
import static org.testcontainers.containers.Network.newNetwork;

public class HiveMinioDataLake
Contributor

MinIO

Member Author

@nmahadevuni nmahadevuni Mar 23, 2022

Are you asking to change the name to HiveMinIODataLake?

Contributor

yes

Comment on lines 44 to 47
import static com.facebook.presto.hive.HiveSessionProperties.InsertExistingPartitionsBehavior;
import static com.facebook.presto.hive.HiveSessionProperties.InsertExistingPartitionsBehavior.APPEND;
import static com.facebook.presto.hive.HiveSessionProperties.InsertExistingPartitionsBehavior.ERROR;
import static com.facebook.presto.hive.HiveSessionProperties.InsertExistingPartitionsBehavior.OVERWRITE;
Contributor

Let's move InsertExistingPartitionsBehavior to this class so that hive config doesn't depend on hive session properties.

@@ -93,6 +98,7 @@
private boolean createEmptyBucketFiles = true;
private boolean insertOverwriteImmutablePartitions;
private boolean failFastOnInsertIntoImmutablePartitionsEnabled = true;
private Optional<InsertExistingPartitionsBehavior> insertExistingPartitionsBehavior = Optional.empty();
Contributor

it's weird to have optional here. Can we infer the default through getDefaultInsertExistingPartitionsBehavior? Also, I guess our goal here is to deprecate isInsertOverwriteImmutablePartitionEnabled?

Member Author

isInsertOverwriteImmutablePartitionEnabled is still a valid config. In HiveSessionProperties, we set the default by calling getInsertExistingPartitionsBehavior(), which is the same as getDefaultInsertExistingPartitionsBehavior() except that it also honors an explicit value set by the user, if any.

Contributor

What I meant was once we have insertExistingPartitionsBehavior as an option in the config, there is no need to have isInsertOverwriteImmutablePartitionEnabled anymore. Having the overlapping configs will confuse the users.

Member Author

I don't think we can infer the default value for this through getDefaultInsertExistingPartitionsBehavior(). If we do, and immutable-partitions is set to true later, we will return APPEND which is not valid.

Contributor

Then can we make it non optional? Having an optional here makes the config ambiguous; check my other comment on fail hard

Comment on lines -1063 to -1071
private static InsertExistingPartitionsBehavior getDefaultInsertExistingPartitionsBehavior(HiveClientConfig hiveClientConfig)
{
    if (!hiveClientConfig.isImmutablePartitions()) {
        return APPEND;
    }

    return hiveClientConfig.isInsertOverwriteImmutablePartitionEnabled() ? OVERWRITE : ERROR;
}
Contributor

Part of this function is still valuable, especially for the case where APPEND is specified on immutable partitions (which should throw an error).

Member Author

In the default case, the method getInsertExistingPartitionsBehavior() is the same as this method. We will still throw an error for APPEND on immutable partitions. There is no change in that.

@@ -2091,6 +2125,27 @@ else if (partitionUpdate.getUpdateMode() == NEW || partitionUpdate.getUpdateMode
.collect(toList())));
}

private void removeNonCurrentQueryFiles(ConnectorSession session, Path partitionPath)
Contributor

Can we add a javadoc to explain why we need to clean up files here?

Comment on lines 2088 to 2090
if (handle.getLocationHandle().getWriteMode() == DIRECT_TO_TARGET_EXISTING_DIRECTORY) {
    removeNonCurrentQueryFiles(session, partitionUpdate.getTargetPath());
}
Contributor

Can we also explain a bit more of this branch? We are going to overwrite a partition, but this seems to be deleting the files.

Comment on lines 2093 to 2109
        metastore.addPartition(
                session,
                handle.getSchemaName(),
                handle.getTableName(),
                table.getStorage().getLocation(),
                false,
                partition,
                partitionUpdate.getWritePath(),
                partitionStatistics);
    }
}
else { // New partition
    metastore.addPartition(
            session,
            handle.getSchemaName(),
            handle.getTableName(),
            table.getStorage().getLocation(),
            false,
            partition,
            partitionUpdate.getWritePath(),
            partitionStatistics);
}
Contributor

It would be good to simplify the logic, since the two addPartition calls are duplicated code. The old logic is a good example.

@nmahadevuni nmahadevuni force-pushed the s3overwritepartition branch from 99a028b to dce9a5e Compare March 24, 2022 11:55
@nmahadevuni
Member Author

Thanks @highker . Addressed all the comments. Please review.

@highker highker self-requested a review March 24, 2022 18:17
@highker
Contributor

highker commented Mar 24, 2022

@nmahadevuni, can you separate out the first 4 commits into a separate PR? Those are good and we can merge faster. For the last one, I might have more comments.

@highker
Contributor

highker commented Mar 27, 2022

rebase maybe?

@nmahadevuni nmahadevuni force-pushed the s3overwritepartition branch from dce9a5e to dd5e9ab Compare March 27, 2022 18:23
Contributor

@highker highker left a comment

nits only

@@ -53,6 +61,24 @@
"hive.optimized-reader.enabled"})
public class HiveClientConfig
{
public enum InsertExistingPartitionsBehavior
Contributor

can we move it closer to the setter and getter like how other enums are defined in the class?

@@ -604,6 +631,19 @@ public HiveClientConfig setInsertOverwriteImmutablePartitionEnabled(boolean inse
return this;
}

public InsertExistingPartitionsBehavior getInsertExistingPartitionsBehavior()
{
    return insertExistingPartitionsBehavior.orElse(immutablePartitions ? (isInsertOverwriteImmutablePartitionEnabled() ? OVERWRITE : ERROR) : APPEND);
Contributor

Let's fail hard if the two configs do not match with each other instead of relying on optional empty

Member Author

If these configs don't match, the checkArgument call at line 628 will throw an error. I have removed the Optional, though.
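The behavior discussed in this thread (fall back to the legacy flags when no explicit value is set, and fail hard on a contradictory combination) can be sketched as follows. This is a simplified, hypothetical stand-in, not the real HiveClientConfig: the class name `PartitionBehaviorConfig`, the nullable field, the use of IllegalArgumentException, and the choice to validate in the getter are all assumptions made for illustration.

```java
// Illustrative sketch only: PartitionBehaviorConfig is a hypothetical stand-in,
// not the real HiveClientConfig.
final class PartitionBehaviorConfig
{
    enum InsertExistingPartitionsBehavior { ERROR, APPEND, OVERWRITE }

    private boolean immutablePartitions;
    private boolean insertOverwriteImmutablePartitions;
    // null means the user did not set hive.insert-existing-partitions-behavior
    private InsertExistingPartitionsBehavior explicitBehavior;

    PartitionBehaviorConfig setImmutablePartitions(boolean value)
    {
        this.immutablePartitions = value;
        return this;
    }

    PartitionBehaviorConfig setInsertOverwriteImmutablePartitions(boolean value)
    {
        this.insertOverwriteImmutablePartitions = value;
        return this;
    }

    PartitionBehaviorConfig setInsertExistingPartitionsBehavior(InsertExistingPartitionsBehavior value)
    {
        this.explicitBehavior = value;
        return this;
    }

    InsertExistingPartitionsBehavior getInsertExistingPartitionsBehavior()
    {
        if (explicitBehavior != null) {
            // Fail hard on a contradictory combination instead of silently picking one.
            if (immutablePartitions && explicitBehavior == InsertExistingPartitionsBehavior.APPEND) {
                throw new IllegalArgumentException("APPEND is not allowed when partitions are immutable");
            }
            return explicitBehavior;
        }
        // No explicit value: derive the default from the legacy flags.
        if (!immutablePartitions) {
            return InsertExistingPartitionsBehavior.APPEND;
        }
        return insertOverwriteImmutablePartitions
                ? InsertExistingPartitionsBehavior.OVERWRITE
                : InsertExistingPartitionsBehavior.ERROR;
    }

    public static void main(String[] args)
    {
        PartitionBehaviorConfig config = new PartitionBehaviorConfig()
                .setImmutablePartitions(true)
                .setInsertOverwriteImmutablePartitions(true);
        System.out.println(config.getInsertExistingPartitionsBehavior());
    }
}
```

Validating in the getter keeps the check independent of the order in which the setters are called; the PR itself is described as performing the check with checkArgument elsewhere in the config class.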

@@ -604,6 +631,19 @@ public HiveClientConfig setInsertOverwriteImmutablePartitionEnabled(boolean inse
return this;
Contributor

Let's make isInsertOverwriteImmutablePartitionEnabled a LegacyConfig

Member Author

To make this a LegacyConfig, should the property be the same as in @Config, like below? It is not overridden by any new property.
@LegacyConfig("hive.insert-overwrite-immutable-partitions-enabled")
@Config("hive.insert-overwrite-immutable-partitions-enabled")

Contributor

my bad; actually i meant having a @Deprecated annotation for it.

partitionStatistics);

// New partition or overwriting existing partition by staging and moving the new partition
if (!existingPartition || (existingPartition && handle.getLocationHandle().getWriteMode() != DIRECT_TO_TARGET_EXISTING_DIRECTORY)) {
Contributor

This looks like a bug? You mean

if (!existingPartition || partitionUpdate.getUpdateMode() == OVERWRITE)

?

Member Author

If partitionUpdate.getUpdateMode() != OVERWRITE, we throw an error on line 2089. If we only checked partitionUpdate.getUpdateMode() == OVERWRITE here, we would add the partition even when the write mode is DIRECT_TO_TARGET_EXISTING_DIRECTORY, which is wrong: in that mode the partition already exists and we only remove the old files. I simplified the condition to if (!existingPartition || handle.getLocationHandle().getWriteMode() != DIRECT_TO_TARGET_EXISTING_DIRECTORY)

Contributor

If that is the case, we can still simplify the condition to be

if (!existingPartition || handle.getLocationHandle().getWriteMode() != DIRECT_TO_TARGET_EXISTING_DIRECTORY)

?
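The simplification agreed on in this thread relies on the boolean identity `!a || (a && b)` being equal to `!a || b`. A small exhaustive check makes that concrete. The class and method names below are hypothetical; `directToTarget` stands in for `handle.getLocationHandle().getWriteMode() == DIRECT_TO_TARGET_EXISTING_DIRECTORY`.

```java
// Illustrative sketch only; names are hypothetical and not part of the PR.
final class ConditionSimplification
{
    private ConditionSimplification() {}

    // Condition as first written in the reviewed diff.
    static boolean original(boolean existingPartition, boolean directToTarget)
    {
        return !existingPartition || (existingPartition && !directToTarget);
    }

    // Simplified condition proposed in the review.
    static boolean simplified(boolean existingPartition, boolean directToTarget)
    {
        return !existingPartition || !directToTarget;
    }

    // Checks all four input combinations.
    static boolean equivalentForAllInputs()
    {
        boolean[] values = {false, true};
        for (boolean existingPartition : values) {
            for (boolean directToTarget : values) {
                if (original(existingPartition, directToTarget) != simplified(existingPartition, directToTarget)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args)
    {
        System.out.println("equivalent: " + equivalentForAllInputs());
    }
}
```

In both forms, the partition is added unless it already exists and is being written directly into its existing directory.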

@highker highker self-assigned this Mar 27, 2022
@highker
Contributor

highker commented Mar 28, 2022

Please fix the test failure; it seems to be related.

@nmahadevuni nmahadevuni force-pushed the s3overwritepartition branch 3 times, most recently from 32c890c to d269b17 Compare March 29, 2022 10:23
This implementation writes to target partition directory directly.
After successful write, all files within partition whose name prefix
or suffix don't match current query ID are removed.

Cherry-pick of trinodb/trino@96e77e7

Co-authored-by: Arkadiusz Czajkowski <arek@starburstdata.com>
@nmahadevuni nmahadevuni force-pushed the s3overwritepartition branch from d269b17 to 7860880 Compare March 29, 2022 11:46
@nmahadevuni
Member Author

nmahadevuni commented Mar 29, 2022

> Please fix the test failure; it seems to be related.

The test failure is most likely because the container stopped before the actual tests ran; this happens intermittently. I'm now running these tests in a separate profile, in a new job "hive tests / hive-dockerized-tests" in "hive-tests.yml". The "test other modules / test-other-modules" failure is not related.

@highker
Contributor

highker commented Mar 29, 2022

test-other-modules also failed; can we check?

@nmahadevuni
Member Author

> test-other-modules also failed; can we check?

It timed out. It should have no issues running. Can you please restart just that one?

@nmahadevuni
Member Author

@highker test-other-modules passed now.

@highker highker merged commit 3fbbb67 into prestodb:master Mar 31, 2022
@nmahadevuni nmahadevuni deleted the s3overwritepartition branch March 31, 2022 04:24
@nmahadevuni
Member Author

Thank you @NikhilCollooru and @highker for the review.

@mshang816 mshang816 mentioned this pull request May 17, 2022
14 tasks