
GCP Support over the S3 XML API #2176

Merged
merged 2 commits into master on Feb 20, 2025

Conversation

Collaborator

@poodlewars poodlewars commented Feb 12, 2025

This adds support for GCP over the S3-compatible XML API.

GCP support did not work out of the box because Google does not support any of the S3 "batch" APIs, and we used batch delete for all of our deletion operations, which therefore broke. Instead, this change switches to the unary delete operation, relying on the AWS SDK's async API to reduce the performance impact of sending single-object deletes.

Users connect to GCP like:

ac = Arctic("gcpxml://storage.googleapis.com:aseaton_bucket?access=...&secret=...")
ac = Arctic("gcpxmls://storage.googleapis.com:aseaton_bucket?access=...&secret=...")
# Kept aws_auth naming as this will use ~/.aws/credentials
ac = Arctic("gcpxml://storage.googleapis.com:aseaton_bucket?aws_auth=true&aws_profile=blah")
ac = Arctic("gcpxml://storage.googleapis.com:aseaton_bucket?aws_auth=true&prefix=my_prefix")

The URI scheme is gcpxml so that, if we do a full GCP implementation later, it can take gcp://. I've added a GCP storage proto object so that users on the gcpxml API will later be able to move to a gcp:// API without updating any library configs. The proto is tiny, storing just the library's prefix, which is the only information we need to serialize.

This is implemented as a small subclass of our existing S3 implementation, with its deletion functionality overridden. The Python library adapter creates the NativeVersionStore with a GCPXMLSettings in-memory object (in the NativeVariantStorage), which causes us to create a GCPXMLStorage C++ adapter rather than the normal S3Storage.
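
Roughly, the shape of that subclass looks like the sketch below. This is illustrative only, using simplified stand-in types and method names rather than the actual ArcticDB declarations:

```cpp
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for the real ArcticDB types -- illustrative only.
struct GCPXMLSettings {
    std::string bucket;
    std::string prefix;
};

class S3Storage {
public:
    virtual ~S3Storage() = default;
    void remove(const std::vector<std::string>& keys) { do_remove(keys); }

protected:
    // In the real implementation this issues an S3 batch delete, which the
    // GCP XML API rejects.
    virtual void do_remove(const std::vector<std::string>& /*keys*/) { /* batch delete */ }
};

// Reuses everything from the S3 implementation except deletion, which is
// overridden to send unary deletes (see the threading notes below).
class GCPXMLStorage : public S3Storage {
public:
    explicit GCPXMLStorage(GCPXMLSettings settings) : settings_(std::move(settings)) {}

protected:
    void do_remove(const std::vector<std::string>& /*keys*/) override { /* unary deletes */ }

private:
    GCPXMLSettings settings_;
};
```

Only the deletion hook changes; reads, writes and listing all come from the S3 base class.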

The interesting part of this PR is the threading model in do_remove_no_batching_impl. This is called from an IO executor as part of the RemoveTask, so it is not safe (due to the risk of deadlock) to submit any further work to the IO executor from within it. Instead (a sketch of the pattern follows the list below):

  • The storage client s3_client_impl.cpp submits delete operations with the S3 API. The S3 SDK has its own event loop to handle these. When they complete, they resolve a folly::Promise, and the client returns a folly::Future out of that promise.
  • Our storage adapter (detail-inl.hpp) collects these futures on an inline executor. This is fine because there is no work for this executor to do other than waiting to be notified that the promises have been resolved by the AWS SDK.
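
A minimal sketch of that pattern, assuming a hypothetical delete_object_async helper standing in for the real client in s3_client_impl.cpp (the actual code differs):

```cpp
#include <folly/executors/InlineExecutor.h>
#include <folly/futures/Future.h>

#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the real client call: each unary delete returns a
// future that the AWS SDK's own event loop fulfils when the request completes.
folly::Future<folly::Unit> delete_object_async(const std::string& key) {
    auto promise = std::make_shared<folly::Promise<folly::Unit>>();
    auto future = promise->getFuture();
    // The real client hands the request to the AWS SDK and fulfils the promise
    // from the SDK's completion callback; here we fulfil it immediately.
    (void)key;
    promise->setValue(folly::Unit{});
    return future;
}

// Adapter side: gather the futures on an inline executor. This is safe because
// the only "work" is being notified that the SDK resolved each promise; nothing
// is submitted back to the IO executor that invoked the remove task.
void remove_no_batching(const std::vector<std::string>& keys) {
    std::vector<folly::Future<folly::Unit>> futures;
    futures.reserve(keys.size());
    for (const auto& key : keys) {
        futures.push_back(delete_object_async(key));
    }
    folly::collectAll(std::move(futures))
        .via(&folly::InlineExecutor::instance())
        .wait();
}
```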

For testing, I created a moto-based test fixture that returns errors when batch delete operations are invoked. I also tested manually against a real GCP backend.

There's some duplication between the gcpxml and the s3 Python adapters, but since the gcpxml adapter will hopefully be temporary until we have a full GCP implementation, I didn't see the value in refactoring it away.

Monday: 8450684276

@@ -368,9 +384,11 @@ def mem_storage() -> Generator[InMemoryStorageFixture, None, None]:
@pytest.fixture(
scope="function",
params=[
Collaborator Author

Do not merge the commented out fixtures

@poodlewars poodlewars force-pushed the aseaton/8450684276/gcp-support branch from b938fc1 to b798d13 on February 13, 2025 18:05
void update(const s3::S3Settings& config) {
config_ = config;
}

std::string to_string() {
Collaborator Author

This is just slightly helpful for debugging in the Python layer.

@poodlewars poodlewars changed the title aseaton/8450684276/gcp-support GCP Support over the S3 XML API Feb 17, 2025
@poodlewars poodlewars marked this pull request as ready for review February 17, 2025 17:35
Collaborator

@G-D-Petrov G-D-Petrov left a comment

LGTM, I just have a couple of comments that don't need to be addressed in this PR:

  1. As far as I can see, it is still possible to connect to GCP via the S3Storage directly rather than GCPXMLStorage. This would be a user error of course, but an annoying one. I think that we should either:
     • have some logic that forbids this
     • add a fallback in the S3Storage when the batch delete fails
  2. We should add some tests with real GCP, not just a simulator

@@ -89,6 +89,12 @@ class ConfigCache {
storage_conf.config().UnpackTo(&s3_storage);
storages.emplace_back(create_storage(path, mode, s3::S3Settings(settings).update(s3_storage)));
},
[&storage_conf, &storages, &path, mode] (const s3::GCPXMLSettings& settings) {
util::check(storage_conf.config().Is<arcticdb::proto::gcp_storage::Config>(), "Only support S3 native settings");
Collaborator

"S3" -> "GCP"

def test_gcp_no_batch_delete(gcp_storage_factory: MotoGcpS3StorageFixtureFactory):
# Given
s3 = gcp_storage_factory
bucket = s3.create_fixture()
Collaborator

"s3" to "gcp" maybe to lessen the confusion?

return py::make_tuple(settings.aws_auth());
},
[](py::tuple t) {
util::check(t.size() == 1, "Invalid GCPXMLSettings pickle objects");
Collaborator

@phoebusm phoebusm Feb 18, 2025

If I have read the code correctly, the gcpxml settings will be stored in this native class instead of in protobuf, as we used to do.
If so, I think the rest of the settings in the class need to be pickled as well.
It would be nice to have a test for this in test_fork.py.

package arcticc.pb2.gcp_storage_pb2;

message Config {
string prefix = 1;
Collaborator

I wonder why prefix is the special kid that gets to stay in the proto, instead of living in the native config like its siblings 🤔

Collaborator Author

@poodlewars poodlewars Feb 18, 2025

Because the path of a library is determined when the library is created, with the creation timestamp appended. We have to store it in _arctic_cfg or we'd never be able to find the library again.

b'</Error>'
)
start_response(
"503 Slow Down", [("Content-Type", "text/xml"), ("Content-Length", str(len(response_body)))]
Collaborator

I think the HTTP code shouldn't be 503 Slow Down for the response to a not-implemented batch delete, right?

Collaborator Author

Yeah, fixed in the Windows fixes commit

@poodlewars
Collaborator Author

> LGTM I just have a couple of comments that don't need to be addressed in this PR:
>
> 1. As far as I can see, it is still possible to connect to GCP via the `S3Storage` directly, rather than `GCPXMLStorage`, this would be a user error ofc but an annoying one. I think that we should either:
>    * have some logic that forbids this
>    * add a fall back in the `S3Storage` when the batch delete fails
> 2. We should add some tests with a real GCP, not just a simulator

Thanks!

Agreed on 2, we'll do that as a follow-up. I have done a lot of manual testing with real GCP though.

On 1, to start with this will be a bit of a "hidden" feature and we can talk people through how to use it. I'd rather not implement any fancy "is it GCP" detection or fallbacks until we know that people get bitten by this.

Collaborator

@phoebusm phoebusm left a comment

LGTM apart from the pickle handling.
Nice async on delete.

@poodlewars poodlewars force-pushed the aseaton/8450684276/gcp-support branch 5 times, most recently from a53e89c to e3787c6 on February 20, 2025 09:32
@poodlewars poodlewars force-pushed the aseaton/8450684276/gcp-support branch from e3787c6 to 5463f96 on February 20, 2025 10:28
@poodlewars poodlewars merged commit 5958781 into master Feb 20, 2025
152 checks passed
@poodlewars poodlewars deleted the aseaton/8450684276/gcp-support branch February 20, 2025 15:11