-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Remote Store] Add netty dependencies in repository-s3 plugin #7216
[Remote Store] Add netty dependencies in repository-s3 plugin #7216
Conversation
Gradle Check (Jenkins) Run Completed with:
|
e9b9857
to
5d16034
Compare
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
5d16034
to
6baca42
Compare
Gradle Check (Jenkins) Run Completed with:
|
Are both AWS SDK v1 and v2 required now? I would have expected an upgrade. |
Both are not required but the plan was to first integrate it with remote store, let it bake for some time and then migrate other users like snapshot and deprecate v1. Intention was to reduce the blast radius. Remote store will only use v2 sdk though. |
If we need both in the source code let's rename |
@dblock I agree with you that we should migrate to aws_v2 as soon as possible. Our perf runs also resulted in significant improvement in throughput (~15%) and we were able to reap the benefits of multi-part parallel upload with the help of async IO. |
I lean towards supporting AWS SDK v2 only. The option to support both SDKs seems to add quite a burden for both users and maintainers:
That's a valid point, we do publish the repository-s3 artifacts. What if we branch this plugin to let say repository-s3-v2 (or alike)? We do not install these plugins by default (only bundle them) so anyone could choose. |
How do you think making addition of SDKv2 dependencies mandatory will break existing users?
It won't be just configuration based control as even OpenSearch needs to handle async flow differently. Therefore, there are new abstractions defined for async flow which uses v2 sdk. Meaning, there is a clear separation of v1 flows and v2 flows and both are defined, implemented and can be used independently.
Again, this is already done and PRs for the same are raised and tagged against Meta issue for reference.
If I am not wrong, it is modules which are installed by default and not plugins. So, even repository-s3 is an optional package. We have added new optional abstractions in OpenSearch and respective implementations in repository-s3. By adding new implementations of SDKv2 in repository-s3, I don't think that we are breaking anything for existing users. Please correct me if I am missing something here. We have carried out perf as well for both v1 and v2 SDKs working together in different flows. |
thanks @raghuvanshraj
The artifacts we publish do bring transitively a lot of dependencies. Now, we used to depend on AWS v1 but would depend on AWS v2 as well. We do not know how the plugin dependencies are being used in the wild, I have seen many variations, but here are a few:
This is all speculation, I cannot prove or disprove that, but it is worth to think about.
I don't think we need to use different SDKs for different flows, the v2 should easily cover both sync and async flows [1], could you may be hint why AWS v1 is still needed and cannot be replaced with AWS v2? [1] https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/migration-whats-different.html |
@reta ,
AWS SDK doc says that we can use both side by side. Example in this doc uses two different services but services are only implemented as modules of the global AWS SDKs and not as separate SDKs. https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/migration-side-by-side.html
Yes, we don't need to use sync flows at all. The point was that sync flows are old flows which use SDKv1 and are used in snapshot path currently. We can integrate remote store flows with sdk v2 first and then in a separate task migrate snapshot code and get rid of SDKv1 from repository-s3 plugin. But we still need to continue to support sdk v1 as a version in OpenSearch and slowly deprecate it as other implementation of repository plugin or some other plugin might still be using it. |
As I mentioned - I do not have prove of that, otherwise I would point to the projects directly. I could easily craft the examples but that would not be fair.
I think this basically answers it all - we could remove v1 altogether, afaik the fact that S3 repository implementation uses SDK v1 is implementation detail , it does not leak through public API (the only single class AmazonS3Reference that could be made package-private as well).
This is what we have to tackle first I believe, prepare the foundation before building on top. |
6baca42
to
9a8c5e3
Compare
Gradle Check (Jenkins) Run Completed with:
|
9a8c5e3
to
92e314c
Compare
Gradle Check (Jenkins) Run Completed with:
|
92e314c
to
b462d63
Compare
Gradle Check (Jenkins) Run Completed with:
|
b50eab3
to
0761114
Compare
0761114
to
acaa71c
Compare
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
acaa71c
to
b6a6fb5
Compare
Gradle Check (Jenkins) Run Completed with:
|
b6a6fb5
to
2d676a5
Compare
Gradle Check (Jenkins) Run Completed with:
|
2d676a5
to
b45be52
Compare
Gradle Check (Jenkins) Run Completed with:
|
b45be52
to
f9be372
Compare
Gradle Check (Jenkins) Run Completed with:
|
f9be372
to
91ecbab
Compare
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>
91ecbab
to
ff7995c
Compare
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com> (cherry picked from commit 1f4d0df) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…arch-project#7216) (opensearch-project#8072) (cherry picked from commit 1f4d0df) Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…arch-project#7216) Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com> Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
…arch-project#7216) Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com> Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Credits: @vikasvb90 and @itiyamas for the design and core implementations of the feature.
Description
Adding netty and AWS v2 SDK dependencies to use for parallel multipart async uploads
Issues Resolved
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.