-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added BufferedInputStream to allow mark and reset ops during IO errors #10690
Added BufferedInputStream to allow mark and reset ops during IO errors #10690
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you share the thread dumps or profiling samples to understand the thread pool size changes
5d09eb8
to
e8eb3a6
Compare
Compatibility status:Checks if related components are compatible with change 9bc21d7 Incompatible componentsIncompatible components: [https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/ml-commons.git] Skipped componentsCompatible componentsCompatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/reporting.git] |
Gradle Check (Jenkins) Run Completed with:
|
@Bukhtawar Priority stream read pool graph This will naturally impact CPU and run with scaling threadpool was seen consuming more CPU as well but it had lesser refresh lag. |
Gradle Check (Jenkins) Run Completed with:
|
Codecov Report
@@ Coverage Diff @@
## main #10690 +/- ##
============================================
+ Coverage 71.23% 71.28% +0.05%
+ Complexity 58571 58549 -22
============================================
Files 4853 4853
Lines 275796 275797 +1
Branches 40139 40139
============================================
+ Hits 196458 196605 +147
+ Misses 62925 62715 -210
- Partials 16413 16477 +64
|
Signed-off-by: vikasvb90 <vikasvb@amazon.com>
e8eb3a6
to
9bc21d7
Compare
plugins/repository-s3/src/main/java/org/opensearch/repositories/s3/async/AsyncPartsHandler.java
Show resolved
Hide resolved
Gradle Check (Jenkins) Run Completed with:
|
inputStreamContainer.getInputStream(), | ||
// Buffered stream is needed to allow mark and reset ops during IO errors so that only buffered | ||
// data can be retried instead of retrying whole file by the application. | ||
new BufferedInputStream(inputStreamContainer.getInputStream(), (int) (ByteSizeUnit.MB.toBytes(1) + 1)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vikasvb90 curious who is closing the BufferedInputStream
? Does not look like AsyncRequestBody
will do so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @reta for pointing this!
We are wrapping the input stream with BufferedInputStream. Caller of async S3 api should close the provided input stream which means that we won't have any IO resource left open and all subsequent IO ops on the inner stream should result in stream closed errors. And instance of BufferedInputStream should get gc collected.
In remote store cases, we are closing the streams here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vikasvb90 , I understand regarding the wrapped input stream, but not BufferedInputStream
- this is clearly a leak of the resource since BufferedInputStream
allocated buffers that should be cleaned up as well. Please add hook to close the BufferedInputStream
instance (or make inputStreamContainer
to take care of it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If reference to BufferedInputStream
is not present then internal buffer should also be eligible for gc. I can add hook for additional cleanup but doesn't look like a leak. I also don't see any rise in JVM in the benchmark if this was the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is resource management discipline: once created, the InputStream should be closed.
opensearch-project#10690) Signed-off-by: vikasvb90 <vikasvb@amazon.com> Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>
opensearch-project#10690) Signed-off-by: vikasvb90 <vikasvb@amazon.com> (cherry picked from commit 75bd9f2) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…during IO errors #10690 (#10741) * Added BufferedInputStream to allow mark and reset ops during IO errors (#10690) Signed-off-by: vikasvb90 <vikasvb@amazon.com> (cherry picked from commit 75bd9f2) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * Added close on buffered stream in s3 async upload for additional cleanup (#10710) Signed-off-by: vikasvb90 <vikasvb@amazon.com> --------- Signed-off-by: vikasvb90 <vikasvb@amazon.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Vikas Bansal <43470111+vikasvb90@users.noreply.github.com>
opensearch-project#10690) Signed-off-by: vikasvb90 <vikasvb@amazon.com>
opensearch-project#10690) Signed-off-by: vikasvb90 <vikasvb@amazon.com> Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Description
In S3 SDK v2, if stream supplied by the caller doesn't support mark and reset then in case of IO errors, entire upload fails instead of just retrying the buffer. This change wraps the provided stream with BufferedInputStream to always allow mark and reset for efficient uploads.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.