Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added BufferedInputStream to allow mark and reset ops during IO errors #10690

Conversation

vikasvb90
Copy link
Contributor

Description

In S3 SDK v2, if stream supplied by the caller doesn't support mark and reset then in case of IO errors, entire upload fails instead of just retrying the buffer. This change wraps the provided stream with BufferedInputStream to always allow mark and reset for efficient uploads.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Collaborator

@Bukhtawar Bukhtawar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you share the thread dumps or profiling samples to understand the thread pool size changes

@vikasvb90 vikasvb90 force-pushed the enabling_mark_and_reset_on_s3_stream branch from 5d09eb8 to e8eb3a6 Compare October 18, 2023 10:42
@github-actions
Copy link
Contributor

github-actions bot commented Oct 18, 2023

Compatibility status:

Checks if related components are compatible with change 9bc21d7

Incompatible components

Incompatible components: [https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/ml-commons.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/reporting.git]

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@vikasvb90
Copy link
Contributor Author

@Bukhtawar
We don't need future completion pool resizing. I have reverted that. For stream reader pools, it is a decision between performance and refresh lag. For so workload, active thread count with 1 host generating the load is as shown in the graph

Stream read pool graph
Screenshot 2023-10-18 at 4 16 55 PM

Priority stream read pool graph
Screenshot 2023-10-18 at 4 18 18 PM

This will naturally impact CPU and run with scaling threadpool was seen consuming more CPU as well but it had lesser refresh lag.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testRequestStats
      1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testDeleteOperations

@codecov
Copy link

codecov bot commented Oct 18, 2023

Codecov Report

Merging #10690 (9bc21d7) into main (9cc0e7d) will increase coverage by 0.05%.
Report is 2 commits behind head on main.
The diff coverage is 33.33%.

@@             Coverage Diff              @@
##               main   #10690      +/-   ##
============================================
+ Coverage     71.23%   71.28%   +0.05%     
+ Complexity    58571    58549      -22     
============================================
  Files          4853     4853              
  Lines        275796   275797       +1     
  Branches      40139    40139              
============================================
+ Hits         196458   196605     +147     
+ Misses        62925    62715     -210     
- Partials      16413    16477      +64     
Files Coverage Δ
...earch/repositories/s3/async/AsyncPartsHandler.java 90.38% <100.00%> (ø)
...ch/repositories/s3/async/AsyncTransferManager.java 83.44% <100.00%> (ø)
...opensearch/repositories/s3/S3RepositoryPlugin.java 0.00% <0.00%> (ø)

... and 479 files with indirect coverage changes

Signed-off-by: vikasvb90 <vikasvb@amazon.com>
@vikasvb90 vikasvb90 force-pushed the enabling_mark_and_reset_on_s3_stream branch from e8eb3a6 to 9bc21d7 Compare October 18, 2023 10:55
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testPrimaryStopped_ReplicaPromoted

@gbbafna gbbafna merged commit 75bd9f2 into opensearch-project:main Oct 18, 2023
@gbbafna gbbafna added the backport 2.x Backport to 2.x branch label Oct 18, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 18, 2023
#10690)

Signed-off-by: vikasvb90 <vikasvb@amazon.com>
(cherry picked from commit 75bd9f2)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
inputStreamContainer.getInputStream(),
// Buffered stream is needed to allow mark and reset ops during IO errors so that only buffered
// data can be retried instead of retrying whole file by the application.
new BufferedInputStream(inputStreamContainer.getInputStream(), (int) (ByteSizeUnit.MB.toBytes(1) + 1)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vikasvb90 curious who is closing the BufferedInputStream? Does not look like AsyncRequestBody will do so

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @reta for pointing this!
We are wrapping the input stream with BufferedInputStream. Caller of async S3 api should close the provided input stream which means that we won't have any IO resource left open and all subsequent IO ops on the inner stream should result in stream closed errors. And instance of BufferedInputStream should get gc collected.

In remote store cases, we are closing the streams here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vikasvb90 , I understand regarding the wrapped input stream, but not BufferedInputStream - this is clearly a leak of the resource since BufferedInputStream allocated buffers that should be cleaned up as well. Please add hook to close the BufferedInputStream instance (or make inputStreamContainer to take care of it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If reference to BufferedInputStream is not present then internal buffer should also be eligible for gc. I can add hook for additional cleanup but doesn't look like a leak. I also don't see any rise in JVM in the benchmark if this was the case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is resource management discipline: once created, the InputStream should be closed.

deshsidd pushed a commit to deshsidd/OpenSearch that referenced this pull request Oct 19, 2023
opensearch-project#10690)

Signed-off-by: vikasvb90 <vikasvb@amazon.com>
Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>
gbbafna pushed a commit to gbbafna/OpenSearch that referenced this pull request Oct 19, 2023
opensearch-project#10690)

Signed-off-by: vikasvb90 <vikasvb@amazon.com>
(cherry picked from commit 75bd9f2)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
gbbafna added a commit that referenced this pull request Oct 20, 2023
…during IO errors #10690  (#10741)

* Added BufferedInputStream to allow mark and reset ops during IO errors (#10690)

Signed-off-by: vikasvb90 <vikasvb@amazon.com>
(cherry picked from commit 75bd9f2)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Added close on buffered stream in s3 async upload for additional cleanup (#10710)

Signed-off-by: vikasvb90 <vikasvb@amazon.com>

---------

Signed-off-by: vikasvb90 <vikasvb@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Vikas Bansal <43470111+vikasvb90@users.noreply.github.com>
austintlee pushed a commit to austintlee/OpenSearch that referenced this pull request Oct 23, 2023
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
opensearch-project#10690)

Signed-off-by: vikasvb90 <vikasvb@amazon.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch skip-changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants