Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented computation of segment replication stats at shard level #17055

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

vinaykpud
Copy link
Contributor

@vinaykpud vinaykpud commented Jan 19, 2025

Description

The method implemented here computes the segment replication stats at the shard level, instead of relying on the primary shard to compute stats based on reports from its replicas.

Method implemented in this PR serves the segment replication stats for following core APIs:

  1. Nodes Stats API (/_nodes/stats)
  2. Cluster Stats API (/_cluster/stats)
  3. Indices Stats API (/_stats or /{index}/_stats)

Related Issues

Resolves #16801
Related to #15306

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

The method implemented here computes the segment replication stats at the shard level,
instead of relying on the primary shard to compute stats based on reports from its replicas.

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
@vinaykpud vinaykpud reopened this Jan 22, 2025
Copy link
Contributor

❌ Gradle check result for 59e2617: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

…ibility

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Copy link
Contributor

✅ Gradle check result for 90b96a8: SUCCESS

Copy link

codecov bot commented Jan 23, 2025

Codecov Report

Attention: Patch coverage is 77.52809% with 20 lines in your changes missing coverage. Please review.

Project coverage is 72.39%. Comparing base (e6fc600) to head (5d1180f).
Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
...nsearch/indices/replication/SegmentReplicator.java 81.96% 6 Missing and 5 partials ⚠️
.../replication/checkpoint/ReplicationCheckpoint.java 66.66% 1 Missing and 2 partials ⚠️
...rc/main/java/org/opensearch/index/IndexModule.java 0.00% 2 Missing ⚠️
...c/main/java/org/opensearch/index/IndexService.java 33.33% 2 Missing ⚠️
...in/java/org/opensearch/index/shard/IndexShard.java 60.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17055      +/-   ##
============================================
+ Coverage     72.34%   72.39%   +0.05%     
- Complexity    65481    65589     +108     
============================================
  Files          5300     5305       +5     
  Lines        304330   304703     +373     
  Branches      44141    44197      +56     
============================================
+ Hits         220158   220599     +441     
+ Misses        66093    66052      -41     
+ Partials      18079    18052      -27     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

server/src/main/java/org/opensearch/index/IndexModule.java Outdated Show resolved Hide resolved
@@ -149,7 +150,8 @@ private static ReplicationCheckpoint readCheckpointFromIndexInput(
in.readLong(),
in.readLong(),
in.readString(),
toStoreFileMetadata(uploadedSegmentMetadataMap)
toStoreFileMetadata(uploadedSegmentMetadataMap),
in.readLong()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does bwc work here with this added field? Do we need to bump CURRENT_VERSION on RemoteSegmentMetadata ? @sachinpkale can you pls assist?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If data is being read without the writer serialising it, we will end with an end of stream exception

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, so we'd need to check the version before attempting to read. I think we may want to add a mixed cluster test here in any case. I don't think theres an existing qa pkg that runs with remote enabled clusters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a discussion on this with @sachinpkale @linuxpi, thanks for providing your inputs.

There are two scenarios here:

  1. Backward compatibility (Reader is in version 3.0 and writer is in 2.17)
    -> Here, the reader expects a field that doesn’t exist in the older writer’s data.
    -> So, solution would be we can add a condition here by bumping the CURRENT_VERSION

  2. Forward compatibility (Reader is in version 2.17 and writer is in 3.0)
    -> Here, the reader will not know how to interpret the new data in the stream.
    -> Its unclear if we need to support this; @sachinpkale will update here once we know this.

@@ -294,6 +294,7 @@ public synchronized void onNewCheckpoint(final ReplicationCheckpoint receivedChe
return;
}
updateLatestReceivedCheckpoint(receivedCheckpoint, replicaShard);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to remove this and the latestReceivedCheckpoint map from this class as the replicator is the source of truth now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but i would prefer removing this in a separate PR, bcz SegRep Stats uses this. So i will remove this once we start returning SegRep Stats from shard level.

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Copy link
Contributor

❌ Gradle check result for 23cedac: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Copy link
Contributor

❌ Gradle check result for f80791f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Copy link
Contributor

❌ Gradle check result for 29ffb01: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
@vinaykpud vinaykpud closed this Jan 28, 2025
@vinaykpud vinaykpud reopened this Jan 28, 2025
Copy link
Contributor

❌ Gradle check result for 5d1180f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@vinaykpud vinaykpud closed this Jan 29, 2025
@vinaykpud vinaykpud reopened this Jan 29, 2025
Copy link
Contributor

✅ Gradle check result for 5d1180f: SUCCESS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Search:Performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Redefine the computation of segment replication metrics in Node Stats
3 participants