-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented computation of segment replication stats at shard level #17055
base: main
Are you sure you want to change the base?
Conversation
The method implemented here computes the segment replication stats at the shard level, instead of relying on the primary shard to compute stats based on reports from its replicas. Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
❌ Gradle check result for 59e2617: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
…ibility Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #17055 +/- ##
============================================
+ Coverage 72.34% 72.39% +0.05%
- Complexity 65481 65589 +108
============================================
Files 5300 5305 +5
Lines 304330 304703 +373
Branches 44141 44197 +56
============================================
+ Hits 220158 220599 +441
+ Misses 66093 66052 -41
+ Partials 18079 18052 -27 ☔ View full report in Codecov by Sentry. |
...c/internalClusterTest/java/org/opensearch/indices/replication/SegmentReplicationStatsIT.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/index/shard/IndexShard.java
Outdated
Show resolved
Hide resolved
@@ -149,7 +150,8 @@ private static ReplicationCheckpoint readCheckpointFromIndexInput( | |||
in.readLong(), | |||
in.readLong(), | |||
in.readString(), | |||
toStoreFileMetadata(uploadedSegmentMetadataMap) | |||
toStoreFileMetadata(uploadedSegmentMetadataMap), | |||
in.readLong() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does bwc work here with this added field? Do we need to bump CURRENT_VERSION
on RemoteSegmentMetadata
? @sachinpkale can you pls assist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If data is being read without the writer serialising it, we will end with an end of stream exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, so we'd need to check the version before attempting to read. I think we may want to add a mixed cluster test here in any case. I don't think theres an existing qa pkg that runs with remote enabled clusters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a discussion on this with @sachinpkale @linuxpi, thanks for providing your inputs.
There are two scenarios here:
-
Backward compatibility (Reader is in version 3.0 and writer is in 2.17)
-> Here, the reader expects a field that doesn’t exist in the older writer’s data.
-> So, solution would be we can add a condition here by bumping theCURRENT_VERSION
-
Forward compatibility (Reader is in version 2.17 and writer is in 3.0)
-> Here, the reader will not know how to interpret the new data in the stream.
-> Its unclear if we need to support this; @sachinpkale will update here once we know this.
@@ -294,6 +294,7 @@ public synchronized void onNewCheckpoint(final ReplicationCheckpoint receivedChe | |||
return; | |||
} | |||
updateLatestReceivedCheckpoint(receivedCheckpoint, replicaShard); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to remove this and the latestReceivedCheckpoint
map from this class as the replicator is the source of truth now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but i would prefer removing this in a separate PR, bcz SegRep Stats uses this. So i will remove this once we start returning SegRep Stats from shard level.
server/src/main/java/org/opensearch/indices/replication/SegmentReplicator.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/indices/replication/SegmentReplicator.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
❌ Gradle check result for 23cedac: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
❌ Gradle check result for f80791f: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
❌ Gradle check result for 29ffb01: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
❌ Gradle check result for 5d1180f: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Description
The method implemented here computes the segment replication stats at the shard level, instead of relying on the primary shard to compute stats based on reports from its replicas.
Method implemented in this PR serves the segment replication stats for following core APIs:
/_nodes/stats
)/_cluster/stats
)/_stats
or/{index}/_stats
)Related Issues
Resolves #16801
Related to #15306
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.