-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handling OpenSearchRejectExecuteException Exception #1004
Handling OpenSearchRejectExecuteException Exception #1004
Conversation
Signed-off-by: sricharanvuppu <srivuppu@amazon.com>
Codecov Report
@@ Coverage Diff @@
## main #1004 +/- ##
============================================
- Coverage 72.39% 71.17% -1.23%
+ Complexity 1007 1006 -1
============================================
Files 141 141
Lines 4695 4770 +75
Branches 527 540 +13
============================================
- Hits 3399 3395 -4
- Misses 963 1034 +71
- Partials 333 341 +8
|
@@ -214,7 +214,7 @@ class ShardReplicationTask(id: Long, type: String, action: String, description: | |||
// Since this setting is not dynamic, setting update would only reflect after pause-resume or on a new replication job. | |||
val rateLimiter = Semaphore(replicationSettings.readersPerShard) | |||
val sequencer = TranslogSequencer(scope, replicationMetadata, followerShardId, leaderAlias, leaderShardId.indexName, | |||
TaskId(clusterService.nodeName, id), client, indexShard.localCheckpoint, followerClusterStats) | |||
TaskId(clusterService.nodeName, id), client, indexShard.localCheckpoint, followerClusterStats, replicationSettings.readersPerShard) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have one more setting for writerpershard with default value as 2
to initialize this value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
@@ -73,26 +88,52 @@ class TranslogSequencer(scope: CoroutineScope, private val replicationMetadata: | |||
var relativeStartNanos = System.nanoTime() | |||
val retryOnExceptions = ArrayList<Class<*>>() | |||
retryOnExceptions.add(MappingNotAvailableException::class.java) | |||
var tooManyRequestsAcknowledgement = true | |||
try { | |||
while (tooManyRequestsAcknowledgement) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the current changes, let's retry after the backoff and handle any errors within the translog sequencer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
retryException = e; | ||
} | ||
else { | ||
throw ReplicationException("OpensearchRejectExecutionException being thrown as ReplicationException", RestStatus.TOO_MANY_REQUESTS,e as Throwable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we construct the message from the exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will Modify it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep OpensearchRejectExecutionException being thrown as ReplicationException
is not very helpful. Let's use the original exception.
…ceptions instead of just 429 exception Signed-off-by: sricharanvuppu <srivuppu@amazon.com>
REPLICATION_AUTOFOLLOW_REMOTE_INDICES_RETRY_POLL_INTERVAL, REPLICATION_METADATA_SYNC_INTERVAL, | ||
REPLICATION_RETENTION_LEASE_MAX_FAILURE_DURATION, REPLICATION_INDEX_TRANSLOG_PRUNING_ENABLED_SETTING, | ||
REPLICATION_INDEX_TRANSLOG_RETENTION_SIZE, REPLICATION_FOLLOWER_BLOCK_START, REPLICATION_AUTOFOLLOW_CONCURRENT_REPLICATION_JOBS_TRIGGER_SIZE) | ||
REPLICATION_FOLLOWER_CONCURRENT_WRITERS_PER_SHARD, REPLICATION_FOLLOWER_RECOVERY_CHUNK_SIZE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add it to the end of the list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
retryException = e; | ||
} | ||
else { | ||
throw ReplicationException(e.message.toString(), RestStatus.TOO_MANY_REQUESTS,e as Throwable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use Throwable
directly for constructing exception instead on toString()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replication Exception Constructor expects a string as well along with throwable.
followerClusterStats.stats[followerShardId]!!.opsWritten.addAndGet( | ||
replayRequest.changes.size.toLong() | ||
) | ||
} catch (e: OpenSearchException) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we extract the following code from extensions.kt below and reuse here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed java.class exceptions will add them as mentioned in extenions.kt
|
||
private val followerClusterStats: FollowerClusterStats, writersPerShard : Int) { | ||
//Start backOff for exceptions with a second | ||
private val initialBackoffMillis = 1000L |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be redundant based on the logic from Extensions.kt, Could we re-use the method here?
…ode from translogSequencer Signed-off-by: sricharanvuppu <srivuppu@amazon.com>
Signed-off-by: sricharanvuppu <srivuppu@amazon.com>
// Exceptions thrown here will mark the channel as failed and the next attempt to send to the channel will | ||
// raise the same exception. See [SendChannel.close] method for details. | ||
val rateLimiter = Semaphore(writersPerShard) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add more details in PR description on why we're introducing the rate limiting here?
Also, we'll need to update the documentation as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
|| e.status() == RestStatus.TOO_MANY_REQUESTS)) { | ||
tryReplay = true | ||
} | ||
else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Indentation in wrong.
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.1 1.1
# Navigate to the new working tree
cd .worktrees/backport-1.1
# Create a new branch
git switch --create backport/backport-1004-to-1.1
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 448e7a7501e5ff8740dc2d7635c08ae62d19147e
# Push it to GitHub
git push --set-upstream origin backport/backport-1004-to-1.1
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.1 Then, create a pull request where the |
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.8 2.8
# Navigate to the new working tree
cd .worktrees/backport-2.8
# Create a new branch
git switch --create backport/backport-1004-to-2.8
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 448e7a7501e5ff8740dc2d7635c08ae62d19147e
# Push it to GitHub
git push --set-upstream origin backport/backport-1004-to-2.8
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.8 Then, create a pull request where the |
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.x 1.x
# Navigate to the new working tree
cd .worktrees/backport-1.x
# Create a new branch
git switch --create backport/backport-1004-to-1.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 448e7a7501e5ff8740dc2d7635c08ae62d19147e
# Push it to GitHub
git push --set-upstream origin backport/backport-1004-to-1.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.x Then, create a pull request where the |
* Handling OpenSearchRejectExecuteException Exception * introduced writersPerShard setting. Signed-off-by: sricharanvuppu <srivuppu@amazon.com> (cherry picked from commit 448e7a7)
…ct#1004) * Handling OpenSearchRejectExecuteException Exception * introduced writersPerShard setting. Signed-off-by: sricharanvuppu <srivuppu@amazon.com>
Handling OpenSearchRejectExecuteException Exception (opensearch-project#1004)
…ct#1004) * Handling OpenSearchRejectExecuteException Exception * introduced writersPerShard setting. Signed-off-by: sricharanvuppu <srivuppu@amazon.com>
…#1004) (#1055) * Handling OpenSearchRejectExecuteException Exception (#1004) * Handling OpenSearchRejectExecuteException Exception * introduced writersPerShard setting. Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * ratelimiter acquiring after getting unAppliedChanges in TranslogSequncer Signed-off-by: sricharanvuppu <113983630+sricharanvuppu@users.noreply.github.com> --------- Signed-off-by: sricharanvuppu <srivuppu@amazon.com> Signed-off-by: sricharanvuppu <113983630+sricharanvuppu@users.noreply.github.com>
…1004) (#1025) * Handling OpenSearchRejectExecuteException Exception (#1004) * Handling OpenSearchRejectExecuteException Exception * introduced writersPerShard setting. Signed-off-by: sricharanvuppu <srivuppu@amazon.com> (cherry picked from commit 448e7a7) Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * correcting OpenSearchRejectedExecutionException import Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * ratelimiter acquiring after getting unAppliedChanges in TranslogSequncer Signed-off-by: sricharanvuppu <113983630+sricharanvuppu@users.noreply.github.com> --------- Signed-off-by: sricharanvuppu <srivuppu@amazon.com> Signed-off-by: sricharanvuppu <113983630+sricharanvuppu@users.noreply.github.com>
…1004) (#1026) * Handling OpenSearchRejectExecuteException Exception (#1004) * Handling OpenSearchRejectExecuteException Exception * introduced writersPerShard setting. Signed-off-by: sricharanvuppu <srivuppu@amazon.com> (cherry picked from commit 448e7a7) Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * correcting OpenSearchRejectedExecutionException import Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * ratelimiter acquiring after getting unAppliedChanges in TranslogSequncer Signed-off-by: sricharanvuppu <113983630+sricharanvuppu@users.noreply.github.com> --------- Signed-off-by: sricharanvuppu <srivuppu@amazon.com> Signed-off-by: sricharanvuppu <113983630+sricharanvuppu@users.noreply.github.com>
…1004) (#1022) * Handling OpenSearchRejectExecuteException Exception (#1004) * Handling OpenSearchRejectExecuteException Exception * introduced writersPerShard setting. Signed-off-by: sricharanvuppu <srivuppu@amazon.com> (cherry picked from commit 448e7a7) Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * correcting OpenSearchRejectedExecutionException import Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * ratelimiter acquiring after getting unAppliedChanges in TranslogSequencer Signed-off-by: sricharanvuppu <srivuppu@amazon.com> --------- Signed-off-by: sricharanvuppu <srivuppu@amazon.com>
…1004) (#1021) * Handling OpenSearchRejectExecuteException Exception (#1004) * Handling OpenSearchRejectExecuteException Exception * introduced writersPerShard setting. Signed-off-by: sricharanvuppu <srivuppu@amazon.com> (cherry picked from commit 448e7a7) Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * correcting OpenSearchRejectedExecutionException import Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * ratelimiter acquiring after getting unAppliedChanges in TranslogSequncer Signed-off-by: sricharanvuppu <113983630+sricharanvuppu@users.noreply.github.com> --------- Signed-off-by: sricharanvuppu <srivuppu@amazon.com> Signed-off-by: sricharanvuppu <113983630+sricharanvuppu@users.noreply.github.com>
…1004) (#1020) * Handling OpenSearchRejectExecuteException Exception (#1004) * Handling OpenSearchRejectExecuteException Exception * introduced writersPerShard setting. Signed-off-by: sricharanvuppu <srivuppu@amazon.com> (cherry picked from commit 448e7a7) Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * correcting OpenSearchRejectedExecutionException import Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * ratelimiter acquiring after getting unAppliedChanges in TranslogSequncer Signed-off-by: sricharanvuppu <113983630+sricharanvuppu@users.noreply.github.com> --------- Signed-off-by: sricharanvuppu <srivuppu@amazon.com> Signed-off-by: sricharanvuppu <113983630+sricharanvuppu@users.noreply.github.com>
…1004) (#1014) * Handling OpenSearchRejectExecuteException Exception (#1004) * Handling OpenSearchRejectExecuteException Exception * introduced writersPerShard setting. Signed-off-by: sricharanvuppu <srivuppu@amazon.com> (cherry picked from commit 448e7a7) Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * correcting OpenSearchRejectedExecutionException import Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * fix rateLimiter.acquire() Signed-off-by: monusingh-1 <msnghgw@amazon.com> --------- Signed-off-by: sricharanvuppu <srivuppu@amazon.com> Signed-off-by: monusingh-1 <msnghgw@amazon.com> Co-authored-by: monusingh-1 <msnghgw@amazon.com>
…1004) (#1027) * Handling OpenSearchRejectExecuteException Exception (#1004) * Handling OpenSearchRejectExecuteException Exception * introduced writersPerShard setting. Signed-off-by: sricharanvuppu <srivuppu@amazon.com> (cherry picked from commit 448e7a7) Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * adding missing checkpoint and correcting follower stats test case Signed-off-by: sricharanvuppu <srivuppu@amazon.com> --------- Signed-off-by: sricharanvuppu <srivuppu@amazon.com>
…1004) (#1027) * Handling OpenSearchRejectExecuteException Exception (#1004) * Handling OpenSearchRejectExecuteException Exception * introduced writersPerShard setting. Signed-off-by: sricharanvuppu <srivuppu@amazon.com> (cherry picked from commit 448e7a7) Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * adding missing checkpoint and correcting follower stats test case Signed-off-by: sricharanvuppu <srivuppu@amazon.com> --------- Signed-off-by: sricharanvuppu <srivuppu@amazon.com> (cherry picked from commit d2b49ff)
…1004) (#1027) (#1065) * Handling OpenSearchRejectExecuteException Exception (#1004) * Handling OpenSearchRejectExecuteException Exception * introduced writersPerShard setting. Signed-off-by: sricharanvuppu <srivuppu@amazon.com> (cherry picked from commit 448e7a7) Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * adding missing checkpoint and correcting follower stats test case Signed-off-by: sricharanvuppu <srivuppu@amazon.com> --------- Signed-off-by: sricharanvuppu <srivuppu@amazon.com> (cherry picked from commit d2b49ff) Co-authored-by: sricharanvuppu <113983630+sricharanvuppu@users.noreply.github.com>
…1004) (#1024) * Handling OpenSearchRejectExecuteException Exception (#1004) * Handling OpenSearchRejectExecuteException Exception * introduced writersPerShard setting. Signed-off-by: sricharanvuppu <srivuppu@amazon.com> (cherry picked from commit 448e7a7) Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * correcting OpenSearchRejectedExecutionException import Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * adding missing checkpoint and correcting follower stats test case Signed-off-by: sricharanvuppu <srivuppu@amazon.com> --------- Signed-off-by: sricharanvuppu <srivuppu@amazon.com>
…1004) (#1023) * Handling OpenSearchRejectExecuteException Exception (#1004) * Handling OpenSearchRejectExecuteException Exception * introduced writersPerShard setting. Signed-off-by: sricharanvuppu <srivuppu@amazon.com> (cherry picked from commit 448e7a7) Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * correcting OpenSearchRejectedExecutionException import Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * adding missing checkpoint and correcting follower stats test case Signed-off-by: sricharanvuppu <srivuppu@amazon.com> --------- Signed-off-by: sricharanvuppu <srivuppu@amazon.com>
…1004) (#1018) * Handling OpenSearchRejectExecuteException Exception (#1004) * Handling OpenSearchRejectExecuteException Exception * introduced writersPerShard setting. Signed-off-by: sricharanvuppu <srivuppu@amazon.com> (cherry picked from commit 448e7a7) Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * correcting OpenSearchRejectedExecutionException import Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * adding missing checkpoint and correcting follower stats test case Signed-off-by: sricharanvuppu <srivuppu@amazon.com> --------- Signed-off-by: sricharanvuppu <srivuppu@amazon.com>
…1004) (#1017) * Handling OpenSearchRejectExecuteException Exception (#1004) * Handling OpenSearchRejectExecuteException Exception * introduced writersPerShard setting. Signed-off-by: sricharanvuppu <srivuppu@amazon.com> (cherry picked from commit 448e7a7) Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * correcting OpenSearchRejectedExecutionException import Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * adding missing checkpoint and correcting follower stats test case Signed-off-by: sricharanvuppu <srivuppu@amazon.com> --------- Signed-off-by: sricharanvuppu <srivuppu@amazon.com>
…1004) (#1015) * Handling OpenSearchRejectExecuteException Exception (#1004) * Handling OpenSearchRejectExecuteException Exception * introduced writersPerShard setting. Signed-off-by: sricharanvuppu <srivuppu@amazon.com> (cherry picked from commit 448e7a7) Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * correcting OpenSearchRejectedExecutionException import Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * adding missing checkpoint and correcting follower stats test case Signed-off-by: sricharanvuppu <srivuppu@amazon.com> --------- Signed-off-by: sricharanvuppu <srivuppu@amazon.com>
) (#1016) * Handling OpenSearchRejectExecuteException Exception (#1004) * Handling OpenSearchRejectExecuteException Exception * introduced writersPerShard setting. Signed-off-by: sricharanvuppu <srivuppu@amazon.com> (cherry picked from commit 448e7a7) Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * correcting OpenSearchRejectedExecutionException import Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * adding missing checkpoint and correcting follower stats test case Signed-off-by: sricharanvuppu <srivuppu@amazon.com> --------- Signed-off-by: sricharanvuppu <srivuppu@amazon.com>
…1004) (#1019) * Handling OpenSearchRejectExecuteException Exception (#1004) * Handling OpenSearchRejectExecuteException Exception * introduced writersPerShard setting. Signed-off-by: sricharanvuppu <srivuppu@amazon.com> (cherry picked from commit 448e7a7) Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * correcting OpenSearchRejectedExecutionException import Signed-off-by: sricharanvuppu <srivuppu@amazon.com> * adding missing checkpoint and correcting follower stats test case Signed-off-by: sricharanvuppu <srivuppu@amazon.com> --------- Signed-off-by: sricharanvuppu <srivuppu@amazon.com>
Description
This PR handles OpenSearchRejectExecuteException in suspendExecuteWithRetries
This PR also introduces new
plugins.replication.follower.concurrent_writers_per_shard
setting, this new setting gives user more control over concurrent writes. It default is 2 same asconcurrent_readers_per_shard
and can be changed to minimum value : 1Issues Resolved
#627
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.