-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replication operation that try to perform the primary phase on a replica should be retried #17358
Conversation
@@ -741,7 +741,11 @@ public String toString() { | |||
QUERY_SHARD_EXCEPTION(org.elasticsearch.index.query.QueryShardException.class, | |||
org.elasticsearch.index.query.QueryShardException::new, 141), | |||
NO_LONGER_PRIMARY_SHARD_EXCEPTION(ShardStateAction.NoLongerPrimaryShardException.class, | |||
ShardStateAction.NoLongerPrimaryShardException::new, 142); | |||
ShardStateAction.NoLongerPrimaryShardException::new, 142), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the last thing we need is another exception. We should get rid of the instanceof checks in order to do retries and use ElasticsearchException#addHeader
I think we can just use IllegalIndexShardStateException
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That’s an interesting idea and I agree more tailor made exceptions are a shame. I’ll look at that approach…
On 28 Mar 2016, at 20:48, Simon Willnauer notifications@github.com wrote:
In core/src/main/java/org/elasticsearch/ElasticsearchException.java:
@@ -741,7 +741,11 @@ public String toString() {
QUERY_SHARD_EXCEPTION(org.elasticsearch.index.query.QueryShardException.class,
org.elasticsearch.index.query.QueryShardException::new, 141),
NO_LONGER_PRIMARY_SHARD_EXCEPTION(ShardStateAction.NoLongerPrimaryShardException.class,
ShardStateAction.NoLongerPrimaryShardException::new, 142);
ShardStateAction.NoLongerPrimaryShardException::new, 142),
I think the last thing we need is another exception. We should get rid of the instanceof checks in order to do retries and use ElasticsearchException#addHeader I think we can just use IllegalIndexShardStateException instead?
—
You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
0cbe6b7
to
0ecf7a4
Compare
@s1monw I pushed another commit, removing the custom exceptions and relying a primary check in TRA, before acquiring an operation lock from the relevant IndexShard instance. This is OK since a primary shard will never be demoted |
@@ -799,6 +800,9 @@ void finishBecauseUnavailable(ShardId shardId, String message) { | |||
protected IndexShardReference getIndexShardReferenceOnPrimary(ShardId shardId) { | |||
IndexService indexService = indicesService.indexServiceSafe(shardId.getIndex()); | |||
IndexShard indexShard = indexService.getShard(shardId.id()); | |||
if (indexShard.routingEntry().primary() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put a small comment here when and why this could happen?
left a minor regarding a comment - LGTM otherwise thanks! |
setState(clusterService, stateWithActivePrimary(index, true, randomInt(3))); | ||
logger.debug("--> using initial state:\n{}", clusterService.state().prettyPrint()); | ||
Request request = new Request(shardId); | ||
boolean timeout = true; //randomBoolean(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has happened before (:wink:, rescue the randomBoolean
invocation from the comment grave).
Just a minor issue (the commented out randomization in the test), and two nits. Otherwise, LGTM. |
efb7836
to
d03562f
Compare
…ica should be retried In extreme cases a local primary shard can be replaced with a replica while a replication request is in flight and the primary action is applied to the shard (via `acquirePrimaryOperationLock()). elastic#17044 changed the exception used in that method to something that isn't recognized as `TransportActions.isShardNotAvailableException`, causing the operation to fail immediately instead of retrying. This commit fixes this by check the primary flag before acquiring the lock. This is safe to do as an IndexShard will never be demoted once a primary. Closes elastic#17358
d03562f
to
48b4f08
Compare
In extreme cases a local primary shard can be replaced with a replica while a replication request is in flight and the primary action is applied to the shard (via
acquirePrimaryOperationLock()
). #17044 changed the exception used in that method to something that isn't recognized asTransportActions.isShardNotAvailableException
, causing the operation to fail immediately instead of retrying. This commit fixes this by check the primary flag beforeacquiring the lock. This is safe to do as an IndexShard will never be demoted once a primary.
Marking as non issue as this was never released.
Example failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-os-compatibility/os=centos/138