Verify AllocationIDs in replication actions #20320
RequestWithAllocationID(Supplier<R> requestSupplier) {
    request = requestSupplier.get();
    allocationId = null;
what is the invariant that this can be null?
oh that is for deserialization :( bummer can you document it?
yeah :(. maybe we should add a variant of registerRequestHandler which takes a function of a stream and returns a request.
left some suggestions, LGTM in general
ActionListener<Releasable> callback = (ActionListener<Releasable>) invocation.getArguments()[1];
final long primaryTerm = indexShard.getPrimaryTerm();
if (term < primaryTerm) {
    throw new IllegalArgumentException(LoggerMessageFormat.format("{} operation term [{}] is too old (current [{}])",
Can we not add another use of LoggerMessageFormat; I'd like to remove it?
I copied it from another place. Do you have a decent suggestion for an alternative? (The obvious one is chaining strings.)
String.format(Locale.ROOT, "%s operation term [%d] is too old (current [%d])", shardId, term, primaryTerm)
}
}

/** test that a replica request is reject if it arrives at a shard with a wrong allocation id */
Nit: reject -> rejected
I left a few comments, but it looks good.
    allocationId = null;
}

RequestWithAllocationID(R request, String allocationId) {
I guess this constructor parameter should be targetAllocationID too, to be consistent with my other suggestions.
thx @s1monw, @jasontedor. I pushed a commit addressing all comments
I don't get why we call these properties
Because @jasontedor asked.
can we keep it simple?
@bleskes I don't wanna be in the way for such an improvement just because of some internal debatable naming. lets get it in!
Thx @s1monw and @jasontedor for the review.
A replicated operation consists of a routing action (the original), which is in charge of sending the operation to the primary shard; a primary action, which executes the operation on the resolved primary; and replica actions, each of which performs the operation on a specific replica. This commit adds the targeted shard's allocation id to the primary and replica actions and makes sure that it matches the shard the actions end up executing on.

This helps prevent an extremely rare failure mode where a shard moves off a node and back to it, all between the time an action is sent and the time it is processed. For example:

1) The primary action is sent to a relocating primary on node A.
2) The primary finishes relocation to node B and starts relocating back.
3) The relocation back gets to the phase and opens up the target engine on the original node, node A.
4) The primary action is executed on the target engine before the relocation finishes, at which point the shard copy on node B is still the official primary, i.e., it is executed on the wrong primary.
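The check described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `AllocationIdCheckSketch`, `ShardCopy`, `AllocationIdMismatchException` and `verifyTarget` are hypothetical names, and the sketch assumes that a shard copy created by relocating back to a node carries a different allocation id than the copy that was there before. The term check mirrors the one visible in the test snippet above.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical sketch: none of these names are Elasticsearch classes.
final class AllocationIdCheckSketch {

    /** Stand-in for the local shard copy; holds only what the check needs. */
    static final class ShardCopy {
        final String allocationId;
        final long primaryTerm;

        ShardCopy(String allocationId, long primaryTerm) {
            this.allocationId = Objects.requireNonNull(allocationId);
            this.primaryTerm = primaryTerm;
        }
    }

    /** Signals that the action reached a different shard copy than the one it targeted. */
    static final class AllocationIdMismatchException extends RuntimeException {
        AllocationIdMismatchException(String message) {
            super(message);
        }
    }

    /** Rejects the action unless it targets exactly this copy and its term is current. */
    static void verifyTarget(ShardCopy local, String targetAllocationId, long operationTerm) {
        if (!local.allocationId.equals(targetAllocationId)) {
            throw new AllocationIdMismatchException(String.format(Locale.ROOT,
                "expected allocation id [%s] but found [%s]", targetAllocationId, local.allocationId));
        }
        if (operationTerm < local.primaryTerm) {
            throw new IllegalArgumentException(String.format(Locale.ROOT,
                "operation term [%d] is too old (current [%d])", operationTerm, local.primaryTerm));
        }
    }

    public static void main(String[] args) {
        // The copy now on node A was created by the relocation back (assumed id "alloc-A2").
        ShardCopy copyOnNodeA = new ShardCopy("alloc-A2", 3L);
        verifyTarget(copyOnNodeA, "alloc-A2", 3L); // matching target: accepted
        try {
            // The action was sent to the earlier copy on node A (assumed id "alloc-A1").
            verifyTarget(copyOnNodeA, "alloc-A1", 3L);
        } catch (AllocationIdMismatchException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In the scenario from the example, the action sent in step 1 would carry the allocation id of the original copy on node A, so a check of this kind would reject it when it reaches the new copy in step 4 instead of executing it on the wrong primary.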