Add functionality to forcefully kill an instance #2898
Merged
Issues
N/A - New feature to forcefully kill an instance
Description
This PR adds a HelixAdmin API method and a Helix REST API command to forcefully kill an instance. It works by setting the instance's operation to UNKNOWN and then deleting its LIVEINSTANCE znode. The feature is intended for cases where a participant is in an unrecoverable state but still keeps an active connection to ZK. Marking the node as UNKNOWN removes it from assignment calculations, and deleting the LIVEINSTANCE znode then causes the controller to consider it OFFLINE. This removes the requirement that the node process its downward state transition before top-state handoff can occur.
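The two-step sequence described above can be modeled in a few lines of plain Java. This is a toy, in-memory sketch, not the ZKHelixAdmin code: the class `ForceKillModel`, its methods, and the two-value `InstanceOperation` enum are hypothetical stand-ins for the instance config and the LIVEINSTANCE znodes, and the "assignable" rule is a simplification of what the controller actually checks.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of the force-kill sequence (not Helix code). */
class ForceKillModel {
  // Simplified stand-in for the instance operation values; only two are modeled.
  enum InstanceOperation { ENABLE, UNKNOWN }

  // Stand-in for each instance's configured operation.
  private final Map<String, InstanceOperation> operations = new HashMap<>();
  // Stand-in for the LIVEINSTANCE znodes.
  private final Set<String> liveInstances = new HashSet<>();

  // A healthy participant joins: enabled and live.
  void join(String instance) {
    operations.put(instance, InstanceOperation.ENABLE);
    liveInstances.add(instance);
  }

  // Order follows the PR description:
  // 1) mark the operation UNKNOWN so the instance is excluded from assignment,
  // 2) delete the live-instance entry so the controller treats it as OFFLINE.
  // Neither step waits for the participant to run a state transition.
  void forceKill(String instance) {
    operations.put(instance, InstanceOperation.UNKNOWN);
    liveInstances.remove(instance);
  }

  // Simplified controller view: assignable only if live and not UNKNOWN.
  boolean isAssignable(String instance) {
    return liveInstances.contains(instance)
        && operations.get(instance) != InstanceOperation.UNKNOWN;
  }

  boolean isLive(String instance) {
    return liveInstances.contains(instance);
  }

  public static void main(String[] args) {
    ForceKillModel cluster = new ForceKillModel();
    cluster.join("host_1");
    cluster.forceKill("host_1");
    System.out.println("live=" + cluster.isLive("host_1")
        + " assignable=" + cluster.isAssignable("host_1"));
  }
}
```

In this model the instance stops being assignable even though its participant never processed a transition, which is the property the feature relies on for top-state handoff.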
My current findings indicate that the LIVEINSTANCE znode will only be recreated on ZK session establishment, which occurs on initial connection and after session expiration.
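The observation about LIVEINSTANCE recreation can be illustrated with a small model, assuming the participant (re)creates its entry only in a session-established callback. `LiveInstanceSessionModel` and all of its methods are hypothetical; this is not Helix's participant code, only a sketch of the behavior described in the finding.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of when a participant (re)creates its LIVEINSTANCE entry. */
class LiveInstanceSessionModel {
  // Stand-in for the LIVEINSTANCE znodes, keyed by instance name.
  private final Set<String> liveInstances = new HashSet<>();
  private final String instanceName;
  private int sessionCount = 0;

  LiveInstanceSessionModel(String instanceName) {
    this.instanceName = instanceName;
  }

  // Runs on initial connection and after a ZK session expiration.
  // In this model it is the only place the live-instance entry is created.
  void onSessionEstablished() {
    sessionCount++;
    liveInstances.add(instanceName);
  }

  // Ordinary activity on an existing session leaves the entry untouched.
  void onRoutineWork() {
  }

  // Admin-side force kill deletes the entry while the session stays open.
  void forceKill() {
    liveInstances.remove(instanceName);
  }

  boolean isLive() {
    return liveInstances.contains(instanceName);
  }

  int sessions() {
    return sessionCount;
  }

  public static void main(String[] args) {
    LiveInstanceSessionModel p = new LiveInstanceSessionModel("host_1");
    p.onSessionEstablished();   // initial connection
    p.forceKill();
    p.onRoutineWork();          // same session: entry is not recreated
    System.out.println("after kill: live=" + p.isLive());
    p.onSessionEstablished();   // session expired and was re-established
    System.out.println("after new session: live=" + p.isLive());
  }
}
```

Under that assumption, a force-killed participant whose session stays open remains absent from LIVEINSTANCE, while one whose session expires and is re-established reappears.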
The following code changes were made:
helix-core/src/main/java/org/apache/helix/HelixAdmin.java: Added the forceKillInstance method to the HelixAdmin interface.
helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java: Implemented the forceKillInstance method in the ZKHelixAdmin class.
helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/PerInstanceAccessor.java: Added the forceKillInstance command to the REST API updateInstance endpoint. Called via:
Also includes miscellaneous changes:
helix-core/src/test/java/org/apache/helix/integration/rebalancer/TestInstanceOperation.java: Corrected the logger class reference.
helix-rest/src/test/java/org/apache/helix/rest/server/TestPartitionAssignmentAPI.java: Corrected the logger class reference.
helix-rest/src/test/java/org/apache/helix/rest/server/AbstractTestClass.java: Refactored resource creation logic. Added addParticipant and dropParticipant methods. Also added another test cluster to isolate testPerInstanceAccessor and testInstancesAccessor.
helix-rest/src/test/java/org/apache/helix/rest/server/TestInstancesAccessor.java: Now uses the isolated test cluster.
Tests
helix-core/src/test/java/org/apache/helix/integration/TestForceKillInstance.java for the HelixAdmin API
testForceKillInstance in helix-rest/src/test/java/org/apache/helix/rest/server/TestPerInstanceAccessor.java for the Helix REST API
Changes that Break Backward Compatibility (Optional)
N/A
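The change list adds a forceKillInstance command to the REST updateInstance endpoint. A client request for it might be assembled as below. This is a sketch under assumptions: the path layout `/clusters/{clusterId}/instances/{instanceName}?command=...` follows the existing Helix REST convention and should be verified against PerInstanceAccessor, while the base URL, port, cluster name, and instance name are placeholders. The class `ForceKillRestRequest` is hypothetical, and names are not URL-encoded, so simple names are assumed.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds (but does not send) a force-kill request. */
class ForceKillRestRequest {
  // Assumed endpoint layout; check it against PerInstanceAccessor before use.
  static HttpRequest build(String baseUrl, String clusterId, String instanceName) {
    URI uri = URI.create(baseUrl + "/clusters/" + clusterId
        + "/instances/" + instanceName + "?command=forceKillInstance");
    // The existing command-style operations are POSTs; no body is sent here.
    return HttpRequest.newBuilder(uri)
        .POST(HttpRequest.BodyPublishers.noBody())
        .build();
  }

  public static void main(String[] args) {
    // Placeholder server location and names.
    HttpRequest req = build("http://localhost:8100/admin/v2", "myCluster", "host_1");
    System.out.println(req.method() + " " + req.uri());
    // To actually send it: java.net.http.HttpClient.newHttpClient()
    //     .send(req, java.net.http.HttpResponse.BodyHandlers.ofString());
  }
}
```

Building the request separately from sending it keeps the sketch runnable without a live Helix REST server.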
Code Quality
(helix-style-intellij.xml if IntelliJ IDE is used)