Added a spec regarding the rules for eviction & replacement of pods #133
Conversation
At least some discussion has to happen and a few typos need to be corrected.
## Replacement

Replacement is the process of replacing a pod an another pod that takes over the responsibilities
"pod an another" -> "pod by another"
### Image ID Pods

The Image ID pods are starter to fetch the ArangoDB version of a specific
starter -> started
- Image ID pods can always be restarted on a different node.
  There is no need to replace an image ID pod.
- `node.kubernetes.io/unreachable:NoExecute` toleration time is set very low (5sec)
- `node.kubernetes.io/not-ready:NoExecute` toleration time is set very low (5sec)
Add (if true): There is no danger at all if two image ID pods happen to run at the same time.
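For reference, here is a minimal sketch of how NoExecute tolerations with a bounded timeout like the ones listed above could be built with the Kubernetes core/v1 Go API. The helper name is hypothetical and this is not the operator's actual code; it only illustrates the `tolerationSeconds` mechanism the spec relies on.

```go
package tolerations

import (
	corev1 "k8s.io/api/core/v1"
)

// noExecuteTolerations is a hypothetical helper that builds the two
// NoExecute tolerations discussed in the spec (unreachable / not-ready),
// each limited to the given number of seconds before eviction kicks in.
func noExecuteTolerations(seconds int64) []corev1.Toleration {
	keys := []string{
		"node.kubernetes.io/unreachable",
		"node.kubernetes.io/not-ready",
	}
	tolerations := make([]corev1.Toleration, 0, len(keys))
	for _, key := range keys {
		s := seconds // copy so each toleration gets its own pointer
		tolerations = append(tolerations, corev1.Toleration{
			Key:               key,
			Operator:          corev1.TolerationOpExists,
			Effect:            corev1.TaintEffectNoExecute,
			TolerationSeconds: &s,
		})
	}
	return tolerations
}

// An image ID pod would then get the very low timeout from the spec:
// pod.Spec.Tolerations = noExecuteTolerations(5)
```

With `Operator: Exists` the toleration matches the taint regardless of its value, which is the usual pattern for these node-lifecycle taints.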
- Coordinator pods can always be evicted from any node
- Coordinator pods can always be replaced with another coordinator pod with a different ID on a different node
- `node.kubernetes.io/unreachable:NoExecute` toleration time is set low (15sec)
- `node.kubernetes.io/not-ready:NoExecute` toleration time is set low (15sec)
Add? "There is no danger at all if two coordinator pods with different ID run concurrently.
Done (a bit different)
### DBServer Pods

DBServer pods run an ArangoDB dbserver as part of an ArangoDB cluster.
It has persistent state potentially tight to the node it runs on and it has a unique ID.
"tight" -> "tied"
### Single Server Pods

Single server pods run an ArangoDB server as part of an ArangoDB single server deployment.
It has persistent state potentially tight to the node.
"tight" -> "tied"
### Single Pods in Active Failover Deployment

Single pods run an ArangoDB single server as part of an ArangoDB active failover deployment.
It has persistent state potentially tight to the node it runs on and it has a unique ID.
"tight" -> "tied"
- It is a follower of an active-failover deployment (Q: can we trigger this failover to another server?)
- Single pods can always be replaced with another single pod with a different ID on a different node.
- `node.kubernetes.io/unreachable:NoExecute` toleration time is set high to "wait it out a while" (5min)
- `node.kubernetes.io/not-ready:NoExecute` toleration time is set high to "wait it out a while" (5min)
Need to check this, do not know by heart.
- SyncMaster pods can always be evicted from any node
- SyncMaster pods can always be replaced with another syncmaster pod on a different node
- `node.kubernetes.io/unreachable:NoExecute` toleration time is set low (15sec)
- `node.kubernetes.io/not-ready:NoExecute` toleration time is set low (15sec)
Is there any requirement about the same network endpoint or an internal k8s service being set up in case of a replacement?
no
- SyncWorker pods can always be evicted from any node
- SyncWorker pods can always be replaced with another syncworker pod on a different node
- `node.kubernetes.io/unreachable:NoExecute` toleration time is set a bit higher to try to avoid resynchronization (1min)
- `node.kubernetes.io/not-ready:NoExecute` toleration time is set a bit higher to try to avoid resynchronization (1min)
Same here about network endpoint.
no
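Taken together, the toleration timeouts quoted in the excerpts above could be summarized per member role roughly as follows. This is only an illustrative sketch: the role names are made up for this summary, and the DBServer value is not shown in the quoted diff.

```go
// Seconds both NoExecute tolerations (unreachable / not-ready) are kept
// before Kubernetes evicts the pod, per member role, as stated in the
// spec excerpts above. Role names here are illustrative only.
var noExecuteTolerationSeconds = map[string]int64{
	"imageid":     5,   // very low: image ID pods can simply be restarted elsewhere
	"coordinator": 15,  // low: coordinators are easy to replace with a new ID
	"single":      300, // high: "wait it out a while" for single / active-failover servers
	"syncmaster":  15,  // low: syncmasters can always be replaced
	"syncworker":  60,  // a bit higher, to try to avoid resynchronization
}
```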