
activator's PDB creates with no disruptionsAllowed when there are a few replicas #376

Closed
nak3 opened this issue Dec 3, 2020 · 18 comments · Fixed by #1125
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@nak3
Contributor

nak3 commented Dec 3, 2020

Since knative/serving@591510c was introduced, the activator's PDB sets minAvailable: 80%, and the operator ships that PDB unchanged.
With minAvailable: 80%, the PDB ends up with zero disruptionsAllowed when there are only a few replicas, so it will block evictions.

For example, even with 2 activator replicas, disruptionsAllowed is zero:

spec:
  minAvailable: 80%
  selector:
    matchLabels:
      app: activator
status:
  currentHealthy: 2
  desiredHealthy: 2
  disruptionsAllowed: 0
  expectedPods: 2
  observedGeneration: 1
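
A short worked calculation may help here; it assumes the standard disruption-controller behavior, where a percentage-based minAvailable is rounded up against expectedPods:

  desiredHealthy     = ceil(expectedPods * 0.80) = ceil(2 * 0.80) = 2
  disruptionsAllowed = currentHealthy - desiredHealthy = 2 - 2 = 0

So with minAvailable: 80%, disruptionsAllowed stays at 0 for any replica count up to 4.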

Users who install Knative via the operator cannot modify the PDB, so the operator should either manage the PDB properly or document this HA restriction.

nak3 added the kind/bug label Dec 3, 2020
@nak3
Contributor Author

nak3 commented Dec 3, 2020

FYI @vagababov @julz

@github-actions

github-actions bot commented Mar 4, 2021

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions bot added the lifecycle/stale label Mar 4, 2021
@Fabian-K

Because the PDB cannot be configured through the operator, currently the only way to get 1 allowed disruption is to run 5 replicas. Would you consider making this configurable, or changing the PDB itself?
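
That follows from the same rounding as above: any replica count from 1 to 4 still yields 0 allowed disruptions, and 5 is the first count that leaves headroom:

  desiredHealthy     = ceil(5 * 0.80) = 4
  disruptionsAllowed = 5 - 4 = 1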

github-actions bot removed the lifecycle/stale label Mar 16, 2021
@marcomancini1994

We are facing the same issue when using the Knative Operator. Are you planning to release a fix in the next release (e.g. 0.23.0)?

@matzew
Member

matzew commented Apr 28, 2021

The same is true for eventing, not just the serving PDB.

@matzew
Member

matzew commented Apr 28, 2021

/cc @houshengbo @lionelvillard

@Morriz

Morriz commented Jul 3, 2021

bump

@treyhyde

treyhyde commented Aug 6, 2021

I have really struggled with this one. I finally buckled down and added the 5 replicas so I could actually shut things down without manual intervention, only to discover that cluster-autoscaler.kubernetes.io/safe-to-evict: "false" is sprinkled everywhere. Is there documentation somewhere explaining why these pods aren't safe to evict?
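
For reference, the annotation in question lives in the pod template metadata of the affected Deployments; a minimal excerpt (everything besides the annotation itself is a standard Deployment field):

  spec:
    template:
      metadata:
        annotations:
          cluster-autoscaler.kubernetes.io/safe-to-evict: "false"

The cluster autoscaler will not scale down a node that is running a pod carrying this annotation.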

@nak3
Contributor Author

nak3 commented Aug 24, 2021

I think this issue should be fixed before Beta/GA. There are a few options for a solution:

  • option-1. Support a fully customizable PDB.
  • option-2. Support only an enable/disable option for the PDB.
  • option-3. Do not include the PDB manifest at all.
  • option-4. Support an "automatic" PDB based on the number of replicas.

Probably option-2 (something like spec.high-availability, sketched below) or option-1 would be alright?
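
A minimal sketch of what option-2 might look like on the KnativeServing custom resource, assuming a spec.high-availability section as suggested above; the disable-pdb knob is purely hypothetical:

  apiVersion: operator.knative.dev/v1alpha1
  kind: KnativeServing
  metadata:
    name: knative-serving
    namespace: knative-serving
  spec:
    high-availability:
      replicas: 2        # desired replica count for the HA control-plane components
      disable-pdb: true  # hypothetical: skip installing the bundled PDB manifests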

@treyhyde

option 1 seems like a good path, but frankly, a more reasonable default would go a long way (minAvailable: 1 or maxUnavailable: 1).

@peteroruba

We are also stuck with this issue when it comes to replacing worker nodes.

@bartusz01

Hi @nak3, any update on this?

@Legion2

Legion2 commented Nov 18, 2021

Instead of using minAvailable: 80%, I use maxUnavailable: 20%, which works with 2 replicas (see the sketch below).
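
A sketch of that variant of the activator PDB; note that the disruption controller also rounds a percentage-based maxUnavailable up, which is what makes this work at low replica counts:

  spec:
    maxUnavailable: 20%
    selector:
      matchLabels:
        app: activator

With 2 replicas this gives maxUnavailable = ceil(2 * 0.20) = 1, so desiredHealthy = 2 - 1 = 1 and disruptionsAllowed = 2 - 1 = 1.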

@nak3
Contributor Author

nak3 commented Nov 19, 2021

@bartusz01 No, I'm not working on it.

bartusz01 pushed a commit to linode/apl-core that referenced this issue Nov 19, 2021
…r e.g. node reboots (kured)

When Kured is installed it needs to be able to drain nodes, which is only possible if all PDBs have
at least 1 allowed disruption. The PDBs have not been changed, but the minimum number of replicas has. For
Knative this is a temporary fix until another solution is offered where <5 replicas is possible. See
knative/operator#376
@isaac88

isaac88 commented Mar 2, 2022

Hello @nak3 @matzew @julz,
Do you have an ETA for this?
Best regards.

@nak3
Contributor Author

nak3 commented Mar 2, 2022

@isaac88 Sorry, I am not working on it, so I have no idea.

@github-actions

github-actions bot commented Jun 1, 2022

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions bot added the lifecycle/stale label Jun 1, 2022
@Morriz

Morriz commented Jun 2, 2022

bump

github-actions bot removed the lifecycle/stale label Jun 2, 2022