Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for AKS node_taints in default_node_pool again #9183

Closed
torumakabe opened this issue Nov 6, 2020 · 17 comments · Fixed by #10307
Closed

Support for AKS node_taints in default_node_pool again #9183

torumakabe opened this issue Nov 6, 2020 · 17 comments · Fixed by #10307

Comments

@torumakabe
Copy link

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

I understand the background of #8982 , then node_taints is no longer possible to configure from v2.35. But, AKS API allow exception for CriticalAddonsOnly taint on system nodepools (or all) from the latest update.

Azure/AKS#1833

This option is very useful and critical, so could you please consider support it again?

New or Affected Resource(s)

  • azurerm_kubernetes_cluster

Potential Terraform Configuration

  default_node_pool {
    node_taints = ["CriticalAddonsOnly=true:PreferNoSchedule"]
  }

References

@favoretti
Copy link
Contributor

@torumakabe Since I "broke" this. When I look at the linked issue - I'm not sure I follow whether 2020-09-01 API version will allow for that or they released a new API version?

@torumakabe
Copy link
Author

@favoretti According to the latest AKS release note, they allowed it without a new API version. https://github.com/Azure/AKS/blob/master/CHANGELOG.md#release-2020-10-26

The release is rolling out now, and it is available in some regions such as Japan East. So, node_taints in default_node_pool has been successfully applied with azurerm v2.34.

@dhirschfeld
Copy link
Contributor

I wasn't sure if this was supposed to work with the latest 2.39.0 but it appears not:

Error: expanding `default_node_pool`: The AKS API has removed support for tainting all nodes in the default node pool and it is no longer possible to configure this. To taint a node pool, create a separate one

...at least for australiaeast

@favoretti
Copy link
Contributor

favoretti commented Dec 10, 2020

@dhirschfeld I wasn't able to do anything on this since deprecation, there was a chat on how to approach this, but didn't get further than that. If you could elaborate on your use-case for this - it might help HC folks to weigh in on re-introducing at least the taints that API allows for now.

Also, what would be an argument against leaving default node pool small for kube-system purpose workloads and creating an additional one that one can taint with anything?

@dhirschfeld
Copy link
Contributor

Also, what would be an argument against leaving default node pool small for kube-system purpose workloads and creating an additional one that one can taint with anything?

Isn't that the exact purpose of CriticalAddonsOnly=true:NoSchedule on the default node pool?

i.e. a taint stops you from scheduling pods on a node (unless they have a tolerance). If there is no taint on the default node pool then IIUC there's nothing stopping user pods from being scheduled on that node pool which is what I'd like.

I think you could get the same effect by defining an anti-affinity for the system node pool on every pod but that seems like a lot of boilerplate and if you forgot to do it then the pod could again be scheduled on a system node.

Disclaimer: I'm not a k8s expert so I might be completely off track!
just trying to apply best practice (separating system from user nodes/pods) as defined in:
https://docs.microsoft.com/en-us/azure/aks/operator-best-practices-advanced-scheduler

@favoretti
Copy link
Contributor

@tombuildsstuff Any thoughts on this? I could re-add taints back with validation that allows CriticalAddonsOnly=true:NoSchedule only?

@bgarcial
Copy link

@dhirschfeld

I wasn't sure if this was supposed to work with the latest 2.39.0 but it appears not:

For westeurope and 2.40.0 azurerm provider as well

Error: expanding `default_node_pool`: The AKS API has removed support for tainting all nodes in the default node pool and it is no longer possible to configure this. To taint a node pool, create a separate one

@luddskunk
Copy link

Hi,

Do we have any updates on allowing the exception back? @favoretti or @tombuildsstuff

As a workaround for anyone else reading this, I rolled back to v 2.34 now and it works to set taint on the default node pool.

As per comment from Microsoft employee in this thread.

@bcchagas
Copy link

bcchagas commented Jan 24, 2021

Hello, everyone!

@favoretti,
As @dhirschfeld said, in scenarios where not all applications have toleration/affinity, not having CriticalAddonsOnly=true:NoSchedule" taint means we have a system nodepool where non-system pods might get scheduled on due to lack of affinity/toleration.

As @luddskunk, I'm stuck with v2.34.

@favoretti
Copy link
Contributor

I'm happy to add it back, but I am not the one to decide on it. @tombuildsstuff you ok with me re-adding this back?

@lgmorand
Copy link

@favoretti @tombuildsstuff My customers needs it. they can't be stuck with 2.34 :'(

@favoretti
Copy link
Contributor

So, PR is there, the rest is up to HC :)

@samuelgautier
Copy link

This is an essential feature to keep the default node pool clean ! We look forward to seeing this PR validated. Thanks in advance.

@tombuildsstuff tombuildsstuff added this to the v2.45.0 milestone Jan 27, 2021
@tombuildsstuff tombuildsstuff modified the milestones: v2.45.0, v2.46.0 Jan 28, 2021
@tombuildsstuff tombuildsstuff modified the milestones: v2.46.0, v2.47.0 Feb 4, 2021
@samuelgautier
Copy link

Too bad! I can't wait! See you on February 11th then! Thank you ;)

katbyte pushed a commit that referenced this issue Feb 10, 2021
…abled` property (#10307)

Co-authored-by: Tom Harvey <tombuildsstuff@users.noreply.github.com>

Fixes #9183
@ghost
Copy link

ghost commented Feb 11, 2021

This has been released in version 2.47.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
    version = "~> 2.47.0"
}
# ... other configuration ...

@robinmanuelthiel
Copy link

To avoid confusion, it might be worth noting that this fix does not allow setting the CriticalAddonsOnly taint as asked in the original question of this issue like this...

default_node_pool {
  node_taints = ["CriticalAddonsOnly=true:PreferNoSchedule"] # <- still not supported, use solution below
}

... but introduces a new only_critical_addons_enabled argument to the default_node_pool block. So adding the CriticalAddonsOnly taint now works like this:

default_node_pool {
  only_critical_addons_enabled = true # supported
}

@ghost
Copy link

ghost commented Mar 13, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators Mar 13, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants