
Fixes for node slice IPAM #503

Merged
merged 2 commits into k8snetworkplumbingwg:master on Oct 15, 2024

Conversation

xagent003
Contributor

@xagent003 xagent003 commented Sep 13, 2024

This fixes some issues seen with the new node slice IPAM feature.

  1. Disable some controllers that are not needed. We don't need an informer on NodeSlicePool, since that is an internal CR we manage ourselves, and we don't need to reconcile NADs when the resource version has not changed. For nodes, only listen for add and delete events: nodes are updated frequently for status changes and conditions we don't care about, and we only need to allocate a new slice pool when a node is created and remove its allocation when the node is deleted.

  2. We have multiple NADs (which map to multiple NICs) that share the same CIDR and network_name (because they are really one L2 network). With the node slice pool feature enabled and a Pod requesting multiple networks, the same podRef and containerID appear multiple times in each IP pool, once per ifName (one for each NAD). Deallocation therefore also needs to match on ifName so it deletes the correct entry rather than the first one it finds (see the matching sketch after this list).

  3. When the node slice size or CIDR is reconfigured, the wrong range was being passed: ipam.Range instead of ipamConf.IPRanges[0].Range. Since ipam.Range is cleared after being copied into ipamConf.IPRanges[0].Range, the reconcile was erroring out with an empty CIDR.

  4. nodeSlice.Spec was not being written back (persisted) when the node slice size or CIDR was reconfigured.

  5. The subnet mask/CIDR used was incorrect. We should use the CIDR from the NAD rather than the range assigned to the node. The NAD's range defines the cluster-wide subnet, whereas the NodeSlicePool's IPRange is only used for grouping IP allocations. Without this fix, each node was on a different subnet, so traffic went through an IP lookup via the default route (primary CNI) instead of over the NAD/Multus network. For example, suppose our NAD range is 10.0.0.0/8 and the node slice size is a /24: if we take the prefix from the NodeSlicePool, the Pods on each node end up on a different /24 instead of all sharing the same /8 (a sketch follows this list).
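
To illustrate item 5, here is a minimal, hypothetical Go sketch (not the actual whereabouts code; buildPodIP and the example values are made up): the address is still taken from the node's slice, but the prefix length attached to it comes from the NAD's cluster-wide range, so pods on every node land on the same subnet.

```go
package main

import (
	"fmt"
	"net"
)

// buildPodIP attaches the NAD's cluster-wide prefix length to an address that
// was allocated out of the node's slice. Using the slice's own /24 here is the
// bug described above: pods on different nodes would end up on different subnets.
func buildPodIP(allocated net.IP, nadRange string) (*net.IPNet, error) {
	_, clusterNet, err := net.ParseCIDR(nadRange)
	if err != nil {
		return nil, err
	}
	return &net.IPNet{IP: allocated, Mask: clusterNet.Mask}, nil
}

func main() {
	// NAD range (cluster-wide) is 10.0.0.0/8; this node's slice is 10.0.5.0/24.
	podIP, _ := buildPodIP(net.ParseIP("10.0.5.12"), "10.0.0.0/8")
	fmt.Println(podIP) // 10.0.5.12/8 — same subnet as pods on every other node
}
```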

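For item 2, a minimal sketch of the matching rule using made-up types (the real whereabouts pool structures differ): the release path keys on both podRef and ifName, so a pod attached to several NADs that share a CIDR and network_name only loses the reservation for the interface actually being torn down.

```go
package main

import "fmt"

// Reservation is a simplified stand-in for one entry in an IP pool.
type Reservation struct {
	IP     string
	PodRef string
	IfName string
}

// releaseMatching drops the reservation for a given pod and interface.
// Matching on podRef alone would delete whichever entry happens to come
// first, which may belong to a different NAD/interface of the same pod.
func releaseMatching(pool []Reservation, podRef, ifName string) []Reservation {
	kept := make([]Reservation, 0, len(pool))
	for _, r := range pool {
		if r.PodRef == podRef && r.IfName == ifName {
			continue // the entry for the interface being deleted
		}
		kept = append(kept, r)
	}
	return kept
}

func main() {
	pool := []Reservation{
		{IP: "10.0.5.10", PodRef: "default/pod-a", IfName: "net1"},
		{IP: "10.0.5.11", PodRef: "default/pod-a", IfName: "net2"},
	}
	fmt.Println(releaseMatching(pool, "default/pod-a", "net2")) // only the net1 entry remains
}
```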
@coveralls

Pull Request Test Coverage Report for Build 10855353642

Details

  • 8 of 24 (33.33%) changed or added relevant lines in 4 files are covered.
  • 8 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.4%) to 54.134%

Changes Missing Coverage            Covered Lines   Changed/Added Lines   %
pkg/iphelpers/iphelpers.go          1               2                     50.0%
pkg/storage/kubernetes/ipam.go      0               2                     0.0%
pkg/allocate/allocate.go            0               4                     0.0%
pkg/node-controller/controller.go   7               16                    43.75%

Files with Coverage Reduction       New Missed Lines   %
pkg/node-controller/controller.go   8                  57.85%

Totals
Change from base Build 10831942310: -0.4%
Covered Lines: 1434
Relevant Lines: 2649

💛 - Coveralls

@ivelichkovich
Contributor

ivelichkovich commented Sep 13, 2024

/lgtm

Thank you!

cc @dougbtv

@ivelichkovich
Contributor

fixes: #498

@dougbtv
Member

@dougbtv dougbtv left a comment


Thank you Arjun!!!

@dougbtv dougbtv merged commit 57d5ac3 into k8snetworkplumbingwg:master Oct 15, 2024
10 checks passed