OCPBUGS-29975: Allow multiple machine networks #6071

Open
wants to merge 15 commits into
base: master
from

Conversation

zaneb
Member

@zaneb zaneb commented Mar 11, 2024

Until now, assisted-service has assumed that single-stack clusters specify exactly one MachineNetwork, and that dual-stack clusters specify exactly one IPv4 and one IPv6 MachineNetwork (in that order).

None of these restrictions exist in OpenShift itself, which allows multiple MachineNetworks of each address family. This is necessary to support remote worker nodes on day 1, as well as distributed installations of a kind that are common in UPI deployments (where an external load balancer can easily balance across hosts in separate networks). (OCPBUGS-29975)

This change removes the assumptions about a single MachineNetwork per address family for clusters with UserManagedNetworking enabled, and reverts the change in #4867 that prevents users from specifying more networks at the API level.

(Clusters without UserManagedNetworking will still use Layer 2 reachability checks in the belongs-to-majority-group host validation, which effectively prevents using remote worker nodes when creating these clusters. See OCPBUGS-30730.)

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • Agent-based installer
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 11, 2024
@openshift-ci-robot

@zaneb: This pull request references Jira Issue OCPBUGS-29975, which is invalid:

  • expected the bug to target the "4.16.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Mar 11, 2024
@zaneb zaneb changed the title from "OCPBUGS-29975: Support multiple machine networks" to "OCPBUGS-29975: Allow multiple machine networks" Mar 11, 2024
@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 11, 2024
@openshift-ci openshift-ci bot requested review from adriengentil and tsorya March 11, 2024 01:10
@zaneb
Member Author

zaneb commented Mar 11, 2024

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 11, 2024
@openshift-ci-robot

@zaneb: This pull request references Jira Issue OCPBUGS-29975, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @mhanss

The bug has been updated to refer to the pull request using the external bug tracker.


@openshift-ci openshift-ci bot requested a review from mhanss March 11, 2024 01:11
@zaneb
Member Author

zaneb commented Mar 11, 2024

Infra issues with the CentOS repos
/retest


codecov bot commented Mar 11, 2024

Codecov Report

Attention: Patch coverage is 52.77778% with 68 lines in your changes missing coverage. Please review.

Project coverage is 68.18%. Comparing base (d295de7) to head (b9ba27b).
Report is 29 commits behind head on master.

Files with missing lines Patch % Lines
internal/network/cidr_validations.go 8.82% 31 Missing ⚠️
internal/network/machine_network_cidr.go 77.41% 9 Missing and 5 partials ⚠️
internal/network/dual_stack_validations.go 0.00% 11 Missing ⚠️
internal/provider/baremetal/installConfig.go 0.00% 7 Missing ⚠️
internal/bminventory/inventory.go 75.00% 1 Missing and 1 partial ⚠️
internal/cluster/validations/validations.go 0.00% 1 Missing ⚠️
internal/provider/nutanix/installConfig.go 0.00% 1 Missing ⚠️
internal/provider/vsphere/installConfig.go 0.00% 1 Missing ⚠️
Additional details and impacted files


@@            Coverage Diff             @@
##           master    #6071      +/-   ##
==========================================
- Coverage   68.19%   68.18%   -0.02%     
==========================================
  Files         279      279              
  Lines       39282    39288       +6     
==========================================
  Hits        26789    26789              
- Misses      10060    10068       +8     
+ Partials     2433     2431       -2     
Files with missing lines Coverage Δ
internal/cluster/validator.go 95.87% <100.00%> (+0.51%) ⬆️
internal/host/validator.go 82.49% <100.00%> (+0.19%) ⬆️
internal/cluster/validations/validations.go 13.73% <0.00%> (+0.13%) ⬆️
internal/provider/nutanix/installConfig.go 0.00% <0.00%> (ø)
internal/provider/vsphere/installConfig.go 0.00% <0.00%> (ø)
internal/bminventory/inventory.go 70.88% <75.00%> (-0.03%) ⬇️
internal/provider/baremetal/installConfig.go 42.35% <0.00%> (-0.51%) ⬇️
internal/network/dual_stack_validations.go 0.00% <0.00%> (ø)
internal/network/machine_network_cidr.go 62.32% <77.41%> (+1.73%) ⬆️
internal/network/cidr_validations.go 44.76% <8.82%> (-5.78%) ⬇️

@zaneb zaneb force-pushed the multiple-machine-networks branch from 68f0945 to b996310 Compare March 11, 2024 09:49
Contributor

@ori-amizur ori-amizur left a comment

The restrictions here are not clear:
Do we allow multiple API VIPs per address family?
Do we have a restriction per host-role to belong to a specific machine-network? If yes, this needs to be added as a validation.
What if we have a cluster with only 3 hosts (only masters)? Can such a cluster have multiple IPv4 machine-networks?
Can a machine-network be stale (without hosts)?

@@ -21,7 +22,7 @@ const MinSNOMachineMaskDelta = 1
func parseCIDR(cidr string) (ip net.IP, ipnet *net.IPNet, err error) {
ip, ipnet, err = net.ParseCIDR(cidr)
if err != nil {
- err = errors.Wrapf(err, "Failed to parse CIDR '%s'", cidr)
+ err = fmt.Errorf("Failed to parse CIDR '%s': %w", cidr, err)
Contributor

Why was this changed?

Member Author

Because it's a pain to import both the standard library "errors" package and the deprecated old-timey hack "github.com/pkg/errors" package in the same file. This is a trivial refactor, the result is the same.
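
For readers unfamiliar with the two wrapping styles, here is a minimal sketch of the standard-library form. The body of `parseCIDR` is an approximation based on the diff above, and `isParseError` is a hypothetical helper added only for illustration. The `%w` verb produces the same "message: cause" text that `errors.Wrapf` did, and keeps the underlying error reachable through `errors.As`.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// parseCIDR mirrors the shape of the function in the diff above. The body is
// an illustrative sketch, not the exact code from the PR.
func parseCIDR(cidr string) (net.IP, *net.IPNet, error) {
	ip, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		// %w keeps the original error in the chain so callers can inspect it.
		return nil, nil, fmt.Errorf("Failed to parse CIDR '%s': %w", cidr, err)
	}
	return ip, ipnet, nil
}

// isParseError is a hypothetical helper: it reports whether err wraps a
// *net.ParseError anywhere in its chain.
func isParseError(err error) bool {
	var parseErr *net.ParseError
	return errors.As(err, &parseErr)
}
```

Only the standard library `errors` package is imported, which is the convenience being described in the reply above.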

if err != nil {
return err
}

if overlap {
- return errors.Errorf("CIDRS %s and %s overlap", aCidrStr, bCidrStr)
+ return fmt.Errorf("CIDRS %s and %s overlap", aCidrStr, bCidrStr)
Contributor

Please do not change the errors unless necessary

log.WithError(err).Warnf("Verify VIPs")
return common.NewApiError(http.StatusBadRequest, err)
}
}

} else {
- primaryMachineNetworkCidr, err = network.CalculateMachineNetworkCIDR(network.GetApiVipById(&targetConfiguration, 0), network.GetIngressVipById(&targetConfiguration, 0), cluster.Hosts, matchRequired)
+ primaryMachineNetworkCidr, err := network.CalculateMachineNetworkCIDR(network.GetApiVipById(&targetConfiguration, 0), network.GetIngressVipById(&targetConfiguration, 0), cluster.Hosts, matchRequired)
Contributor

This might be wrong according to the concept presented by this PR, because we may have two machine networks, one per VIP, rather than one for both VIPs.

Member Author

I added a comment indicating that this is something we'll want to fix as part of OCPBUGS-30730.

- if err := checkCidrsOverlapping(c.cluster); err != nil {
- return ValidationFailure, fmt.Sprintf("CIDRS Overlapping: %s.", err.Error())
+ if err := network.VerifyNoNetworkCidrOverlaps(c.cluster.ClusterNetworks, c.cluster.MachineNetworks, c.cluster.ServiceNetworks); err != nil {
+ return ValidationFailure, err.Error()
Contributor

The validation error messages need to be agreed upon, because they are presented by the UI. Here it is not clear what text is expected for this error message.

Member Author

Ack. I did change them, you can see the change in the tests here: 51e76b1#diff-854f83029709261bb4e532a5a6839b51ddeb2beb2f6e921056016a74340d06a3

internal/host/validator.go (outdated)
return models.VipVerificationUnverified, errors.Errorf("%s <%s> cannot be set if Machine Network CIDR is empty", vipName, vip)
}
if !ipInCidr(vip, machineNetworkCidr) {
return models.VipVerificationFailed, errors.Errorf("%s <%s> does not belong to machine-network-cidr <%s>", vipName, vip, machineNetworkCidr)
if machineNetworkCidr == "" {
Contributor

This test is irrelevant. The machine network cannot be valid and empty.

Member Author

Yes it can - validMachineNetwork will be true if any machine networks are defined, but machineNetworkCidr will only be set if the VIP is in one of them.

Contributor

So you don't need the test for valid-network

Member Author

This is maintaining the previous behaviour. If there are no machineNetworks specified, we return VipVerificationUnverified. If there are machineNetworks specified and the VIP isn't in any of them, we return VipVerificationFailed.

I think what's confusing in the diff is that machineNetworkCidr was previously a parameter, and now it's a local variable.
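
The three outcomes described in this thread can be sketched as follows. This is a simplified illustration, not the code from the PR: the `Verification` type, its constants, and `verifyVIP` are hypothetical stand-ins for the `models.VipVerification*` values and the real validation function, and the inputs are plain strings rather than model objects.

```go
package main

import "net/netip"

// Verification is a hypothetical stand-in for the models.VipVerification values.
type Verification string

const (
	Unverified Verification = "unverified"
	Failed     Verification = "failed"
	Succeeded  Verification = "succeeded"
)

// verifyVIP returns Unverified when no machine networks are defined, Failed
// when the VIP is in none of them, and Succeeded together with the matching
// CIDR otherwise.
func verifyVIP(vip string, machineNetworks []string) (Verification, string) {
	if len(machineNetworks) == 0 {
		return Unverified, ""
	}
	addr, err := netip.ParseAddr(vip)
	if err != nil {
		return Failed, ""
	}
	for _, cidr := range machineNetworks {
		prefix, err := netip.ParsePrefix(cidr)
		if err != nil {
			continue // malformed entries are skipped in this sketch
		}
		if prefix.Contains(addr) {
			return Succeeded, cidr
		}
	}
	return Failed, ""
}
```

The second return value plays the role of the local `machineNetworkCidr` variable: it is empty unless the VIP was found in one of the networks.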

@zaneb zaneb force-pushed the multiple-machine-networks branch 2 times, most recently from c6e01dc to ff9be84 Compare March 12, 2024 10:12
@zaneb
Member Author

zaneb commented Mar 12, 2024

Do we allow multiple API VIPs per address family?

No.

Do we have a restriction per host-role to belong to a specific machine-network? If yes, this needs to be added as a validation.

No.

What if we have a cluster with only 3 hosts (only masters)? Can such a cluster have multiple IPv4 machine-networks?

Yes (and in fact this is likely to be a common case for platform: none).

Can a machine-network be stale (without hosts)?

I don't see why not.

@ori-amizur
Contributor

ori-amizur commented Mar 13, 2024

Do we have a restriction per host-role to belong to a specific machine-network? If yes, this needs to be added as a validation.

No.

So workers do not have to belong to the machine-networks of the ingress VIPs? Maybe we have to verify that at least 2 workers belong to the machine-network of the ingress VIPs? If not, it will cause a mess. The logic is already complicated, and if we don't set rules, users will be confused.

What if we have a cluster with only 3 hosts (only masters)? Can such a cluster have multiple IPv4 machine-networks?

Yes (and in fact this is likely to be a common case for platform: none).

The none platform does not have machine-networks at all (at least as we implemented it).
Anyway, if we have multiple machine-networks for masters, how can the API VIPs move between these hosts?

Can a machine-network be stale (without hosts)?

I don't see why not.

Why is it needed?

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 12, 2024
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 13, 2024
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Aug 12, 2024

openshift-ci bot commented Aug 12, 2024

@openshift-bot: Closed this PR.


@openshift-ci-robot

@zaneb: This pull request references Jira Issue OCPBUGS-29975. The bug has been updated to no longer refer to the pull request using the external bug tracker. All external bug links have been closed. The bug has been moved to the NEW state.


@zaneb zaneb force-pushed the multiple-machine-networks branch 2 times, most recently from 3be57c4 to 4a89db6 Compare August 20, 2024 10:20
@zaneb
Member Author

zaneb commented Aug 20, 2024

@ori-amizur I added a check in the VerifyVIPs that the machine network the API VIP is in is present on all master nodes, and the machine network the Ingress VIP is in is present on all worker nodes.
Note that for non-UserManagedNetworking, the l2_connectivity check is still going to require all the nodes to be on the same L2 network segment, effectively. There is a separate bug OCPBUGS-30730 for handling that, but this is a prerequisite for it.
For UserManagedNetworking clusters, hosts need only have l3_connectivity, so this change will allow them to pass multiple machine networks and have hosts connected to disjoint L2 domains, as long as they are routable. Since these clusters have no VIPs, this is what we want.

Note that in ABI we always pass the MachineNetworks provided in the install-config. Also, for dual-stack the user must always pass the MachineNetworks explicitly to assisted-service.
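
A rough sketch of the check described in this comment, under simplifying assumptions: the `Host` type, the role strings, and `checkVIPHosts` are hypothetical and work on plain strings, whereas the real VerifyVIPs code operates on the assisted-service models and host inventory.

```go
package main

import (
	"fmt"
	"net/netip"
)

// Host is a hypothetical, simplified host record for this sketch.
type Host struct {
	Name  string
	Role  string   // "master" or "worker"
	Addrs []string // IP addresses configured on the host's interfaces
}

// hostInNetwork reports whether any of the host's addresses is in mn.
func hostInNetwork(h Host, mn netip.Prefix) bool {
	for _, a := range h.Addrs {
		addr, err := netip.ParseAddr(a)
		if err == nil && mn.Contains(addr) {
			return true
		}
	}
	return false
}

// checkVIPHosts finds the machine network containing vip and verifies that
// every host with the given role has an address in that network.
func checkVIPHosts(vip, role string, hosts []Host, machineNetworks []string) error {
	addr, err := netip.ParseAddr(vip)
	if err != nil {
		return fmt.Errorf("invalid VIP %q: %w", vip, err)
	}
	for _, cidr := range machineNetworks {
		mn, err := netip.ParsePrefix(cidr)
		if err != nil {
			return fmt.Errorf("invalid machine network %q: %w", cidr, err)
		}
		if !mn.Contains(addr) {
			continue
		}
		for _, h := range hosts {
			if h.Role == role && !hostInNetwork(h, mn) {
				return fmt.Errorf("host %s has no address in %s, the machine network of VIP %s", h.Name, cidr, vip)
			}
		}
		return nil
	}
	return fmt.Errorf("VIP %s is not in any machine network", vip)
}
```

In this sketch the API VIP would be checked with the role "master" and the Ingress VIP with the role "worker".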

@zaneb zaneb force-pushed the multiple-machine-networks branch 2 times, most recently from 17570ee to 04cfd6d Compare August 20, 2024 11:19
@zaneb zaneb force-pushed the multiple-machine-networks branch from 04cfd6d to e1878e0 Compare September 3, 2024 05:28
@zaneb
Member Author

zaneb commented Sep 3, 2024

Tests are working now, so I believe this is ready for another round of review.

@pawanpinjarkar
Contributor

/cc @avishayt @ori-amizur could you take another look and comment on whether this is ready to go?

@zaneb zaneb force-pushed the multiple-machine-networks branch from e1878e0 to 924e8a1 Compare November 19, 2024 00:03
zaneb added 10 commits December 3, 2024 17:24
It's confusing that GetMachineNetworksFromBootstrapHost() returns the
existing MachineNetworks in the cluster (and does *not* get them from
the bootstrap host) if they already exist there. Refactor to make the
logic clearer.
The network type is set for all platforms in getBasicInstallConfig().
There is no need to set it again in the none platform provider.
None of the subnets specified in any of the machineNetworks,
clusterNetworks, or serviceNetworks should overlap. Validate all of
these combinations, as openshift-installer does, instead of making
assumptions about indices being aligned to address families.
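
(Editorial sketch of the check this commit message describes. The names `namedCIDR`, `parseAll`, and `verifyNoOverlaps` are hypothetical, and the assumption that overlaps within a single list are also rejected is mine; the real `VerifyNoNetworkCidrOverlaps` takes model objects rather than strings.)

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// namedCIDR pairs a parsed subnet with a label used in error messages.
type namedCIDR struct {
	label  string
	prefix netip.Prefix
}

// parseAll parses every CIDR in one list and labels it with the list's kind.
func parseAll(kind string, cidrs []string) ([]namedCIDR, error) {
	out := make([]namedCIDR, 0, len(cidrs))
	for i, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return nil, fmt.Errorf("%s network %d: %w", kind, i, err)
		}
		out = append(out, namedCIDR{label: kind + " network " + c, prefix: p.Masked()})
	}
	return out, nil
}

// verifyNoOverlaps compares every pair of subnets across all three lists,
// without assuming that list indices line up with address families.
func verifyNoOverlaps(machine, cluster, service []string) error {
	var all []namedCIDR
	groups := []struct {
		kind  string
		cidrs []string
	}{{"machine", machine}, {"cluster", cluster}, {"service", service}}
	for _, g := range groups {
		parsed, err := parseAll(g.kind, g.cidrs)
		if err != nil {
			return err
		}
		all = append(all, parsed...)
	}
	var errs []error
	for i := range all {
		for j := i + 1; j < len(all); j++ {
			// Overlaps is false for subnets of different address families.
			if all[i].prefix.Overlaps(all[j].prefix) {
				errs = append(errs, fmt.Errorf("%s overlaps %s", all[i].label, all[j].label))
			}
		}
	}
	return errors.Join(errs...) // nil when no overlaps were found
}
```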
Don't make assumptions about a 1:1 mapping between MachineNetworks and
VIPs. Check only that the VIP is a member of any MachineNetwork.
When doing cluster validations, check that all of the hosts that the VIP
can point to (i.e. control plane hosts for the API VIP, workers for the
Ingress VIP) are members of the VIP's MachineNetwork.

Since at the time of adding the VIPs (and also during cluster
validations) we only check that the VIP is a member of _some_
MachineNetwork, we need this additional check to ensure that it is one
where the hosts are.
It doesn't appear this was ever used outside of its own unit tests.
@zaneb zaneb force-pushed the multiple-machine-networks branch from 924e8a1 to eb9879e Compare December 3, 2024 04:25

openshift-ci bot commented Dec 3, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zaneb
Once this PR has been reviewed and has the lgtm label, please assign ori-amizur for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

zaneb added 5 commits December 3, 2024 17:41
In the belongs-to-machine-cidr validation, allow the host to be a member
of any MachineNetwork. In a dual-stack cluster, require it to be a
member of both an IPv4 and an IPv6 network.

Previously it was assumed that the only reason for multiple
MachineNetworks to appear was that a dual stack cluster could contain
exactly one IPv4 and one IPv6 MachineNetwork.
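
(Editorial sketch of the relaxed belongs-to-machine-cidr rule described in this commit message. The function name and string-based signature are hypothetical; the real validation reads addresses from the host inventory.)

```go
package main

import "net/netip"

// hostBelongsToMachineNetworks reports whether a host with the given addresses
// is a member of at least one machine network. For dual-stack clusters it must
// be a member of at least one IPv4 and at least one IPv6 machine network.
func hostBelongsToMachineNetworks(hostAddrs, machineNetworks []string, dualStack bool) bool {
	var inV4, inV6 bool
	for _, cidr := range machineNetworks {
		mn, err := netip.ParsePrefix(cidr)
		if err != nil {
			continue // malformed entries are skipped in this sketch
		}
		for _, a := range hostAddrs {
			addr, err := netip.ParseAddr(a)
			if err != nil || !mn.Contains(addr) {
				continue
			}
			if addr.Is4() {
				inV4 = true
			} else {
				inV6 = true
			}
		}
	}
	if dualStack {
		return inV4 && inV6
	}
	return inV4 || inV6
}
```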
Multiple MachineNetworks in the same address family and IPv6-primary
dual-stack clusters are a thing, so relax the dual-stack validation
requirements for machine networks to allow them.
Allow users to specify multiple machine networks of the same address
family. This is a documented and supported feature of OpenShift.

This reverts commit 873dd81.
Don't restrict ourselves to the first machine network when looking for
an interface on a machine network to set the BootMACAddress.
OpenShift has ~always supported having machines in multiple
machineNetworks, so update the TODO comment to reflect that accounting
for this is already something we need to do to fully support
non-UserManagedNetworking clusters. (UserManagedNetworking clusters use
only the L3 connectivity check.)

See https://issues.redhat.com/browse/OCPBUGS-30730 for more details.
@zaneb zaneb force-pushed the multiple-machine-networks branch from eb9879e to b9ba27b Compare December 3, 2024 04:41

openshift-ci bot commented Dec 5, 2024

@zaneb: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/edge-e2e-metal-assisted-none ff9be84 link false /test edge-e2e-metal-assisted-none
ci/prow/edge-e2e-metal-assisted-mtv-4-17 e1878e0 link true /test edge-e2e-metal-assisted-mtv-4-17
ci/prow/edge-e2e-nutanix-assisted b9ba27b link false /test edge-e2e-nutanix-assisted
ci/prow/edge-e2e-nutanix-assisted-4-14 b9ba27b link false /test edge-e2e-nutanix-assisted-4-14
ci/prow/edge-subsystem-aws b9ba27b link true /test edge-subsystem-aws
ci/prow/edge-subsystem-kubeapi-aws b9ba27b link true /test edge-subsystem-kubeapi-aws
ci/prow/okd-scos-e2e-aws-ovn b9ba27b link false /test okd-scos-e2e-aws-ovn
ci/prow/edge-e2e-metal-assisted b9ba27b link true /test edge-e2e-metal-assisted
ci/prow/edge-unit-test b9ba27b link true /test edge-unit-test
ci/prow/edge-e2e-vsphere-assisted b9ba27b link false /test edge-e2e-vsphere-assisted
ci/prow/edge-e2e-oci-assisted b9ba27b link false /test edge-e2e-oci-assisted
ci/prow/e2e-agent-compact-ipv4 b9ba27b link true /test e2e-agent-compact-ipv4
ci/prow/edge-e2e-oci-assisted-4-14 b9ba27b link false /test edge-e2e-oci-assisted-4-14
ci/prow/edge-e2e-ai-operator-ztp b9ba27b link true /test edge-e2e-ai-operator-ztp
ci/prow/edge-e2e-metal-assisted-osc-sno-4-17 b9ba27b link true /test edge-e2e-metal-assisted-osc-sno-4-17
ci/prow/edge-e2e-metal-assisted-osc-4-17 b9ba27b link true /test edge-e2e-metal-assisted-osc-4-17


Context("Cluster with two networks of same stack", func() {
It("only v4 in cluster networks rejected", func() {
errStr := "Second cluster network has to be IPv6 subnet"
params.NewClusterParams.ClusterNetworks = []*models.ClusterNetwork{
Contributor

ClusterNetworks is an array that can theoretically contain any number of entries.

Is the maximum network count we expect 2?

If so, should we have a test to ensure that only 2 networks are provided?

If not, should we be assuming that we are dealing with a "Second" cluster network, or should this test instead check that at least one of the cluster networks is an IPv6 subnet and at least one is IPv4?

It("only v4 in service networks rejected", func() {
errStr := "Second service network has to be IPv6 subnet"
params.NewClusterParams.ClusterNetworks = common.TestDualStackNetworking.ClusterNetworks
params.NewClusterParams.ServiceNetworks = []*models.ServiceNetwork{
Contributor

ServiceNetworks is an array that can theoretically contain any number of entries.

Is the maximum network count we expect 2?

If so, should we have a test to ensure that only 2 networks are provided?

If not, should we be assuming that we are dealing with a "Second" service network, or should this test instead check that at least one of the service networks is an IPv6 subnet and at least one is IPv4?

@@ -4234,7 +4234,7 @@ var _ = Describe("Refresh Host", func() {
ntpSources: defaultNTPSources,
role: models.HostRoleMaster,
statusInfoChecker: makeValueChecker(formatStatusInfoFailedValidation(statusInfoNotReadyForInstall,
- "Host does not belong to machine network CIDRs. Verify that the host belongs to every CIDR listed under machine networks")),
+ "Host does not belong to machine network CIDRs. Verify that the host belongs to a listed machine network CIDR for each IP stack in use")),
Contributor

Maybe add something like "for each IP stack (IPv4/IPv6) in use", so that the user clearly understands the meaning of "IP stack".

func VerifyMachineNetworksDualStack(networks []*models.MachineNetwork, isDualStack bool) error {
if !isDualStack {
return nil
}
- if len(networks) != 2 {
+ if len(networks) < 2 {
return errors.Errorf("Expected 2 machine networks, found %d", len(networks))
Contributor

"Expected at least 2 machine networks and at least one for each IP stack (IPv4, IPv6), found %d"
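
One way to express the suggested rule is to count networks per address family instead of checking positions. The sketch below is only an illustration: `verifyMachineNetworksDualStack` here takes plain CIDR strings (an assumption made to keep it self-contained), and the error wording is a variant of the suggestion above, not an agreed message.

```go
package main

import (
	"fmt"
	"net/netip"
)

// verifyMachineNetworksDualStack requires at least one IPv4 and at least one
// IPv6 machine network for dual-stack clusters, in any order and any number.
func verifyMachineNetworksDualStack(cidrs []string, isDualStack bool) error {
	if !isDualStack {
		return nil
	}
	var v4, v6 int
	for _, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return fmt.Errorf("invalid machine network %q: %w", c, err)
		}
		if p.Addr().Is4() {
			v4++
		} else {
			v6++
		}
	}
	if v4 == 0 || v6 == 0 {
		return fmt.Errorf("Expected at least one machine network for each IP stack (IPv4, IPv6), found %d IPv4 and %d IPv6", v4, v6)
	}
	return nil
}
```

Counting by family makes the check independent of ordering, so an IPv6-primary list and a list with several networks of the same family both pass.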

vipsWrapper.Verification(i), v.log)
failed = failed || verification != models.VipVerificationSucceeded
if verification == models.VipVerificationSucceeded {
Contributor

I think we need a code comment to explain what is going on here. It looks like you are checking the host networks after failing to find the VIP in the machine networks.

verification, err = network.ValidateVipInHostNetworks(c.cluster.Hosts, c.cluster.MachineNetworks, vipsWrapper.IP(i), vipsWrapper.Type(), v.log)
failed = failed || verification != models.VipVerificationSucceeded
} else {
failed = true
Contributor

I think we should skip the else here and assume failure unless disproved in the machine network and cluster network checks.

failed = true should be the default value before any checks have been performed, and we should be aiming to set it to false in the subsequent checks.

machineNetworks []*models.MachineNetwork,
serviceNetworks []*models.ServiceNetwork) error {
errs := []error{}
for imn, mn := range machineNetworks {
Contributor

@paul-maidment paul-maidment Dec 10, 2024

The time complexity here is high: O(m*n).
There might be something clever we could do to reduce this.

It's a different language, I know, but there are approaches that use a data structure to remember already-searched ranges and allow iteration across a set of networks in O(n).

Note the use of a Radix tree in this implementation.

https://github.com/fgiuba/ipconflict/blob/master/ipconflict/subnet.py#L46-L61

Due to the limited number of networks, the consequences of not handling this may not be problematic. But it does look like something could be improved here!
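
As a simpler alternative to the radix tree in the linked implementation, a sort-based check can also avoid comparing every pair. The sketch below is an assumption-laden illustration, not code from the PR: `findOverlap` is a hypothetical name, it runs in O(n log n) rather than O(n), and it only reports one overlapping pair. It relies on the fact that two CIDR blocks are either disjoint or nested, so after sorting by network address (shorter prefixes first), any overlap shows up between some pair of neighbours.

```go
package main

import (
	"fmt"
	"net/netip"
	"sort"
)

// findOverlap returns one pair of overlapping CIDRs, if any exists.
func findOverlap(cidrs []string) (string, string, bool, error) {
	prefixes := make([]netip.Prefix, 0, len(cidrs))
	for _, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return "", "", false, fmt.Errorf("invalid CIDR %q: %w", c, err)
		}
		prefixes = append(prefixes, p.Masked()) // normalise to the network address
	}
	// Sort by network address, then by prefix length (larger networks first).
	// Addr.Compare orders IPv4 before IPv6, so the families never interleave.
	sort.Slice(prefixes, func(i, j int) bool {
		if c := prefixes[i].Addr().Compare(prefixes[j].Addr()); c != 0 {
			return c < 0
		}
		return prefixes[i].Bits() < prefixes[j].Bits()
	})
	// A containing block is always immediately followed by a block it overlaps.
	for i := 1; i < len(prefixes); i++ {
		if prefixes[i-1].Overlaps(prefixes[i]) {
			return prefixes[i-1].String(), prefixes[i].String(), true, nil
		}
	}
	return "", "", false, nil
}
```

With only a handful of networks per cluster the pairwise loop is likely fast enough, as the comment above notes; this is mainly useful if the lists ever grow.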

Labels
jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

8 participants