Implement the controller for API BGPPolicy #6203

hongliangl · 2024-04-09T02:45:02Z

This commit implements the controller of API BGPPolicy, designed to advertise
Service IPs, Egress IPs, and Pod IPs to BGP peers from selected Kubernetes Nodes.

According to the spec of BGPPolicy, the Node selector is used to select Nodes
to which a BGPPolicy is applied. Multiple BGPPolicies can be applied to the
same Node. However, only the oldest BGPPolicy will be effective on a Node,
with others serving as alternatives. The effective one may be changed in the
following cases:

- The current effective BGPPolicy is updated and no longer applies to the Node.
- The current effective BGPPolicy is deleted.

The BGP server instance is only created and started for the effective BGPPolicy on
a Node. If the effective BGPPolicy is changed, the corresponding BGP server instance
will be terminated by calling the Stop method, and a new BGP server instance will
be created and started by calling the Start method for the new effective BGPPolicy.

To create a BGP server instance, the ASN, router ID, and listen port must be specified.
The ASN and listen port come from the spec of the effective BGPPolicy. For the router ID,
if the Kubernetes cluster is IPv4-only or dual-stack, we use the Node's IPv4 address
as the router ID, ensuring uniqueness. If the Kubernetes cluster is IPv6-only, where no
Node IPv4 address is available, the router ID can be specified via the Node annotation
node.antrea.io/bgp-router-id. If the annotation is not present, a router ID is generated by
hashing the Node name and is then written to the Node annotation node.antrea.io/bgp-router-id.
Additionally, when any of the ASN, router ID, or listen port changes, the stale BGP server
instance is terminated and a new BGP server instance is created and started.

The BGP peers are specified in the effective BGPPolicy. A BGP peer is uniquely
identified by its IP address and ASN.

To reconcile the latest BGP peers:

- Get the BGP peers to be added and add them by calling the AddPeer method of the
  BGP server instance.
- Get the BGP peers to be deleted and delete them by calling the RemovePeer method
  of the BGP server instance.
- Get the remaining BGP peers and calculate the updated BGP peers, then update them by
  calling the UpdatePeer method of the BGP server instance.
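
The three steps above amount to a three-way diff between the previously applied peers and the peers in the current spec. A minimal sketch, assuming hypothetical `peerKey` and `peerConfig` types (the real peer configuration has more fields):

```go
package main

import (
	"fmt"
	"sort"
)

// peerKey identifies a peer by IP address and ASN, as in the description.
type peerKey struct {
	Address string
	ASN     int32
}

// peerConfig is a hypothetical, comparable subset of a peer's settings.
type peerConfig struct {
	Port     int32
	Password string
}

// diffPeers splits the current peers into those to add, update, and delete,
// relative to the previously applied peers.
func diffPeers(prev, cur map[peerKey]peerConfig) (toAdd, toUpdate, toDelete []peerKey) {
	for key, curCfg := range cur {
		prevCfg, exists := prev[key]
		switch {
		case !exists:
			toAdd = append(toAdd, key)
		case prevCfg != curCfg:
			toUpdate = append(toUpdate, key)
		}
	}
	for key := range prev {
		if _, exists := cur[key]; !exists {
			toDelete = append(toDelete, key)
		}
	}
	// Sort for deterministic output; map iteration order is random.
	for _, keys := range [][]peerKey{toAdd, toUpdate, toDelete} {
		sort.Slice(keys, func(i, j int) bool {
			if keys[i].Address != keys[j].Address {
				return keys[i].Address < keys[j].Address
			}
			return keys[i].ASN < keys[j].ASN
		})
	}
	return toAdd, toUpdate, toDelete
}

func main() {
	prev := map[peerKey]peerConfig{
		{Address: "192.0.2.1", ASN: 65001}: {Port: 179},
		{Address: "192.0.2.2", ASN: 65002}: {Port: 179},
	}
	cur := map[peerKey]peerConfig{
		{Address: "192.0.2.1", ASN: 65001}: {Port: 1179}, // port changed: update
		{Address: "192.0.2.3", ASN: 65003}: {Port: 179},  // new: add
	}
	toAdd, toUpdate, toDelete := diffPeers(prev, cur)
	fmt.Println("add:", toAdd, "update:", toUpdate, "delete:", toDelete)
}
```

Because the key includes the ASN, changing a peer's ASN shows up as one delete plus one add rather than an update.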

The information of the IPs to be advertised can be calculated from the spec of the
effective BGPPolicy. Currently, we advertise the IPs and CIDRs to all the BGP peers.

To reconcile the latest IPs to all BGP peers:

- If the BGP server instance is newly created and started, advertise all the IPs by
  calling the AdvertiseRoutes method.
- If the BGP server instance is not newly created and started:
  - Get the IPs/CIDRs to be added and advertise them by calling the AdvertiseRoutes method.
  - Get the IPs/CIDRs to be removed and withdraw them by calling the WithdrawRoutes method.
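
Both cases above can be expressed as one set difference: a newly started server simply has an empty "already advertised" set. The sketch below uses plain string prefixes and a hypothetical `diffRoutes` helper for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// diffRoutes compares the routes already advertised with the desired routes and
// returns what must be advertised and what must be withdrawn.
func diffRoutes(advertised, desired map[string]struct{}) (toAdvertise, toWithdraw []string) {
	for route := range desired {
		if _, ok := advertised[route]; !ok {
			toAdvertise = append(toAdvertise, route)
		}
	}
	for route := range advertised {
		if _, ok := desired[route]; !ok {
			toWithdraw = append(toWithdraw, route)
		}
	}
	// Sort for deterministic output; map iteration order is random.
	sort.Strings(toAdvertise)
	sort.Strings(toWithdraw)
	return toAdvertise, toWithdraw
}

func main() {
	desired := map[string]struct{}{"10.0.0.1/32": {}, "10.10.0.0/24": {}}

	// Newly started server: nothing advertised yet, so everything is advertised.
	add, del := diffRoutes(nil, desired)
	fmt.Println(add, del)

	// Existing server: only the delta is advertised or withdrawn.
	advertised := map[string]struct{}{"10.0.0.1/32": {}, "10.0.0.2/32": {}}
	add, del = diffRoutes(advertised, desired)
	fmt.Println(add, del)
}
```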

The feature is gated by the alpha BGPPolicy feature gate and is only supported on Linux.

- BGPPolicy API: Add BGPPolicy API #6009
- goBGP integration tests: Add BGP datapath interface and implement goBGP integration #6447
- goBGP logger: Add BGP datapath interface and implement goBGP integration #6447
- e2e tests: Add e2e tests for BGPPolicy #6523; Support deploying one FRR container in Kind network #6488
- documents: Add document for BGPPolicy #6524 (Implement the controller for API BGPPolicy #6203 (comment))
- unit tests
- refine code comments

luolanzone · 2024-07-23T03:21:55Z

pkg/agent/controller/bgp/controller.go

+			return "", fmt.Errorf("failed to patch BGP router ID to Node annotation %s: %w", types.NodeBGPRouterIDAnnotationKey, err)
+		}
+	} else if !utilnet.IsIPv4String(routerID) {
+		return "", fmt.Errorf("BGP router ID should be an IPv4 address string")


Should we also correct the router ID in the NodeBGPRouterIDAnnotationKey annotation in this case? The annotation is owned by Antrea, but users might modify it accidentally.

I don't think we are obligated to correct that since that's what the users should do by themselves. cc @tnqn

Logically, we shouldn't update user input, as that would cause unexpected results. If users set this annotation, they want the router ID to be a value that is more meaningful to them than the system-generated one. If they set a wrong value and we override it with a correct but different value, they wouldn't know they had provided an invalid ID and would think the router ID is not configurable.

On the other hand, right now there is no good signal that the value is invalid and cannot be used. I was wondering if we should update the annotation value to something like "<invalid value provided, please use a valid IPv4 address>". But I don't feel very strongly about it.

I think it's fine as long as there is an error log about it, given that this should only be used by advanced users who are familiar with BGP. And this is not the only case: the same situation applies to NPL, ServiceExternalIP, Service, and Egress, which take configuration via user-provided annotations. Antrea users seem careful enough that we have never received bugs related to misconfigured annotations on other features :)

luolanzone · 2024-07-23T03:33:24Z

pkg/agent/controller/bgp/controller.go

+		if c.enabledIPv4 && utilnet.IsIPv4String(ip) {
+			allRoutes.Insert(bgp.Route{Prefix: ip + ipv4Suffix})
+		}
+		if c.enabledIPv6 && utilnet.IsIPv6String(ip) {
+			allRoutes.Insert(bgp.Route{Prefix: ip + ipv6Suffix})
+		}


Maybe wrap this in a helper such as 'UpdateAllRoutes' and reuse it in the following function.
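
One way such a helper might look is sketched below. The name `routePrefix` is hypothetical, and the sketch assumes that `ipv4Suffix` and `ipv6Suffix` in the diff are the host-route suffixes `/32` and `/128`, which the hunk itself does not show.

```go
package main

import (
	"fmt"
	"net/netip"
)

// routePrefix turns a single IP into a host-route prefix (/32 or /128) if the
// matching address family is enabled. The boolean result is false when the IP
// is invalid or its family is disabled.
func routePrefix(ip string, enabledIPv4, enabledIPv6 bool) (string, bool) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return "", false
	}
	switch {
	case addr.Is4() && enabledIPv4:
		return netip.PrefixFrom(addr, 32).String(), true
	case addr.Is6() && enabledIPv6:
		return netip.PrefixFrom(addr, 128).String(), true
	}
	return "", false
}

func main() {
	for _, ip := range []string{"10.0.0.1", "fd00::1", "not-an-ip"} {
		prefix, ok := routePrefix(ip, true, true)
		fmt.Printf("%s -> %q (%v)\n", ip, prefix, ok)
	}
}
```

Each call site that currently repeats the two `if` blocks could then insert the returned prefix into its route set when the second result is true.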

luolanzone · 2024-07-23T03:44:05Z

pkg/agent/controller/bgp/controller.go

+	if node.GetName() != c.nodeName {
+		return
+	}
+	if reflect.DeepEqual(node.GetLabels(), oldNode.GetLabels()) &&


Why do we check labels and annotations only? This reminds me of the Flexible IPAM case. Do we plan to support it? I suppose it's a valid case for BGP?
@tnqn @antoninbas

I may be wrong but my understanding is Flexible IPAM is another way to address the problem BGP tries to resolve, by dividing the Node network into multiple flat VLAN subnets and routing between them via a router, so BGP is not required here.

It needs to check labels and annotations because they affect whether the policy applies to the node and the node router ID.

I didn't think about it too much but it's an interesting question. I agree with Quan's observation for the current version of Antrea Flexible IPAM.
However, if we interpret "Flexible IPAM" as the ability to assign arbitrary IPs to Pods, without looking at the current implementation (which requires bridging mode), then maybe this is something that BGP can actually help achieve (when supported by the underlay) without requiring that the Node's transport interface be moved to the OVS bridge? What do you think?

Note: that would be a L3 approach to Flexible IPAM, not a L2 approach using VLANs

I re-opened this conversation not because it impacts this PR, but because I think it's a good thing to discuss

Right, we should decouple IPAM and implementation. I agree BGPPolicy can help achieve Flexible IPAM without bridging mode. Perhaps we can track it via an issue. If my understanding is correct, it may require the BGP processes running on Nodes to exchange routes, not only advertise routes?

> If my understanding is correct, it may require the BGP processes running on Nodes to exchange routes, not only advertise routes?

I would say probably, yes. And that's what the Calico BGP implementation does I think.
I don't know if it is strictly required. If we keep the current approach (Node IPAM + Flexible IPAM, with the flexibility to choose between the 2 for different workloads), then I can see how just disabling SNAT could be enough, assuming that for Pods using Flexible IPs the default route is used and traffic gets to a router to which Flexible IPs were advertised. But for more generic support, and for the ability to send traffic to the destination Node directly (e.g. when 2 Nodes are on the same subnet) I would say being able to learn routes over BGP is necessary.

hongliangl · 2024-07-23T04:34:22Z

/test-all

antoninbas · 2024-07-23T21:59:11Z

pkg/agent/controller/bgp/controller.go

+	queue workqueue.RateLimitingInterface
+}
+
+func NewBGPPolicyController(ctx context.Context,


I find it a bit strange to pass the context in the New function. Would it make more sense to set this field in the Run method?

Done. Additionally, I have removed ctx context.Context from the Controller struct; ctx is now passed as a function parameter to the methods of the goBGP interface rather than stored as a member of Controller.

antoninbas · 2024-07-23T22:01:36Z

pkg/agent/controller/bgp/controller.go

+	bgpServer := c.bgpPolicyState.bgpServer
+	for key := range peerToAddKeys {
+		peerConfig := curPeerConfigs[key]
+		if err := bgpServer.AddPeer(c.ctx, peerConfig); err != nil {
+			return err
+		}
+		c.bgpPolicyState.peerConfigs[key] = peerConfig
+	}
+	for key := range peerToUpdateKeys {
+		peerConfig := curPeerConfigs[key]
+		if err := bgpServer.UpdatePeer(c.ctx, peerConfig); err != nil {
+			return err
+		}
+		c.bgpPolicyState.peerConfigs[key] = peerConfig
+	}
+	for key := range peerToDeleteKeys {
+		peerConfig := prePeerConfigs[key]
+		if err := bgpServer.RemovePeer(c.ctx, peerConfig); err != nil {
+			return err
+		}
+		delete(c.bgpPolicyState.peerConfigs, key)


Instead of using c.ctx directly for the add / update / remove peer operations, I think it may make sense to create a child context with a reasonable timeout using https://pkg.go.dev/context#WithTimeout

I agree that it would be better to have a reasonable timeout for the add/update/remove peer operations, but I'm not sure how to choose the timeout.

antoninbas · 2024-07-24T17:36:09Z

pkg/agent/controller/bgp/controller.go

+	}
+}
+
+func (c *Controller) Run(ctx context.Context, stopCh <-chan struct{}) {


I think it's an anti-pattern to pass both a context and a stop channel to the same function. They also "mean" the same thing as the context is generated from the stop channel in this case, and will be cancelled when the channel is closed.

I would recommend only keeping the context in this case, and you can call ctx.Done() when you need a channel.
You can also do the opposite (keep the channel) and use wait.ContextForChannel to get a context. But I would say keeping the context is probably a more "modern" / idiomatic approach.

Appreciate the detailed explanation!

> I would recommend only keeping the context in this case, and you can call ctx.Done() when you need a channel.

I prefer this. It looks neater and more "modern".

antoninbas · 2024-07-24T17:40:57Z

pkg/agent/controller/bgp/controller.go

+		slices.Equal(oldSvc.Spec.ExternalIPs, svc.Spec.ExternalIPs) &&
+		slices.Equal(getIngressIPs(oldSvc), getIngressIPs(svc)) &&
+		oldSvc.Spec.ExternalTrafficPolicy == svc.Spec.ExternalTrafficPolicy &&
+		reflect.DeepEqual(oldSvc.Spec.InternalTrafficPolicy, svc.Spec.InternalTrafficPolicy) {


nit: It feels like a bit of a waste to use reflection to compare 2 *type values, when type is a primitive type. You could use https://pkg.go.dev/k8s.io/utils/ptr#Equal

I am not sure if this comment applies to other locations as well

Here ptr.Equal(oldSvc.Spec.InternalTrafficPolicy, svc.Spec.InternalTrafficPolicy) looks much better.

For other locations using reflect, there are pointers within the struct to compare, so I didn't change them.
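
To show why no reflection is needed here, the sketch below is a standalone generic comparison with the same nil-handling one would expect from the `ptr.Equal` helper linked above. `ptrEqual` and `trafficPolicy` are names made up for this example.

```go
package main

import "fmt"

// ptrEqual reports whether two pointers are both nil or point to equal values.
// It is a standalone illustration, not the k8s.io/utils/ptr source.
func ptrEqual[T comparable](a, b *T) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

// trafficPolicy is a hypothetical string-based type, similar in shape to the
// Service traffic policy fields compared in the diff.
type trafficPolicy string

func main() {
	local, cluster := trafficPolicy("Local"), trafficPolicy("Cluster")
	anotherLocal := trafficPolicy("Local")

	fmt.Println(ptrEqual(&local, &anotherLocal)) // equal values, different pointers
	fmt.Println(ptrEqual(&local, &cluster))      // different values
	fmt.Println(ptrEqual[trafficPolicy](nil, nil))
	fmt.Println(ptrEqual(&local, nil))
}
```

The `comparable` constraint is also why this approach does not extend to the other `reflect.DeepEqual` call sites mentioned in the reply: structs containing pointers would be compared by address, not by pointed-to value.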

luolanzone · 2024-07-25T09:00:17Z

@hongliangl Thanks a lot for the quick response. @antoninbas @tnqn could you take another look? We plan to release 2.1 tomorrow, so we need to merge this as early as we can to move forward. Thanks!
btw, can we defer the e2e PR #6523 if there is not enough time to close it?

tnqn · 2024-07-25T14:33:44Z

pkg/agent/controller/bgp/controller.go

+	go wait.UntilWithContext(ctx, c.worker, time.Second)
+
+	go c.secretInformer.Run(ctx.Done())
+	cache.WaitForCacheSync(ctx.Done(), c.secretInformer.HasSynced)


Shouldn't we wait for this informer to sync before starting the worker, along with the other informers? Otherwise the worker would try to connect to peer routers before any passwords are available.

Makes sense. Moved these two lines before go wait.UntilWithContext(ctx, c.worker, time.Second).

tnqn · 2024-07-25T14:48:39Z

pkg/agent/controller/bgp/controller.go

+
+	cacheSyncs := []cache.InformerSynced{
+		c.nodeListerSynced,
+		c.serviceListerSynced,
+		c.bgpPolicyListerSynced,
+		c.endpointSliceListerSynced,
+		c.serviceListerSynced,
+	}
+	if c.egressEnabled {
+		cacheSyncs = append(cacheSyncs, c.egressListerSynced)
+	}
+	if !cache.WaitForNamedCacheSync(controllerName, ctx.Done(), cacheSyncs...) {
+		return
+	}
+
+	go c.secretInformer.Run(ctx.Done())
+	cache.WaitForCacheSync(ctx.Done(), c.secretInformer.HasSynced)


Suggested change

-	cacheSyncs := []cache.InformerSynced{
-		c.nodeListerSynced,
-		c.serviceListerSynced,
-		c.bgpPolicyListerSynced,
-		c.endpointSliceListerSynced,
-		c.serviceListerSynced,
-	}
-	if c.egressEnabled {
-		cacheSyncs = append(cacheSyncs, c.egressListerSynced)
-	}
-	if !cache.WaitForNamedCacheSync(controllerName, ctx.Done(), cacheSyncs...) {
-		return
-	}
-	go c.secretInformer.Run(ctx.Done())
-	cache.WaitForCacheSync(ctx.Done(), c.secretInformer.HasSynced)
+	go c.secretInformer.Run(ctx.Done())
+	cacheSyncs := []cache.InformerSynced{
+		c.nodeListerSynced,
+		c.serviceListerSynced,
+		c.bgpPolicyListerSynced,
+		c.endpointSliceListerSynced,
+		c.serviceListerSynced,
+		c.secretInformer.HasSynced,
+	}
+	if c.egressEnabled {
+		cacheSyncs = append(cacheSyncs, c.egressListerSynced)
+	}
+	if !cache.WaitForNamedCacheSync(controllerName, ctx.Done(), cacheSyncs...) {
+		return
+	}

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>

tnqn

LGTM

antoninbas

LGTM, thanks for the great contribution!

antoninbas · 2024-07-25T20:22:02Z

pkg/agent/controller/bgp/controller_test.go

+	return ""
+}
+
+func waitAndGetDummyEvent(t *testing.T, c *fakeController) {


nit: for a future PR, it may be good to rename this to waitEvent and doneDummyEvent could be renamed to removeEvent. They could also be defined as methods on fakeController.

antoninbas · 2024-07-25T20:22:46Z

/test-all

hongliangl force-pushed the 20240222-bgp-controller-v2 branch from e986dd8 to 16f2901 Compare April 9, 2024 02:46

hongliangl added kind/feature Categorizes issue or PR as related to a new feature. action/release-note Indicates a PR that should be included in release notes. labels Apr 9, 2024

hongliangl force-pushed the 20240222-bgp-controller-v2 branch from e93d476 to 7508c38 Compare April 10, 2024 07:20

rajnkamr added the area/transit/bgp Issues or PRs related to BGP support. label Apr 11, 2024

hongliangl force-pushed the 20240222-bgp-controller-v2 branch from 7508c38 to e1e246a Compare May 22, 2024 05:30

hongliangl added this to the Antrea v2.1 release milestone May 24, 2024

hongliangl force-pushed the 20240222-bgp-controller-v2 branch 2 times, most recently from f209e2a to 3063737 Compare May 30, 2024 10:40

hongliangl force-pushed the 20240222-bgp-controller-v2 branch 3 times, most recently from 57773d9 to c9b727a Compare June 6, 2024 12:25

hongliangl mentioned this pull request Jun 14, 2024

Add BGP datapath interface and implement goBGP integration #6447

Merged

hongliangl force-pushed the 20240222-bgp-controller-v2 branch 11 times, most recently from 095d7da to 5086b34 Compare June 19, 2024 07:23

hongliangl marked this pull request as ready for review June 19, 2024 07:26

hongliangl force-pushed the 20240222-bgp-controller-v2 branch 2 times, most recently from 3bab09b to 69c31a2 Compare June 20, 2024 05:55

hongliangl changed the title ~~Add BGPPolicy controller~~ Implement the controller for API BGPPolicy Jun 20, 2024

hongliangl force-pushed the 20240222-bgp-controller-v2 branch from 69c31a2 to 18c6f27 Compare June 21, 2024 10:38

hongliangl mentioned this pull request Jun 21, 2024

To support BGP stack integration #6189

Closed

4 tasks

hongliangl requested a review from tnqn July 23, 2024 03:17

luolanzone reviewed Jul 23, 2024

hongliangl force-pushed the 20240222-bgp-controller-v2 branch 2 times, most recently from 840d1ed to fbb40aa Compare July 23, 2024 13:38

antoninbas reviewed Jul 23, 2024

luolanzone reviewed Jul 24, 2024

hongliangl force-pushed the 20240222-bgp-controller-v2 branch from fbb40aa to c52b745 Compare July 24, 2024 08:30

hongliangl requested review from antoninbas and luolanzone July 24, 2024 08:49

antoninbas reviewed Jul 24, 2024

hongliangl force-pushed the 20240222-bgp-controller-v2 branch 3 times, most recently from 6ba5320 to 9692014 Compare July 25, 2024 04:56

hongliangl force-pushed the 20240222-bgp-controller-v2 branch 3 times, most recently from c4c6fcb to d9cf1ee Compare July 25, 2024 13:40

tnqn reviewed Jul 25, 2024

tnqn reviewed Jul 25, 2024

hongliangl force-pushed the 20240222-bgp-controller-v2 branch from d9cf1ee to 787af7c Compare July 25, 2024 14:44

tnqn reviewed Jul 25, 2024

hongliangl force-pushed the 20240222-bgp-controller-v2 branch from 787af7c to 3d33c51 Compare July 25, 2024 15:02

tnqn approved these changes Jul 25, 2024

antoninbas approved these changes Jul 25, 2024

antoninbas mentioned this pull request Jul 25, 2024

BGP-based implementation for Flexible IPAM #6548

Open

antoninbas merged commit 905a8a6 into antrea-io:main Jul 25, 2024
55 of 58 checks passed

tnqn mentioned this pull request Jul 26, 2024

Add BGP support to the Antrea Agent #5948

Closed
