-
Notifications
You must be signed in to change notification settings - Fork 386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement the controller for API BGPPolicy #6203
Implement the controller for API BGPPolicy #6203
Conversation
e986dd8
to
16f2901
Compare
e93d476
to
7508c38
Compare
7508c38
to
e1e246a
Compare
f209e2a
to
3063737
Compare
57773d9
to
c9b727a
Compare
095d7da
to
5086b34
Compare
3bab09b
to
69c31a2
Compare
69c31a2
to
18c6f27
Compare
return "", fmt.Errorf("failed to patch BGP router ID to Node annotation %s: %w", types.NodeBGPRouterIDAnnotationKey, err) | ||
} | ||
} else if !utilnet.IsIPv4String(routerID) { | ||
return "", fmt.Errorf("BGP router ID should be an IPv4 address string") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also correct the routeID with the annotation NodeBGPRouterIDAnnotationKey in this case? Considering it's owned by Antrea but there might be a case that users modified it accidentally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we are obligated to correct that since that's what the users should do by themselves. cc @tnqn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logically we shouldn't update user input which would cause unexpected results. If users ever set this annotation, they want to specify the router ID to a value which is more meaningful for them, instead of using the system generated one. If they set a wrong value and we override it to a correct but different value, they wouldn't know they input a invalid ID and would think this is not configurable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, right now there is no good signal that the value is invalid and cannot be used. I was wondering if we should update the annotation value to something like "<invalid value provided, please use a valid IPv4 address>"
. But I don't feel very strongly about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine as long as there is error log about it given this should only be used by advanced users who are familiar with BGP. And this is not the only case, the same situation applies to NPL, ServiceExternalIP, Service, Egress which takes configuration via user-provided annotation. Antrea users seem careful enough that we have never received bugs related to misconfigured annotations on other features :)
if c.enabledIPv4 && utilnet.IsIPv4String(ip) { | ||
allRoutes.Insert(bgp.Route{Prefix: ip + ipv4Suffix}) | ||
} | ||
if c.enabledIPv6 && utilnet.IsIPv6String(ip) { | ||
allRoutes.Insert(bgp.Route{Prefix: ip + ipv6Suffix}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe wrap this as 'UpdateAllRoutes' and reused in the following function.
if node.GetName() != c.nodeName { | ||
return | ||
} | ||
if reflect.DeepEqual(node.GetLabels(), oldNode.GetLabels()) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we check labels and annotations only? it reminds me that the case for Flexible IPAM, Do we plan to support this? I suppose it's a valid case for BGP?
@tnqn @antoninbas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be wrong but my understanding is Flexible IPAM is another way to address the problem BGP tries to resolve, by dividing the Node network into multiple flat VLAN subnets and routing between them via a router, so BGP is not required here.
It needs to check labels and annotations because they affect whether the policy applies to the node and the node router ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think about it too much but it's an interesting question. I agree with Quan's observation for the current version of Antrea Flexible IPAM.
However, if we interpret "Flexible IPAM" as the ability to assign arbitrary IPs to Pods, without looking at the current implementation (which requires bridging mode), then maybe this is something that BGP can actually help achieve (when supported by the underlay) without requiring that the Node's transport interface be moved to the OVS bridge? What do you think?
Note: that would be a L3 approach to Flexible IPAM, not a L2 approach using VLANs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I re-opened this conversation not because it impacts this PR, but because I think it's a good thing to discuss
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, we should decouple IPAM and implementation. I agree BGPPolicy can help achieve Flexible IPAM without briding mode. Perhaps we can track it via an issue. If my understanding is correct, it may require the BGP processes running on Nodes exchange routes, not only advertise routes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If my understanding is correct, it may require the BGP processes running on Nodes exchange routes, not only advertise routes?
I would say probably, yes. And that's what the Calico BGP implementation does I think.
I don't know if it is strictly required. If we keep the current approach (Node IPAM + Flexible IPAM, with the flexibility to choose between the 2 for different workloads), then I can see how just disabling SNAT could be enough, assuming that for Pods using Flexible IPs the default route is used and traffic gets to a router to which Flexible IPs were advertised. But for more generic support, and for the ability to send traffic to the destination Node directly (e.g. when 2 Nodes are on the same subnet) I would say being able to learn routes over BGP is necessary.
/test-all |
840d1ed
to
fbb40aa
Compare
queue workqueue.RateLimitingInterface | ||
} | ||
|
||
func NewBGPPolicyController(ctx context.Context, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it a bit strange to pass the context in the New
function. Would it make more sense to set this field in the Run
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Additionally, I have deleted ctx context.Context
from struct Controller
, and the ctx will be passed through function parameter to methods of goBGP interface, not a member of Controller
.
bgpServer := c.bgpPolicyState.bgpServer | ||
for key := range peerToAddKeys { | ||
peerConfig := curPeerConfigs[key] | ||
if err := bgpServer.AddPeer(c.ctx, peerConfig); err != nil { | ||
return err | ||
} | ||
c.bgpPolicyState.peerConfigs[key] = peerConfig | ||
} | ||
for key := range peerToUpdateKeys { | ||
peerConfig := curPeerConfigs[key] | ||
if err := bgpServer.UpdatePeer(c.ctx, peerConfig); err != nil { | ||
return err | ||
} | ||
c.bgpPolicyState.peerConfigs[key] = peerConfig | ||
} | ||
for key := range peerToDeleteKeys { | ||
peerConfig := prePeerConfigs[key] | ||
if err := bgpServer.RemovePeer(c.ctx, peerConfig); err != nil { | ||
return err | ||
} | ||
delete(c.bgpPolicyState.peerConfigs, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using c.ctx
directly for the add / update / remove peer operations, I think it may make sense to create a child context with a reasonable timeout using https://pkg.go.dev/context#WithTimeout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm agreed that it will be better to have a reasonable timeout for calling add/update/remove peer operations, but I'm not sure about how to decide timeout.
fbb40aa
to
c52b745
Compare
} | ||
} | ||
|
||
func (c *Controller) Run(ctx context.Context, stopCh <-chan struct{}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's an anti-pattern to pass both a context and a stop channel to the same function. They also "mean" the same thing as the context is generated from the stop channel in this case, and will be cancelled when the channel is closed.
I would recommend only keeping the context in this case, and you can call ctx.Done()
when you need a channel.
You can also do the opposite (keep the channel) and use wait.ContextForChannel
to get a context. But I would say keeping the context is probably a more "modern" / idiomatic approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Appreciate the detailed explanation!
I would recommend only keeping the context in this case, and you can call ctx.Done() when you need a channel.
I prefer this. It looks neater and more "modern".
slices.Equal(oldSvc.Spec.ExternalIPs, svc.Spec.ExternalIPs) && | ||
slices.Equal(getIngressIPs(oldSvc), getIngressIPs(svc)) && | ||
oldSvc.Spec.ExternalTrafficPolicy == svc.Spec.ExternalTrafficPolicy && | ||
reflect.DeepEqual(oldSvc.Spec.InternalTrafficPolicy, svc.Spec.InternalTrafficPolicy) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It feels like a bit of a waste to use reflection to compare 2 *type
values, when type
is a primitive type. You could use https://pkg.go.dev/k8s.io/utils/ptr#Equal
I am not sure if this comment applies to other locations as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here ptr.Equal(oldSvc.Spec.InternalTrafficPolicy, svc.Spec.InternalTrafficPolicy)
looks much better.
For other locations using reflect
, there are pointers within the struct to compare, so I didn't change them.
if node.GetName() != c.nodeName { | ||
return | ||
} | ||
if reflect.DeepEqual(node.GetLabels(), oldNode.GetLabels()) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think about it too much but it's an interesting question. I agree with Quan's observation for the current version of Antrea Flexible IPAM.
However, if we interpret "Flexible IPAM" as the ability to assign arbitrary IPs to Pods, without looking at the current implementation (which requires bridging mode), then maybe this is something that BGP can actually help achieve (when supported by the underlay) without requiring that the Node's transport interface be moved to the OVS bridge? What do you think?
Note: that would be a L3 approach to Flexible IPAM, not a L2 approach using VLANs
6ba5320
to
9692014
Compare
@hongliangl Thanks a lot for the quick response, @antoninbas @tnqn could you take another look? We plan to release 2.1 tomorrow, need to merge this as early as we can to move forward. Thanks! |
c4c6fcb
to
d9cf1ee
Compare
go wait.UntilWithContext(ctx, c.worker, time.Second) | ||
|
||
go c.secretInformer.Run(ctx.Done()) | ||
cache.WaitForCacheSync(ctx.Done(), c.secretInformer.HasSynced) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be waited before starting the worker, along with other informers? otherwise the worker would try to connect peer routers without providing any passwords first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense. Moved these two lines before go wait.UntilWithContext(ctx, c.worker, time.Second)
d9cf1ee
to
787af7c
Compare
|
||
cacheSyncs := []cache.InformerSynced{ | ||
c.nodeListerSynced, | ||
c.serviceListerSynced, | ||
c.bgpPolicyListerSynced, | ||
c.endpointSliceListerSynced, | ||
c.serviceListerSynced, | ||
} | ||
if c.egressEnabled { | ||
cacheSyncs = append(cacheSyncs, c.egressListerSynced) | ||
} | ||
if !cache.WaitForNamedCacheSync(controllerName, ctx.Done(), cacheSyncs...) { | ||
return | ||
} | ||
|
||
go c.secretInformer.Run(ctx.Done()) | ||
cache.WaitForCacheSync(ctx.Done(), c.secretInformer.HasSynced) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cacheSyncs := []cache.InformerSynced{ | |
c.nodeListerSynced, | |
c.serviceListerSynced, | |
c.bgpPolicyListerSynced, | |
c.endpointSliceListerSynced, | |
c.serviceListerSynced, | |
} | |
if c.egressEnabled { | |
cacheSyncs = append(cacheSyncs, c.egressListerSynced) | |
} | |
if !cache.WaitForNamedCacheSync(controllerName, ctx.Done(), cacheSyncs...) { | |
return | |
} | |
go c.secretInformer.Run(ctx.Done()) | |
cache.WaitForCacheSync(ctx.Done(), c.secretInformer.HasSynced) | |
go c.secretInformer.Run(ctx.Done()) | |
cacheSyncs := []cache.InformerSynced{ | |
c.nodeListerSynced, | |
c.serviceListerSynced, | |
c.bgpPolicyListerSynced, | |
c.endpointSliceListerSynced, | |
c.serviceListerSynced, | |
c.secretInformer.HasSynced, | |
} | |
if c.egressEnabled { | |
cacheSyncs = append(cacheSyncs, c.egressListerSynced) | |
} | |
if !cache.WaitForNamedCacheSync(controllerName, ctx.Done(), cacheSyncs...) { | |
return | |
} |
This commit implements the controller of API `BGPPolicy`, designed to advertise Service IPs, Egress IPs, and Pod IPs to BGP peers from selected Kubernetes Nodes. According to the spec of `BGPPolicy`, the Node selector is used to select Nodes to which a `BGPPolicy` is applied. Multiple `BGPPolicies` can be applied to the same Node. However, only the oldest `BGPPolicy` will be effective on a Node, with others serving as alternatives. The effective one may be changed in the following cases: - The current effective BGPPolicy is updated and not applied to the Node. - The current effective BGPPolicy is deleted. The BGP server instance is only created and started for the effective BGPPolicy on a Node. If the effective BGPPolicy is changed, the corresponding BGP server instance will be terminated by calling the `Stop` method, and a new BGP server instance will be created and started by calling the `Start` method for the new effective BGPPolicy. To create a BGP server instance, ASN, router ID, and listen port must be specified. The ASN and listen port are specified in the spec of the effective BGPPolicy. For router ID, if the Kubernetes cluster is IPv4-only or dual-stack, we use the Node's IPv4 address as the router ID, ensuring uniqueness. If the Kubernetes cluster is IPv6-only, where no Node IPv4 address is available, the router ID could be specified via the Node annotation `node.antrea.io/bgp-router-id`. If not present, a router ID will be generated by hashing the Node name and update it to the Node annotation `node.antrea.io/bgp-router-id`. Additionally, the stale BGP server instance will be terminated and a new BGP server instance should be created and started when any of ASN, routerID, or listen port changes. The information of the BGP peers is specified in the effective BGPPolicy. The unique identification of a BGP peer is the peer IP address and peer ASN. To reconcile the latest BGP peers: - Get the BGP peers to be added and add them by calling the `AddPeer` method of the BGP server instance. - Get the BGP peers to be deleted and delete them by calling the `RemovePeer` method of the BGP server instance. - Get the remaining BGP peers and calculate the updated BGP peers, then update them by calling the `UpdatePeer` method of the BGP server instance. The information of the IPs to be advertised can be calculated from the spec of the effective BGPPolicy. Currently, we advertise the IPs and CIDRs to all the BGP peers. To reconcile the latest IPs to all BGP peers: - If the BGP server instance is newly created and started, advertise all the IPs by calling the `AdvertiseRoutes` method. - If the BGP server instance is not newly created and started: - Get the IPs/CIDRs to be added and advertise them by calling the `AdvertiseRoutes` method. - Get the IPs/CIDRs to be removed and withdraw them by calling the `WithdrawRoutes` method. The feature is gated by the alpha `BGPPolicy` feature gate and only supported in Linux. Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
787af7c
to
3d33c51
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the great contribution!
return "" | ||
} | ||
|
||
func waitAndGetDummyEvent(t *testing.T, c *fakeController) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for a future PR, it may be good to rename this to waitEvent
and doneDummyEvent
could be renamed to removeEvent
. They could also be defined as methods on fakeController
.
/test-all |
This commit implements the controller of API
BGPPolicy
, designed to advertiseService IPs, Egress IPs, and Pod IPs to BGP peers from selected Kubernetes Nodes.
According to the spec of
BGPPolicy
, the Node selector is used to select Nodesto which a
BGPPolicy
is applied. MultipleBGPPolicies
can be applied to thesame Node. However, only the oldest
BGPPolicy
will be effective on a Node,with others serving as alternatives. The effective one may be changed in the
following cases:
The BGP server instance is only created and started for the effective BGPPolicy on
a Node. If the effective BGPPolicy is changed, the corresponding BGP server instance
will be terminated by calling the
Stop
method, and a new BGP server instance willbe created and started by calling the
Start
method for the new effective BGPPolicy.To create a BGP server instance, ASN, router ID, and listen port must be specified.
The ASN and listen port are specified in the spec of the effective BGPPolicy. For router ID,
if the Kubernetes cluster is IPv4-only or dual-stack, we use the Node's IPv4 address
as the router ID, ensuring uniqueness. If the Kubernetes cluster is IPv6-only, where no
Node IPv4 address is available, the router ID could be specified via the Node annotation
node.antrea.io/bgp-router-id
. If not present, a router ID will be generated by hashingthe Node name and update it to the Node annotation
node.antrea.io/bgp-router-id
.Additionally, the stale BGP server instance will be terminated and a new BGP server
instance should be created and started when any of ASN, routerID, or listen port changes.
The information of the BGP peers is specified in the effective BGPPolicy. The unique
identification of a BGP peer is the peer IP address and peer ASN.
To reconcile the latest BGP peers:
AddPeer
method of theBGP server instance.
RemovePeer
methodof the BGP server instance.
calling the
UpdatePeer
method of the BGP server instance.The information of the IPs to be advertised can be calculated from the spec of the
effective BGPPolicy. Currently, we advertise the IPs and CIDRs to all the BGP peers.
To reconcile the latest IPs to all BGP peers:
calling the
AdvertiseRoutes
method.AdvertiseRoutes
method.WithdrawRoutes
method.The feature is gated by the alpha
BGPPolicy
feature gate and only supported in Linux.