From 4e09002c2ee38063b9ecc3b604d4fdfaa543dcad Mon Sep 17 00:00:00 2001
From: Vishal Thapar <5137689+vthapar@users.noreply.github.com>
Date: Tue, 27 Aug 2024 09:51:42 +0530
Subject: [PATCH] Changes to make lighthouse MCS API compliant

Currently Lighthouse implementation of MCS API differs on 2 key points:
1. There is no VIP allocated for a MultiCluster Service.
2. EndpointSlices don't carry PodIPs for ClusterIP services.

This Enhancement proposes design changes to support above two in a user
configurable way without impacting existing deployments.

Signed-off-by: Vishal Thapar <5137689+vthapar@users.noreply.github.com>
---
 lighthouse/mcs-compliance.md | 177 +++++++++++++++++++++++++++++++++++
 1 file changed, 177 insertions(+)
 create mode 100644 lighthouse/mcs-compliance.md

diff --git a/lighthouse/mcs-compliance.md b/lighthouse/mcs-compliance.md
new file mode 100644
index 00000000..12385134
--- /dev/null
+++ b/lighthouse/mcs-compliance.md
@@ -0,0 +1,177 @@
+# Add support for 100% MCS compliance
+
+Related issue:
+[Add support for 100% MCS Compliance](https://github.com/submariner-io/enhancements/issues/229)
+
+## Summary
+
+The current Lighthouse implementation differs from the MCS API[^1] on two key points for ClusterIP Services:
+
+1. There is no VirtualIP (VIP) assigned in `ServiceImport`.
+2. `EndpointSlices` do not contain individual PodIPs.
+
+To better align with the MCS API, Lighthouse needs an option to allocate and track VirtualIPs in `ServiceImports` and use
+PodIPs in `EndpointSlices`.
+
+## Proposal
+
+When creating a `ServiceExport`, users will have the option to add one annotation to enable a VIP for that Service and another
+to use PodIPs in `EndpointSlices`. These annotations will also be added to the aggregated `ServiceImport`.
+
+The option to add PodIPs to `EndpointSlices` is only applicable when the VIP option is also enabled. A use case that requires
+PodIPs but not a VIP can already be handled by using headless Services.
+
+Configuration flags will also be added to the `Submariner` and `ServiceDiscovery` CRDs, and to `subctl`, to set these options at
+the global level. This is for deployments where the user only wants one specific option and doesn't want to set the
+annotation explicitly on each `ServiceExport`.
+
+The default behavior with nothing configured stays the same as it is today, i.e. no VIP and no PodIPs.
+
+The global flag can't be changed at runtime; changing it requires a reinstall of the whole clusterset. Changing the flag in the CRs requires
+a reinstall of Submariner on that cluster, as it changes the behavior of `ServiceExports` without annotations.
+
+### VIP CIDR and allocation
+
+1. A default CIDR of `243.0.0.0/8` will be used for the clusterset. The user can change this when deploying the broker.
+2. Each cluster will get a default subset CIDR of `243.0.0.0/16` for allocating VIPs to Services exported from that
+cluster. The user can change this at join time, but the CIDR must be an unallocated subset of the global VIP CIDR.
+3. VIP allocation is done by the first cluster to export the `Service`.
+4. When the Service is exported on subsequent clusters, no VIP is allocated if one is already present on the `ServiceImport`.
+5. The VIP is deallocated only when the `ServiceImport` is deleted, i.e. the `Service` or `ServiceExport` is no longer present on any
+of the clusters.
+6. If Submariner is uninstalled on the cluster that allocated a given VIP, the VIP is not changed on the `ServiceImport`.
+7. When `lighthouse-agent` starts, it should check `ServiceImport`s for any VIPs allocated from its VIP CIDR. Any such
+VIPs should be added to its local VIP allocation cache. This also helps in scenarios where the cluster that allocated a
+VIP has uninstalled Submariner and another cluster got the same VIP CIDR on installation.
+
+Note that items `1` and `2` above are similar to the Globalnet CIDR.
+
+### DNS Resolution
+
+Lighthouse DNS will return the VIP for a `ServiceImport` if one is available. It is up to a solution external to
+Submariner to load balance this VIP to the PodIPs in `EndpointSlices`. If no VIP is present, it will behave as it does
+today, i.e. return one of the ClusterIPs from `EndpointSlices`.
+
+### Conflict Resolution
+
+In case of a conflict in VIP and PodIP settings when exporting a `Service`, the configuration on the oldest `ServiceExport`
+will be used. The `ServiceExport` in conflict will not be exported and the `ServiceExportConflict` condition will be set on it.
+The `Valid` condition will also be set to `true`, as the Service itself is exportable.
+
+### Creating `ServiceExport`
+
+Here are the detailed steps when creating a `ServiceExport`:
+
+1. When a new `ServiceExport` is created, check whether a `ServiceImport` exists.
+2. If the `ServiceImport` doesn't exist, allocate the VIP and add it to the new `ServiceImport`.
+3. If the `ServiceImport` exists, compare the combination of annotations on the `ServiceExport` and global flags against
+the annotations on the `ServiceImport` for any conflicts.
+4. In case of conflict, set the `ServiceExportConflict` condition on the `ServiceExport` and attempt conflict resolution.
+5. If the conflict can't be resolved, don't export the service.
+6. If the PodIP annotation is set, create `EndpointSlices` with PodIPs instead of the ClusterIP. Follow the existing rules for
+merging port information in `EndpointSlices`.
+
+### Updating annotation on ServiceExport
+
+Changing an annotation will not modify the current behavior of using VIP or PodIPs. It will only change the conflict
+condition on the `ServiceExport` if it adds or resolves a conflict, depending on the change. If users wish to change
+the behavior for a given `Service`, they will need to delete all its `ServiceExports` and recreate them with the required
+annotations.
+
+### Deleting `ServiceExport`
+
+Here are the detailed steps when deleting a `ServiceExport`:
+
+1. Check the cluster list on the `ServiceImport` to see whether this is the last cluster exporting the service. If not, proceed as
+usual irrespective of the VIP and PodIP annotations. This means deleting the `EndpointSlice`, recalculating service ports and
+updating the cluster list on the `ServiceImport`.
+2. If this is the last cluster, deallocate the VIP if it belongs to the current cluster's VIP CIDR and delete the `ServiceImport`.
+
+### Globalnet
+
+Globalnet will not be supported yet. Supporting Globalnet with PodIPs would be a significant scale issue as each Pod
+backing the Service would require a GlobalIngressIP.
+
+For any use cases that require Globalnet support with this feature, a separate Enhancement can be proposed in the
+future.
+
+### Migration
+
+Since there is no change in default behavior, nothing should be required for migration.
+
+If users want to add annotations to existing `ServiceExports` after deployment, they will need to delete and recreate those `ServiceExports`.
+
+## Design Details
+
+### Global Flags for deploy broker
+
+When deploying the broker, `--enable-vip` and `--endpointslices-with-pod-ips` flags will be added. There will also be a
+`--clusterset-vip-cidr` option to set the VIP CIDR for the clusterset to a non-default value.
+
+### Flags for subctl join
+
+The following flags will be added to `subctl join`:
+
+* `--cluster-vip-cidr=a.b.c.d/x` - Set the VIP CIDR for this cluster. By default it is allocated from the clusterset VIP CIDR.
+* `--endpointslices-with-pod-ips=false/true` - Use PodIPs when creating `EndpointSlices`. Only applicable for ClusterIP
+services when cluster-vip-cidr is configured. Default is false.
+
+### New CRD flags
+
+The following CRD spec types will need to be modified to support these new flags:
+
+```Go
+type BrokerSpec struct {
+    ClustersetVIPEnabled     bool     `json:"clustersetVIPEnabled,omitempty"`
+    ClustersetVIPCIDRRange   []string `json:"clustersetVIPCIDRRange,omitempty"`
+    EndpointSlicesWithPodIPs bool     `json:"endpointSlicesWithPodIps,omitempty"`
+}
+```
+
+```Go
+type SubmarinerSpec struct {
+    ClusterVIPCIDR           []string `json:"clusterVipCidr,omitempty"`
+    EndpointSlicesWithPodIPs bool     `json:"endpointSlicesWithPodIps"`
+}
+```
+
+```Go
+type ServiceDiscoverySpec struct {
+    ClusterVIPCIDR           []string `json:"clusterVipCidr,omitempty"`
+    EndpointSlicesWithPodIPs bool     `json:"endpointSlicesWithPodIps"`
+}
+```
+
+### Annotations for `ServiceExport`
+
+* `lighthouse.submariner.io/use-vip` - Use VIP for the `ServiceExport`
+* `lighthouse.submariner.io/endpointslice-with-pod-ips` - Use PodIPs in `EndpointSlices`
+
+### Pros
+
+1. Doesn't modify existing behavior.
+2. Allows users to select behavior on a per Service basis.
+
+### Cons
+
+1. More configuration options for users to consider.
+2. More chances of `ServiceExports` on different clusters being in conflict due to wrong annotations etc.
+
+### Backward Compatibility
+
+None.
+
+### Alternatives
+
+1. Only use a global flag for the entire deployment. This was discarded in favor of the annotation-based approach to provide
+flexibility to users, as they can mix and match services with and without VIP.
+2. In case of conflict on VIP/PodIPs, don't export the service at all.
+
+## User Impact
+
+Existing users will not be impacted in any way. Users who wish to use this feature will need to reinstall Submariner
+with the new flags or use annotations on `ServiceExports`.
+
+## References
+
+[^1]: [KEP-1645: Multi-Cluster Services API](https://github.com/kubernetes/enhancements/tree/master/keps/sig-multicluster/1645-multi-cluster-services-api#constraints-and-conflict-resolution)
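
Note on the "VIP CIDR and allocation" rules in the patch above: the proposal requires a cluster VIP CIDR to be an unallocated subset of the clusterset VIP CIDR. A minimal Go sketch of that check, using only the standard `net/netip` package, might look like the following. `validateClusterVIPCIDR` and its parameters are hypothetical names chosen for illustration; they are not part of Lighthouse or `subctl`, and the sketch assumes the CIDRs already handed out to other clusters are available as a list of strings.

```go
package main

import (
	"fmt"
	"net/netip"
)

// validateClusterVIPCIDR is a hypothetical helper (not Lighthouse code).
// It returns nil when candidate lies inside the clusterset VIP CIDR and
// does not overlap any CIDR already allocated to another cluster.
func validateClusterVIPCIDR(clusterset string, allocated []string, candidate string) error {
	global, err := netip.ParsePrefix(clusterset)
	if err != nil {
		return fmt.Errorf("invalid clusterset VIP CIDR %q: %w", clusterset, err)
	}
	cand, err := netip.ParsePrefix(candidate)
	if err != nil {
		return fmt.Errorf("invalid cluster VIP CIDR %q: %w", candidate, err)
	}
	// Normalize both prefixes to their network address.
	global, cand = global.Masked(), cand.Masked()

	// Subset check: the candidate must be at least as specific as the
	// clusterset CIDR and its network address must fall inside it.
	if cand.Bits() < global.Bits() || !global.Contains(cand.Addr()) {
		return fmt.Errorf("%s is not a subset of clusterset VIP CIDR %s", cand, global)
	}

	// Unallocated check: the candidate must not overlap any CIDR in use.
	for _, a := range allocated {
		used, err := netip.ParsePrefix(a)
		if err != nil {
			return fmt.Errorf("invalid allocated CIDR %q: %w", a, err)
		}
		if used.Overlaps(cand) {
			return fmt.Errorf("%s overlaps already allocated CIDR %s", cand, used.Masked())
		}
	}
	return nil
}

func main() {
	// Assumed state: one cluster already holds the default /16.
	allocated := []string{"243.0.0.0/16"}
	for _, c := range []string{"243.1.0.0/16", "243.0.128.0/17", "10.0.0.0/16"} {
		if err := validateClusterVIPCIDR("243.0.0.0/8", allocated, c); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", c)
	}
}
```

In this sketch the first candidate passes, the second is rejected because it overlaps the `/16` that is already allocated, and the third is rejected because it lies outside the clusterset CIDR. How the list of allocated CIDRs would be obtained (for example from the broker) is left open here, as the proposal does not specify it.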