Commit: Changes to make Lighthouse MCS API compliant

Currently the Lighthouse implementation of the MCS API differs on two key points: 1. There is no VIP allocated for a
MultiCluster Service. 2. EndpointSlices don't carry PodIPs for ClusterIP services. This enhancement proposes design
changes to support the above two in a user-configurable way without impacting existing deployments.

Signed-off-by: Vishal Thapar <5137689+vthapar@users.noreply.github.com>

# Add support for 100% MCS compliance

Related issue:
[Add support for 100% MCS Compliance](https://github.com/submariner-io/enhancements/issues/229)

## Summary

The current Lighthouse implementation differs from the MCS API [^1] on two key points for ClusterIP Services:

1. There is no VirtualIP (VIP) assigned in `ServiceImport`.
2. `EndpointSlices` do not contain individual PodIPs.

To better align with the MCS API, Lighthouse needs an option to allocate and track VirtualIPs in `ServiceImports` and
use PodIPs in `EndpointSlices`.

## Proposal

When creating a `ServiceExport`, users will have the option to add an annotation enabling a VIP for that Service, and
another to use PodIPs in `EndpointSlices`. These annotations will also be added to the aggregated `ServiceImport`.

The option to add PodIPs to `EndpointSlices` will only be applicable when the VIP option is also enabled. A use case
that requires PodIPs but not a VIP can already be handled by using Headless Services.

Configuration flags will also be added to the `Submariner` and `ServiceDiscovery` CRDs, and to `subctl`, to set these
options at the global level. This is for deployments where the user only wants one specific option and doesn't want to
set the annotation explicitly on each `ServiceExport`.

The default behavior with nothing configured will be the same as it is currently, i.e. no VIP and no PodIPs.

The global flag can't be changed at runtime and requires a reinstall of the whole clusterset. Changing the flag in the
CRs will require a reinstall of Submariner on that cluster, as it changes the behavior of `ServiceExports` that have no
annotations.

### VIP CIDR and allocation

1. A default CIDR of `243.0.0.0/8` will be used for the clusterset. The user can change this at the time of deploying
   the broker.
2. Each cluster will get a default subset CIDR of `243.0.0.0/16` for allocating VIPs to Services exported from that
   cluster. The user can change this at the time of join, but the CIDR must be an unallocated subset of the global VIP
   CIDR.
3. VIP allocation is done by the first cluster to export the `Service`.
4. When the Service is exported on subsequent clusters, no VIP allocation is done if a VIP is already present on the
   `ServiceImport`.
5. The VIP is deallocated only when the `ServiceImport` is deleted, i.e. the `Service` or `ServiceExport` is no longer
   present on any of the clusters.
6. If Submariner is uninstalled on the cluster that allocated a given VIP, the VIP is not changed on the
   `ServiceImport`.
7. When `lighthouse-agent` starts, it should check `ServiceImports` for any VIPs allocated from its VIP CIDR and add
   them to its local VIP allocation cache. This also covers the scenario where the cluster that allocated a VIP has
   uninstalled Submariner and another cluster got the same VIP CIDR on installation.

Note that `1` and `2` above are similar to the Globalnet CIDRs.

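The allocation scheme above can be sketched as a small in-memory allocator: hand out the next free IP from the
cluster's VIP CIDR, and let the agent reserve any VIPs it finds on existing `ServiceImports` at startup (step `7`).
All names here are illustrative, not the actual Lighthouse implementation.

```Go
package main

import (
	"fmt"
	"net"
)

// vipAllocator hands out VIPs from a per-cluster CIDR.
type vipAllocator struct {
	network   *net.IPNet
	allocated map[string]bool
}

func newVIPAllocator(cidr string) (*vipAllocator, error) {
	_, network, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, err
	}
	return &vipAllocator{network: network, allocated: map[string]bool{}}, nil
}

// Allocate returns the next unused IP in the CIDR, skipping the network address.
func (a *vipAllocator) Allocate() (net.IP, error) {
	ip := make(net.IP, len(a.network.IP))
	copy(ip, a.network.IP)
	for incIP(ip); a.network.Contains(ip); incIP(ip) {
		if !a.allocated[ip.String()] {
			a.allocated[ip.String()] = true
			return ip, nil
		}
	}
	return nil, fmt.Errorf("VIP CIDR %s exhausted", a.network)
}

// Reserve marks a VIP seen on an existing ServiceImport as in use,
// as the agent would do on startup.
func (a *vipAllocator) Reserve(vip string) { a.allocated[vip] = true }

// incIP increments an IP address in place, with carry.
func incIP(ip net.IP) {
	for i := len(ip) - 1; i >= 0; i-- {
		ip[i]++
		if ip[i] != 0 {
			return
		}
	}
}

func main() {
	alloc, _ := newVIPAllocator("243.0.0.0/16")
	alloc.Reserve("243.0.0.1") // VIP found on an existing ServiceImport
	vip, _ := alloc.Allocate()
	fmt.Println(vip) // next free VIP after the reserved one: 243.0.0.2
}
```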
### DNS Resolution

Lighthouse DNS will return the VIP for a `ServiceImport` if one is available. It will be up to a solution external to
Submariner to load-balance this VIP to the PodIPs in the `EndpointSlices`. If no VIP is present, Lighthouse DNS will
behave as it does today, i.e. return one of the ClusterIPs from the `EndpointSlices`.

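The lookup behavior can be summarized as: VIP when present, otherwise fall back to round-robin over the ClusterIPs.
A minimal sketch, with field names chosen for illustration rather than taken from the real Lighthouse types:

```Go
package main

import "fmt"

// serviceImport captures just the fields the resolver sketch needs.
type serviceImport struct {
	VIP        string   // empty when no VIP was allocated
	ClusterIPs []string // per-cluster IPs from the EndpointSlices
}

// resolve returns the VIP when present; otherwise it round-robins over the
// ClusterIPs using the caller-held counter rr.
func resolve(si serviceImport, rr *int) (string, bool) {
	if si.VIP != "" {
		return si.VIP, true
	}
	if len(si.ClusterIPs) == 0 {
		return "", false
	}
	ip := si.ClusterIPs[*rr%len(si.ClusterIPs)]
	*rr++
	return ip, true
}

func main() {
	rr := 0
	withVIP := serviceImport{VIP: "243.0.0.2", ClusterIPs: []string{"10.1.0.5"}}
	ip, _ := resolve(withVIP, &rr)
	fmt.Println(ip) // always the VIP

	noVIP := serviceImport{ClusterIPs: []string{"10.1.0.5", "10.2.0.7"}}
	ip, _ = resolve(noVIP, &rr)
	fmt.Println(ip) // one of the ClusterIPs, rotating per query
}
```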
### Conflict Resolution

In case of a conflict in the VIP and PodIP settings when exporting a `Service`, the configuration on the oldest
`ServiceExport` will be used. The `ServiceExport` in conflict will not be exported, and the `ServiceExportConflict`
condition will be set on it. The `Valid` condition will also be set to `true`, as the Service itself is exportable.

### Creating `ServiceExport`

Here are the detailed steps when creating a `ServiceExport`:

1. When a new `ServiceExport` is created, check whether a `ServiceImport` already exists.
2. If the `ServiceImport` doesn't exist, allocate the VIP and add it to a new `ServiceImport`.
3. If the `ServiceImport` exists, compare the combination of annotations on the `ServiceExport` and the global flags
   against the annotations on the `ServiceImport` for any conflicts.
4. In case of a conflict, set the `ServiceExportConflict` condition on the `ServiceExport` and attempt conflict
   resolution.
5. If the conflict can't be resolved, don't export the service.
6. If the PodIP annotation is set, create `EndpointSlices` with PodIPs instead of the ClusterIP. Follow the existing
   rules for merging port information in `EndpointSlices`.

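The decision logic of the steps above can be sketched as a pure function, hypothetical names throughout: allocate a VIP
only for a brand-new `ServiceImport`, flag a conflict when settings disagree, and honor PodIPs only when the VIP option
is in effect (per the Proposal section).

```Go
package main

import "fmt"

// importState and exportRequest model the inputs to the export steps.
type importState struct {
	Exists bool // a ServiceImport already exists for this Service
	UseVIP bool // the existing import's VIP setting
}

type exportRequest struct {
	UseVIP    bool // annotation or global flag on this ServiceExport
	UsePodIPs bool
}

// action is what the controller sketch decides to do.
type action struct {
	AllocateVIP bool
	Conflict    bool
	UsePodIPs   bool
}

// onServiceExportCreated walks the numbered steps: allocate a VIP for a new
// ServiceImport, otherwise detect setting conflicts; PodIPs only apply when
// the VIP option is enabled.
func onServiceExportCreated(imp importState, req exportRequest) action {
	if !imp.Exists {
		return action{AllocateVIP: req.UseVIP, UsePodIPs: req.UseVIP && req.UsePodIPs}
	}
	if imp.UseVIP != req.UseVIP {
		return action{Conflict: true}
	}
	return action{UsePodIPs: imp.UseVIP && req.UsePodIPs}
}

func main() {
	// First cluster to export, VIP requested: allocate.
	fmt.Println(onServiceExportCreated(importState{}, exportRequest{UseVIP: true}))
	// Existing import without a VIP, new export asks for one: conflict.
	fmt.Println(onServiceExportCreated(importState{Exists: true}, exportRequest{UseVIP: true}))
}
```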
### Updating annotations on `ServiceExport`

Changing an annotation will not modify the current behavior of using a VIP or PodIPs. It will only change the conflict
condition on the `ServiceExport`, if the change adds or resolves a conflict. If users wish to change the behavior for a
given `Service`, they will need to delete all its `ServiceExports` and recreate them with the required annotations.

### Deleting `ServiceExport`

Here are the detailed steps when deleting a `ServiceExport`:

1. Check the cluster list on the `ServiceImport` to determine whether this is the last cluster exporting the service.
   If not, proceed as usual irrespective of the VIP and PodIP annotations: delete the `EndpointSlice`, recalculate the
   service ports, and update the cluster list on the `ServiceImport`.
2. If this is the last cluster, deallocate the VIP if it was allocated from the current cluster's CIDR, and delete the
   `ServiceImport`.

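The deletion steps above can be sketched as follows (names are illustrative, not the real Lighthouse code): remove this
cluster's contribution in every case, and only tear down the VIP and the `ServiceImport` when no exporter remains.

```Go
package main

import "fmt"

// deleteAction is what the deletion sketch decides to do.
type deleteAction struct {
	DeleteEndpointSlice bool
	UpdateClusterList   bool
	DeallocateVIP       bool
	DeleteImport        bool
}

// onServiceExportDeleted: if other clusters still export the service, just
// remove this cluster's contribution and update the cluster list. If this is
// the last cluster, deallocate the VIP (only when it came from this cluster's
// CIDR) and delete the ServiceImport.
func onServiceExportDeleted(remainingClusters int, vipFromThisCluster bool) deleteAction {
	a := deleteAction{DeleteEndpointSlice: true}
	if remainingClusters > 0 {
		a.UpdateClusterList = true
		return a
	}
	a.DeallocateVIP = vipFromThisCluster
	a.DeleteImport = true
	return a
}

func main() {
	fmt.Println(onServiceExportDeleted(2, false)) // other exporters remain
	fmt.Println(onServiceExportDeleted(0, true))  // last cluster, VIP from our CIDR
}
```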
### Globalnet

Globalnet will not be supported yet. Supporting Globalnet with PodIPs would be a significant scale issue, as each Pod
backing the Service would require a GlobalIngressIP.

For any use cases that require Globalnet support with this feature, a separate enhancement can be proposed in the
future.

### Migration

Since there is no change in the default behavior, nothing should be required for migration.

If users want to add annotations on existing `ServiceExports` post-deployment, they will need to delete and recreate
the `ServiceExports`.

## Design Details

### Global Flags for deploy broker

When deploying the broker, `--enable-vip` and `--endpointslices-with-pod-ips` flags will be added. There will also be
a `--clusterset-vip-cidr` option to set the VIP CIDR for the clusterset to a non-default value.

### Flags for subctl join

The following flags will be added to `subctl join`:

* `--cluster-vip-cidr=a.b.c.d/x` - Set the VIP CIDR for this cluster. By default it will be allocated from the
  clusterset VIP CIDR.
* `--endpointslices-with-pod-ips=false/true` - Use PodIPs when creating `EndpointSlices`. Only applicable for ClusterIP
  services when `cluster-vip-cidr` is configured. Default is `false`.

### New CRD flags

The following CRDs will need to be modified to support these new flags:

```Go
type BrokerSpec struct {
	ClustersetVIPEnabled     bool     `json:"clustersetVIPEnabled,omitempty"`
	ClustersetVIPCIDRRange   []string `json:"clustersetVIPCIDRRange,omitempty"`
	EndpointSlicesWithPodIPs bool     `json:"endpointSlicesWithPodIps,omitempty"`
}
```

```Go
type SubmarinerSpec struct {
	ClusterVIPCIDR           []string `json:"clusterVipCidr,omitempty"`
	EndpointSlicesWithPodIPs bool     `json:"endpointSlicesWithPodIps"`
}
```

```Go
type SubmarinerDiscoverySpec struct {
	ClusterVIPCIDR           []string `json:"clusterVipCidr,omitempty"`
	EndpointSlicesWithPodIPs bool     `json:"endpointSlicesWithPodIps"`
}
```

### Annotations for `ServiceExport`

* `lighthouse.submariner.io/use-vip` - Use a VIP for the `ServiceExport`
* `lighthouse.submariner.io/endpointslice-with-pod-ips` - Use PodIPs in `EndpointSlices`

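For example, exporting a Service with both options enabled could look like the following manifest (the
`multicluster.x-k8s.io/v1alpha1` group/version is the upstream MCS API one; the Service name and namespace are
placeholders):

```yaml
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: nginx
  namespace: default
  annotations:
    lighthouse.submariner.io/use-vip: "true"
    lighthouse.submariner.io/endpointslice-with-pod-ips: "true"
```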
### Pros

1. Doesn't modify existing behavior.
2. Allows users to select the behavior on a per-Service basis.

### Cons

1. More configuration options for users to consider.
2. More chances of `ServiceExports` on different clusters being in conflict due to incorrect annotations.

### Backward Compatibility

None.

### Alternatives

1. Only use the global flag for the entire deployment. This was discarded in favor of the annotation-based approach,
   which gives users the flexibility to mix and match Services with and without a VIP.
2. In case of a conflict on the VIP/PodIP settings, don't export the Service at all.

## User Impact

Existing users will not be impacted in any way. Users who wish to use this feature will need to reinstall Submariner
with the new flags, or use annotations on `ServiceExports`.

## References

[^1]: [KEP-1645: Multi-Cluster Services API](https://github.com/kubernetes/enhancements/tree/master/keps/sig-multicluster/1645-multi-cluster-services-api#constraints-and-conflict-resolution)