Changes to make Lighthouse MCS API compliant
Currently Lighthouse implementation of MCS API differs on 2 key points:

1. There is no VIP allocated for a MultiCluster Service.
2. EndpointSlices don't carry PodIPs for ClusterIP services.

This enhancement proposes design changes to support the above two points in a user-configurable way without impacting
existing deployments.

Signed-off-by: Vishal Thapar <5137689+vthapar@users.noreply.github.com>
vthapar committed Aug 27, 2024
1 parent 9878751 commit 4e09002
lighthouse/mcs-compliance.md (new file, 177 additions)
# Add support for 100% MCS compliance

Related issue:
[Add support for 100% MCS Compliance](https://github.com/submariner-io/enhancements/issues/229)

## Summary

Current Lighthouse implementation differs from MCS API [^1] on two key points for ClusterIP Services:

1. There is no VirtualIP (VIP) assigned in `ServiceImport`.
2. `EndpointSlices` do not contain individual PodIPs.

To better align with the MCS API, Lighthouse needs an option to allocate and track VirtualIPs in `ServiceImports` and to
use PodIPs in `EndpointSlices`.

## Proposal

When creating a `ServiceExport`, users will have the option to add an annotation enabling a VIP for that Service and
another for using PodIPs in `EndpointSlices`. These annotations will also be added to the aggregated `ServiceImport`.

The option to add PodIPs to `EndpointSlices` will only be applicable when the VIP option is also enabled. A use case that
requires PodIPs but not a VIP is already covered by Headless Services.

Configuration flags will also be added to the `Submariner` and `ServiceDiscovery` CRDs, and to `subctl`, to set these
options at the global level. This is for deployments where a user only wants one specific option and doesn't want to set
the annotation explicitly on each `ServiceExport`.

The default behavior with nothing configured will be the same as it is currently, i.e. no VIP and no PodIPs.

The global flag can't be changed at runtime; changing it requires a reinstall of the whole clusterset. Changing the flag
in the CRs will require a reinstall of Submariner on that cluster, as it changes the behavior of `ServiceExports` without
annotations.

### VIP CIDR and allocation

1. A default CIDR of `243.0.0.0/8` will be used for the clusterset. Users can change this at the time of deploying the
broker.
2. Each cluster will get a default subset CIDR (e.g. `243.0.0.0/16`) for allocating VIPs to Services exported from that
cluster. Users can change this at the time of join, but the CIDR must be an unallocated subset of the global VIP CIDR.
3. VIP allocation is done by the first cluster to export the `Service`.
4. When the Service is exported on subsequent clusters, no VIP allocation is done if a VIP is already present on the
`ServiceImport`.
5. The VIP is deallocated only when the `ServiceImport` is deleted, i.e. the `Service` or `ServiceExport` is no longer
present on any of the clusters.
6. If Submariner is uninstalled on the cluster that allocated a given VIP, the VIP on the `ServiceImport` is not changed.
7. When the `lighthouse-agent` starts, it should check the `ServiceImports` for any VIPs allocated from its VIP CIDR and
add them to its local VIP allocation cache. This also covers the scenario where the cluster that allocated a VIP has
uninstalled Submariner and another cluster was assigned the same VIP CIDR on installation.

Note that `1` and `2` above are similar to Globalnet CIDR allocation.
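The allocation and cache-rebuild steps above can be sketched as a small allocator. This is a minimal illustration under stated assumptions, not the actual Lighthouse implementation; all type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// vipAllocator hands out VIPs from this cluster's CIDR and tracks
// them locally (hypothetical sketch of steps 1-7 above).
type vipAllocator struct {
	network   *net.IPNet
	allocated map[string]bool
}

func newVIPAllocator(cidr string) (*vipAllocator, error) {
	_, network, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, err
	}
	return &vipAllocator{network: network, allocated: map[string]bool{}}, nil
}

// Reserve records a VIP already present on a ServiceImport; on agent
// start-up this rebuilds the local cache (step 7). VIPs outside this
// cluster's CIDR are ignored.
func (a *vipAllocator) Reserve(vip string) bool {
	ip := net.ParseIP(vip)
	if ip == nil || !a.network.Contains(ip) {
		return false
	}
	a.allocated[ip.String()] = true
	return true
}

// Allocate returns the next unused address in the CIDR (step 3).
func (a *vipAllocator) Allocate() (string, error) {
	ip := a.network.IP.Mask(a.network.Mask)
	for a.network.Contains(ip) {
		if !a.allocated[ip.String()] {
			a.allocated[ip.String()] = true
			return ip.String(), nil
		}
		ip = nextIP(ip)
	}
	return "", fmt.Errorf("VIP CIDR %s exhausted", a.network)
}

// nextIP increments an IP address with byte carry.
func nextIP(ip net.IP) net.IP {
	out := make(net.IP, len(ip))
	copy(out, ip)
	for i := len(out) - 1; i >= 0; i-- {
		out[i]++
		if out[i] != 0 {
			break
		}
	}
	return out
}

func main() {
	a, _ := newVIPAllocator("243.0.0.0/16")
	a.Reserve("243.0.0.0") // pretend an existing ServiceImport holds this VIP
	vip, _ := a.Allocate()
	fmt.Println(vip) // 243.0.0.1
}
```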

### DNS Resolution

Lighthouse DNS will return the VIP for a `ServiceImport` when one is available. It will be up to a solution external to
Submariner to load-balance this VIP to the PodIPs in the `EndpointSlices`. If no VIP is present, Lighthouse DNS will
behave as it does today, i.e. return one of the ClusterIPs from the `EndpointSlices`.
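A minimal sketch of this lookup rule; the round-robin fallback below is an assumption standing in for whatever ClusterIP selection policy Lighthouse actually uses.

```go
package main

import "fmt"

// resolve returns the address Lighthouse DNS would answer with: the
// VIP when one is set on the ServiceImport, otherwise one of the
// per-cluster ClusterIPs (hypothetical round-robin fallback).
func resolve(vip string, clusterIPs []string, counter *int) (string, bool) {
	if vip != "" {
		return vip, true
	}
	if len(clusterIPs) == 0 {
		return "", false
	}
	ip := clusterIPs[*counter%len(clusterIPs)]
	*counter++
	return ip, true
}

func main() {
	var n int
	ip, _ := resolve("243.0.0.7", []string{"10.96.0.5"}, &n)
	fmt.Println(ip) // the VIP wins when present: 243.0.0.7
	ip, _ = resolve("", []string{"10.96.0.5", "10.97.0.9"}, &n)
	fmt.Println(ip) // no VIP: falls back to a ClusterIP
}
```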

### Conflict Resolution

In case of a conflict in the VIP and PodIP settings when exporting a `Service`, the configuration on the oldest
`ServiceExport` will be used. A `ServiceExport` in conflict will not be exported, and the `ServiceExportConflict`
condition will be set on it. The `Valid` condition will still be set to `true`, as the Service itself is exportable.

### Creating `ServiceExport`

Here are the detailed steps when creating a `ServiceExport`:

1. When a new `ServiceExport` is created, check whether a `ServiceImport` already exists.
2. If the `ServiceImport` doesn't exist, allocate the VIP and add it to the new `ServiceImport`.
3. If the `ServiceImport` exists, compare the combination of annotations on the `ServiceExport` and the global flags
against the annotations on the `ServiceImport` for any conflicts.
4. In case of a conflict, set the `ServiceExportConflict` condition on the `ServiceExport` and attempt conflict resolution.
5. If the conflict can't be resolved, don't export the service.
6. If the PodIP annotation is set, create `EndpointSlices` with PodIPs instead of the ClusterIP. Follow the existing rules
for merging port information in `EndpointSlices`.
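The address-selection part of step 6, combined with the rule from the Proposal section that PodIPs apply only when the VIP option is also enabled, can be sketched as (function name is illustrative):

```go
package main

import "fmt"

// endpointAddresses returns what the created EndpointSlice would
// carry: individual PodIPs only when both the VIP and PodIP options
// are in effect, otherwise the ClusterIP as Lighthouse does today.
func endpointAddresses(vipEnabled, podIPsRequested bool, clusterIP string, podIPs []string) []string {
	if vipEnabled && podIPsRequested {
		return podIPs
	}
	return []string{clusterIP}
}

func main() {
	fmt.Println(endpointAddresses(true, true, "10.96.0.5", []string{"10.244.1.2", "10.244.2.3"}))
	fmt.Println(endpointAddresses(false, true, "10.96.0.5", []string{"10.244.1.2"})) // PodIPs ignored without VIP
}
```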

### Updating annotation on ServiceExport

Changing an annotation will not modify the current behavior of using the VIP or PodIPs. It will only add or clear the
conflict condition on the `ServiceExport`, depending on the change. If users wish to change the behavior for a given
`Service`, they will need to delete all its `ServiceExports` and recreate them with the required annotations.

### Deleting `ServiceExport`

Here are the detailed steps when deleting a `ServiceExport`:

1. Check the cluster list on the `ServiceImport` to determine whether this is the last cluster exporting the Service. If
not, proceed as usual irrespective of the VIP and PodIP annotations, i.e. delete the `EndpointSlice`, recalculate the
service ports and update the cluster list on the `ServiceImport`.
2. If this is the last cluster, deallocate the VIP if it was allocated from this cluster's VIP CIDR, and delete the
`ServiceImport`.
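Step 2's deallocation check can be sketched with a hypothetical helper: release the VIP only when no cluster exports the Service any more, and only if the address came from this cluster's CIDR.

```go
package main

import (
	"fmt"
	"net"
)

// shouldDeallocate reports whether this cluster should release the
// VIP on ServiceImport deletion (illustrative sketch of step 2).
func shouldDeallocate(vip, clusterCIDR string, remainingClusters int) (bool, error) {
	if remainingClusters > 0 {
		return false, nil // other clusters still export the Service
	}
	_, network, err := net.ParseCIDR(clusterCIDR)
	if err != nil {
		return false, err
	}
	// Only the cluster whose CIDR the VIP came from releases it.
	return network.Contains(net.ParseIP(vip)), nil
}

func main() {
	ok, _ := shouldDeallocate("243.0.0.7", "243.0.0.0/16", 0)
	fmt.Println(ok) // true: last exporter, VIP is from this cluster's CIDR
}
```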

### Globalnet

Globalnet will not be supported initially. Supporting Globalnet with PodIPs poses a significant scale issue, as each Pod
backing the Service would require a `GlobalIngressIP`.

For any use cases that require Globalnet support with this feature, a separate Enhancement can be proposed in the
future.

### Migration

Since there is no change in the default behavior, nothing should be required for migration.

If users want to add annotations to existing `ServiceExports` post-deployment, they will need to delete and recreate
those `ServiceExports`.

## Design Details

### Global Flags for deploy broker

When deploying the broker, `--enable-vip` and `--endpointslices-with-pod-ips` flags will be added. There will also be a
`--clusterset-vip-cidr` option to set the clusterset's VIP CIDR to a non-default value.

### Flags for subctl join

The following flags will be added to `subctl join`:

* `--cluster-vip-cidr=a.b.c.d/x` - Set the VIP CIDR for this cluster. The default will be allocated from the clusterset
VIP CIDR.
* `--endpointslices-with-pod-ips=false/true` - Use PodIPs when creating `EndpointSlices`. Only applicable to ClusterIP
services when `--cluster-vip-cidr` is configured. Default is `false`.

### New CRD flags

The following CRDs will need to be modified to support these new flags:

```Go
type BrokerSpec struct {
	ClustersetVIPEnabled     bool     `json:"clustersetVIPEnabled,omitempty"`
	ClustersetVIPCIDRRange   []string `json:"clustersetVIPCIDRRange,omitempty"`
	EndpointSlicesWithPodIPs bool     `json:"endpointSlicesWithPodIps,omitempty"`
}
```

```Go
type SubmarinerSpec struct {
	ClusterVIPCIDR           []string `json:"clusterVipCidr,omitempty"`
	EndpointSlicesWithPodIPs bool     `json:"endpointSlicesWithPodIps"`
}
```

```Go
type SubmarinerDiscoverySpec struct {
	ClusterVIPCIDR           []string `json:"clusterVipCidr,omitempty"`
	EndpointSlicesWithPodIPs bool     `json:"endpointSlicesWithPodIps"`
}
```

### Annotations for `ServiceExport`

* `lighthouse.submariner.io/use-vip` - Use VIP for the `ServiceExport`
* `lighthouse.submariner.io/endpointslice-with-pod-ips` - Use PodIPs in `EndpointSlices`
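Assuming the annotations take string `"true"` values (the proposal does not specify the value format), a helper building the metadata a user would set on a `ServiceExport` might look like:

```go
package main

import "fmt"

// exportAnnotations returns the opt-in annotations for a
// ServiceExport. Keys are taken from the list above; the "true"
// string values are an assumption, not confirmed by the proposal.
func exportAnnotations(useVIP, usePodIPs bool) map[string]string {
	a := map[string]string{}
	if useVIP {
		a["lighthouse.submariner.io/use-vip"] = "true"
	}
	if usePodIPs {
		a["lighthouse.submariner.io/endpointslice-with-pod-ips"] = "true"
	}
	return a
}

func main() {
	fmt.Println(exportAnnotations(true, true))
}
```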

### Pros

1. Doesn't modify existing behavior.
2. Allows users to select the behavior on a per-Service basis.

### Cons

1. More configuration options for users to consider.
2. More chances of `ServiceExports` on different clusters conflicting due to incorrect annotations.

### Backward Compatibility

None.

### Alternatives

1. Only use the global flags for the entire deployment. This was discarded in favor of the annotation-based approach,
which gives users the flexibility to mix and match Services with and without a VIP.
2. In case of a conflict on the VIP/PodIP settings, don't export the Service at all.

## User Impact

Existing users will not be impacted in any way. Users who wish to use this feature will need to reinstall Submariner
with the new flags or use annotations on `ServiceExports`.

## References

[^1]: [KEP-1645: Multi-Cluster Services API](https://github.com/kubernetes/enhancements/tree/master/keps/sig-multicluster/1645-multi-cluster-services-api#constraints-and-conflict-resolution)
