proposal: Live migration for bridged pod network #182

52 changes: 52 additions & 0 deletions design-proposals/live-migration-for-bridged-pod-network.md
# Overview
This feature enables support for live migration of VMs whose pod network is attached in bridge or macvtap mode.

## Motivation
Masquerade and slirp are less performant than the bridge and macvtap modes.
We want to use the lower-latency modes while keeping the ability to live-migrate the virtual machines.
Member:

I would be interested to understand the motivation for having a disruptive migration.
If such a migration can cause up to a minute of disruption, aren't there other means to recover a VM on a different node? I was under the impression that live migration with high disruption of connectivity can cause more harm than good (applications keep running while the disruption occurs, versus the whole VM freezing and being restored).

Member Author:

> If such a migration can cause up to a minute of disruption, aren't there other means to recover a VM on a different node?

From my tests this happens within a few seconds, not minutes, so even TCP sessions do not have time to break.

Member:

How do TCP connections survive an IP address change?

Member Author:

Some CNIs support specifying the IP address per pod. This way the IP address is not changed, but the VM still needs to renew its DHCP lease to update the routes inside the guest.

Member:

They won't survive. They would not survive even a link flicker... those TCP connections will just reset.
But this is the case with masquerade as well, so this is not something new.
I was more worried about the duration of the downtime (e.g. in the case of attach/detach).

Member Author (@kvaps, Sep 13, 2022):

Well, I just performed a test. I used the following commands:

Server VM:

```
mkdir cgi-bin
printf '%s\n' '#!/bin/sh' 'cat /dev/urandom' > cgi-bin/1.sh
chmod +x cgi-bin/1.sh
python -m http.server --cgi
```

Client VM:

```
curl 10.10.10.1:8000/cgi-bin/1.sh > /dev/null
```

Now I can confirm that TCP connections survive as long as the IP address is not changed.

I also checked a flood ping in both cases, when the MAC address changes and when it does not:

When the MAC address is the same (link down and link up on the interface), 47 packets were lost:

```
[fedora@vm1 ~]$ sudo ping -f 10.10.10.2
PING 10.10.10.2 (10.10.10.2) 56(84) bytes of data.
...............................................
--- 10.10.10.2 ping statistics ---
72544 packets transmitted, 72497 received, 0.0647883% packet loss, time 14943ms
rtt min/avg/max/mdev = 0.069/0.117/8.138/0.160 ms, ipg/ewma 0.205/0.106 ms
```

When the MAC address is changed (network card reattached), 54 packets were lost:

```
[fedora@vm1 ~]$ sudo ping -f 10.10.10.2
PING 10.10.10.2 (10.10.10.2) 56(84) bytes of data.
......................................................
--- 10.10.10.2 ping statistics ---
102443 packets transmitted, 102389 received, 0.0527122% packet loss, time 20785ms
rtt min/avg/max/mdev = 0.069/0.116/39.293/0.222 ms, pipe 4, ipg/ewma 0.202/0.355 ms
```


## Goals
Provide a live-migration feature for VMs running with the pod network connected in bridge mode or macvtap mode.

## Non Goals
Live migration of virtual machines between pods with different MAC addresses will invoke a NIC reattach procedure. This might affect applications that bind to a specific interface inside the VM.

Live migration of virtual machines between pods with the same MAC address will invoke a link down/up procedure so that the VM renews its DHCP lease and picks up the IP address and routes inside the guest. This is a less destructive operation, but it may still affect some workloads.
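
For workload owners who want to assess the impact ahead of time, here is a minimal guest-side sketch of the less destructive case. It only simulates the link flap; the interface name eth0 and the use of dhclient are assumptions, adjust them for your guest image:

```
# Inside the guest: simulate the brief carrier loss the VM observes when
# the MAC address is preserved across the migration.
sudo ip link set eth0 down
sleep 1
sudo ip link set eth0 up

# Many images renew their DHCP lease automatically on carrier-up; to force
# a renewal with dhclient (assumption -- your image may use another client):
sudo dhclient -r eth0 && sudo dhclient eth0
```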
Contributor:

I think we care a lot about these types of workloads. I don't think you can have this as a non-goal - you need to account for it in your design, imo.

@AlonaKaplan can you chime in?

Member:

Well yeah, the IP is changed, so processes inside the guest that are using the IP will be affected, the same as external processes. Documenting it as a first stage should be enough.


## Definition of Users
Everyone who uses bridge binding for the pod network may want to live-migrate their VMs.

## User Stories
* As a user or admin, I want to be able to live-migrate a VM with a bridged pod network.
Member (on lines +21 to +22):

I'd like to have some user stories that focus on why we'd want to live migrate with bridge mode, given the limitations this functionality imposes on applications.

So for example, what kinds of applications and scenarios tolerate this kind of live migration, where the IP and MAC of a VM change during its runtime.

Contributor:

I would actually be more interested in what apps / scenarios do not tolerate this kind of live migration.

Member Author:

> This may affect some applications which are binding to the specific interface inside the VM.

Isn't that enough? Or should I specify cases with concrete applications (e.g. an apache2 server configured to bind to the pod IP instead of 0.0.0.0, and so on)?


## Repos
- [KubeVirt](https://github.com/kubevirt/kubevirt)

# Design

Add two additional steps to the live-migration procedure (a rough sketch of both cases follows the list):
- If the MAC address changed: detach and reattach the interface after the live migration so the correct MAC address is set.
- If the MAC address is not changed: set the link down and up to force the VM to request a new IP address and routes from DHCP.
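
For illustration only, a sketch of the roughly equivalent libvirt operations expressed with virsh. The actual implementation drives libvirt from virt-launcher, and $DOMAIN, $BRIDGE, $TARGET_DEV, $OLD_MAC and $NEW_MAC are placeholders:

```
# Case 1: the MAC address changed on the target pod -> reattach the NIC so
# the guest sees the new MAC address.
virsh detach-interface "$DOMAIN" bridge --mac "$OLD_MAC" --live
virsh attach-interface "$DOMAIN" bridge "$BRIDGE" --mac "$NEW_MAC" --model virtio --live

# Case 2: the MAC address is preserved -> flap the link so the guest's DHCP
# client renews its lease and routes.
virsh domif-setlink "$DOMAIN" "$TARGET_DEV" down
virsh domif-setlink "$DOMAIN" "$TARGET_DEV" up
```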

## API Examples
There are no API changes from the user side.

## Scalability
I don't see any scalability issues.

## Update/Rollback Compatibility
This change also adds new logic for Multus-connected networks, because Multus can also be used to bind standard CNIs (e.g. flannel), which allows preserving the IP across nodes.
Live migration of such VMs did not handle network reconfiguration before; now it will be handled by the same procedure.

## Functional Testing Approach
- Create two VMs, a client and a server, in bridge mode
- Wait for them to launch
- Get the IP address of the server VM
- Run a ping from the client to the server
- Live-migrate the client VM
- Run a ping from the client to the server again (a manual sketch of this flow is shown below)
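
A manual sketch of this flow, assuming the VMs are created as VirtualMachine objects named vm-server and vm-client; the manifest files and names are placeholders:

```
# Create the two bridge-mode VMs and wait for the instances to become Ready.
kubectl apply -f vm-server.yaml -f vm-client.yaml
kubectl wait vmi/vm-server vmi/vm-client --for=condition=Ready --timeout=5m

# Grab the server address from the VMI status.
SERVER_IP=$(kubectl get vmi vm-server -o jsonpath='{.status.interfaces[0].ipAddress}')

# Start a ping against $SERVER_IP from inside the client guest (e.g. via
# "virtctl console vm-client"), then trigger the migration:
virtctl migrate vm-client
kubectl get virtualmachineinstancemigrations

# Re-run the ping from the client guest and confirm it still reaches $SERVER_IP.
```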

# Implementation Phases
Member:

There is the topic of maintaining this special flow.
Unfortunately, our code is not well suited to introducing a new option while keeping it isolated and centralized. When the additions are scattered across many areas, it becomes harder to maintain them.

That said, it may well be worth sig-network investing in maintaining this if it gets enough traction, interest and, in the end, real usage.
I think it is worth raising these points now, so we are clear that even if this feature is accepted as an Alpha, its existence in the long run depends on adoption and usage. I guess this is similar to the Kubernetes process, but we should have one as well, especially for such controversial features.

The implementation is already prepared as a single pull request:
https://github.com/kubevirt/kubevirt/pull/7768