Add ability to change IP address when live migrating #211

Open
stevepp opened this issue Sep 1, 2016 · 13 comments
Labels
new feature no-auto-close Don't auto-close as a stale issue stale-issue

Comments

@stevepp

stevepp commented Sep 1, 2016

Assuming client-server communication over an established TCP connection: if the server needs to be rebooted for an update, is it possible to migrate the associated server process to a backup server with a different IP and MAC address while transparently migrating the original TCP connection to that backup server?

I guess this would break the TCP connection, as the server end would change. I am just wondering whether I could extend CRIU in some way to support this functionality. Please direct me to the code I would potentially need to modify. Thanks.

@xemul
Member

xemul commented Sep 2, 2016

Well, changing the IP address is impossible not because of CRIU constraints, but because of how the TCP protocol works. One cannot continue the packet flow with one IP address changed; the client would just ignore such packets.

So when talking about migrating a server to some other place with some other IP, three things are to be considered.

  1. Listening sockets. If your server is bound to 0.0.0.0 (INADDR_ANY), then migration would "just work", since there's no IP address that could mismatch. If your server is bound to the address of some device, then you'll have to change the binding IP address. Right now this can be done by editing the images: all PF_INET sockets sit in the inetsk.img image, and the CRIT tool (https://criu.org/CRIT) can be used to modify it.
  2. In-flight connections. These are connect()-ed, but not yet accept()-ed. We have the --skip-in-flight option that makes criu ignore these guys.
  3. Established sockets. These guys are tough, as they do have some real IP address wired into their configuration. Technically it's possible to restore the socket with a different IP address (by modifying inetsk.img with CRIT), but as I said, the peer would not accept that. In the worst case the connection would get stuck until the TCP timeout.
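
The image-editing step above can be sketched in Python. This is only an illustration, assuming the image has already been decoded to JSON with CRIT and will be re-encoded afterwards; `replace_ip` is a hypothetical helper, and the sample layout with `entries`, `src_addr` and `src_port` is an assumption for the example, not a statement of the real image schema. The helper walks the whole structure and swaps any string equal to the old address, so it does not depend on particular field names.

```python
def replace_ip(node, old_ip, new_ip):
    """Return a copy of a decoded image with old_ip replaced by new_ip.

    Hypothetical helper: it recurses through dicts and lists and swaps
    every string value that is exactly equal to old_ip.
    """
    if isinstance(node, dict):
        return {key: replace_ip(value, old_ip, new_ip) for key, value in node.items()}
    if isinstance(node, list):
        return [replace_ip(value, old_ip, new_ip) for value in node]
    if isinstance(node, str) and node == old_ip:
        return new_ip
    return node


# Illustrative structure only; the real decoded layout may differ.
decoded = {"entries": [{"src_addr": ["192.168.1.100"], "src_port": 8080}]}
edited = replace_ip(decoded, "192.168.1.100", "192.168.1.144")
print(edited["entries"][0]["src_addr"])
```

The function returns a new structure and leaves the input untouched, which makes it easy to diff the two before writing anything back to an image.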

So if we're OK with just breaking these connections we need to teach criu to break them. There are two things to consider while doing this.

a) Dumping sockets. Since we don't really need the connection, we'd need to teach criu to skip those guys. The code dumping PF_INET sockets is in criu/sk-inet.c; the code dumping IPPROTO_TCP stuff is in criu/sk-tcp.c.

b) Restoring sockets. Just leaving a hole in the place where the connected socket was is not nice: the server would get wrong error codes from syscalls and, which is worse, the hole might get occupied by some other file (when the server does open/socket/accept/whatever), which would break the server's internal logic. So at restore time we'd need to put some stub into the descriptor. I would suggest addressing this at dump time: instead of dumping the established socket into the image, dump a socket that looks like a closed one. In this case the socket restoring code would just restore the closed socket into the proper place.
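
The "hole" problem in (b) can be reproduced outside CRIU with a small Python sketch. It assumes a POSIX system and a single-threaded process, where a new descriptor always gets the lowest free number; both function names are made up for this illustration, and nothing here is CRIU code. The first function leaves a hole and shows an unrelated file landing in it; the second plugs the slot with an unconnected socket, which is roughly the stub idea.

```python
import os
import socket


def reuse_after_close():
    """Close a socket's descriptor, then open an unrelated file."""
    conn = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    fd = conn.detach()          # keep only the raw descriptor number
    os.close(fd)                # this leaves the "hole"
    intruder = os.open(os.devnull, os.O_RDONLY)
    try:
        return intruder == fd   # did the new file take the old slot?
    finally:
        os.close(intruder)


def reuse_after_stub():
    """Replace the socket with an unconnected stub at the same number."""
    conn = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    fd = conn.detach()
    stub = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    os.dup2(stub.fileno(), fd)  # the slot now refers to the stub socket
    stub.close()                # drop the stub's original descriptor
    intruder = os.open(os.devnull, os.O_RDONLY)
    try:
        return intruder == fd
    finally:
        os.close(intruder)
        os.close(fd)


print(reuse_after_close(), reuse_after_stub())
```

With the stub in place, the server's old descriptor number still refers to a socket, so later calls on it fail with socket-level errors instead of silently operating on an unrelated file.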

Hope that helps.

@xemul xemul changed the title Is it possible to use CRIU to transparently migrate a tcp connection to a host with different ip? Add ability to change IP address when live migrating Sep 2, 2016
@xemul
Member

xemul commented Sep 2, 2016

@ashtonwebster

I know this is pretty old, but this is similar to an issue I am having now. My goal is to restore an echo server example code on a new machine with a new IP address. I am using CRIT to change the inetsk.img file to use a new IP address and then restoring from this image. I have something similar to NAT set up that will rewrite the IP addresses so that after the restore it will appear to the client that the packets are from the original destination. However, even after modifying this inetsk.img file, I still get an error when restoring that suggests I am still trying to use the old IP address. I have verified that if I change the IP address of the machine manually, the restore succeeds, but this is not a feasible solution in the long term.

So I have two questions:

  • In order to change IP addresses for live connections, is it correct that I should only have to modify inetsk.img? Is there any other state regarding the sockets I would have to modify?
  • Is there any progress on this issue?

@ashtonwebster

Follow up: for my first question I discovered I also need to change the IP address in files.img using CRIT, but an update on the status of this would still be helpful.

@abh1kg

abh1kg commented Apr 18, 2018

I would also be interested in knowing this. I work on a platform-as-a-service offering called Cloud Foundry, where checkpoint and restore is not yet integrated even though application instances run as runc-compliant Garden containers. I would like to migrate existing TCP connections from an "old" application process to a "new" application container, which might be scheduled on a different host altogether.

@amazingvoice

@xemul Hi! I've recently been doing some research on live-migrating established TCP connections and found that this is exactly the situation I'm confronted with: "changing the IP address is not possible not due to CRIU constraints, but due to how TCP protocol works". Could you explain in detail which properties of the TCP protocol prevent changing the IP address during migration? Thanks!

@dav-ell

dav-ell commented Aug 3, 2018

@amazingvoice I think what he's referring to is the fact that the TCP protocol is based on unchanging endpoints. Every connection can be described by a source address, source port, destination address, and destination port. Using these parameters, routers and hosts can make sense of and keep track of the packets passing back and forth between the source and destination. Because of this, if you change one end of the connection as you're migrating, then the other side will not be able to match the packets from the new address with the details of the connection that was already established. For packets sent from the migrated container to the non-migrated destination, the destination would be confused and not know what to do. Packets sent from the non-migrated destination would go to the old host that the migrated container was on, so the migrated container would never even receive them.

As an example, suppose we have a connection between these two endpoints

192.168.1.100 <-> 192.168.1.102

We want to migrate a container on 192.168.1.100 to a new host at 192.168.1.144. So we migrate the container, along with the TCP connection, edit the connection to 192.168.1.144 like we want, and then restore. To the migrated container, the connection now looks like

192.168.1.144 <-> 192.168.1.102

However, to the other side of the connection, it still looks like

192.168.1.100 <-> 192.168.1.102

This mismatch between views of the connection makes the connection fail when restored.
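
The mismatch can be sketched as a lookup in a table keyed by the four values that identify a connection. This is a toy model in Python, not how any real TCP stack is implemented; `FourTuple` and `PeerTable` are names invented for the example, and the port numbers 8080 and 51000 are assumptions, since the example above only gives addresses.

```python
from typing import NamedTuple, Optional


class FourTuple(NamedTuple):
    """Identity of a connection as seen by the receiving host."""
    remote_ip: str
    remote_port: int
    local_ip: str
    local_port: int


class PeerTable:
    """Toy connection table for the non-migrated side (192.168.1.102)."""

    def __init__(self) -> None:
        self._conns = {}

    def establish(self, key: FourTuple) -> None:
        self._conns[key] = "ESTABLISHED"

    def lookup(self, key: FourTuple) -> Optional[str]:
        # Returns None when no connection matches the incoming segment.
        return self._conns.get(key)


peer = PeerTable()
peer.establish(FourTuple("192.168.1.100", 8080, "192.168.1.102", 51000))

before = peer.lookup(FourTuple("192.168.1.100", 8080, "192.168.1.102", 51000))
after = peer.lookup(FourTuple("192.168.1.144", 8080, "192.168.1.102", 51000))
print(before, after)
```

A segment arriving from 192.168.1.144 matches no entry, so the peer has nothing to attach it to; a real stack would typically answer such a segment with a reset rather than accept it.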

What would be nice is if TCP automatically adjusted connections based on a packet's source address. This would allow us to migrate containers and edit their connections like we did above, and when the container sent a packet to the destination, the destination would receive it and update its existing connection so that both sides were in sync. Unfortunately, to do something like this would cause serious security problems, of which IP address spoofing is one.

One option to deal with this situation is to find a way to update the other side of the connection as well. There's actually a bit of research done on this topic. But every solution, as far as I can tell, is out of CRIU's scope, which can only operate on one side of a connection.

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@adiantek

adiantek commented Oct 7, 2021

criu could send a magic packet to the remote host with the "update IP & port" command. For example:

  • we have three hosts: a, b, c
  • there are connections a <---> c
  • we're migrating a -> b, so:
    • we're informing b (from host a) that there will be new connections to avoid RST in the future and to hold a queue with data,
    • sending information to c that you should change IP and port of the socket,
    • pause process on a,
    • migrate data from a to b,
    • resume process on b

In some scenarios, we must change the port too, because it can be busy.

@dav-ell

dav-ell commented Oct 7, 2021

This sounds great! It's also possible to send a SYN, SYN-ACK, ACK handshake between b and c just prior to restore to handle NATs.

What about security issues with introducing a magic packet (like spoofing, above)?

@adiantek

adiantek commented Oct 7, 2021

a: 10.0.0.1
b: 10.0.0.2
c: 10.0.0.3
I'm sending information from A to C: "change 10.0.0.1 to 10.0.0.2". C should allow the change only if the source of this packet is A (10.0.0.1). Changing the IP is possible only when the user has root access on all three servers. TCP doesn't have a built-in redirect/location packet type like HTTP does, so we must implement it ourselves.
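
The check on host C could look roughly like the Python sketch below. No such command exists in TCP or in CRIU; `apply_rebind` and the table layout (remote address and port mapped to a state string) are hypothetical, and the port numbers are made up. The request is honored only when it arrives from the address being replaced, and B's address (10.0.0.2) is taken as the new endpoint.

```python
def apply_rebind(conns, packet_source, old_ip, new_ip):
    """Re-key connections on host C from old_ip to new_ip.

    conns maps (remote_ip, remote_port) to a state string. The request is
    ignored unless it was sent from old_ip. Returns the number of entries
    that were moved.
    """
    if packet_source != old_ip:
        return 0
    moved = 0
    for remote_ip, remote_port in list(conns):  # copy keys: dict is mutated
        if remote_ip == old_ip:
            conns[(new_ip, remote_port)] = conns.pop((remote_ip, remote_port))
            moved += 1
    return moved


# Host C's view: two connections to A (10.0.0.1), one to an unrelated host.
table = {
    ("10.0.0.1", 40000): "ESTABLISHED",
    ("10.0.0.1", 40001): "ESTABLISHED",
    ("10.0.0.9", 40002): "ESTABLISHED",
}
rejected = apply_rebind(table, "10.0.0.7", "10.0.0.1", "10.0.0.2")
accepted = apply_rebind(table, "10.0.0.1", "10.0.0.1", "10.0.0.2")
print(rejected, accepted, sorted(table))
```

A source-address comparison alone does not stop a sender who can forge that address, so a real design would need some form of authentication between the hosts on top of this check.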

Another option could be a tool, something like "criu change-ip changes.json", where the file would contain all connections to change.
PS. I forgot about "restore to handle NATs": in OVH you can't just send TCP packet data to another of your servers in OVH (even if it's in the same rack); you must send a SYN first, because otherwise the switch will drop the packet.

There are a lot of problems. I think the easiest would be to create "criu-server" and add all nodes to the network, like in Proxmox. Then, we could create the command "criu live-migrate -t PID node123" and this command should update IP on all nodes.

@avagin avagin added no-auto-close Don't auto-close as a stale issue and removed stale-issue labels Oct 7, 2021
8 participants