When restoring a dump in a new mount + pid namespace, in an environment where multiple dumps sharing the same network namespace can be restored simultaneously, CRIU was observed to create an anonymous unix socket named crtools-fd-3-0.
Note: I already prepared a patch with what I think would fix this, submitting it right after opening this.
The log line that appears in restore.log when this happens looks something like this:
1606:(01.379765) 3: Error (criu/files.c:1695): Can't bind transport socket /crtools-fd-3-0: Address in use
The pattern crtools-fd-%d-%d is defined in files.c.
In this case, 3 is always the restore PID, even when multiple restores are running, because the same process is being restored in a pid namespace with PID 3. However, I was surprised to see that criu_run_id was always 0.
At that point we investigated a bit and noticed that the last element of that print is supposed to be the id of the pid namespace, which is obviously not zero in our case, so something was off.
It looks like that bit is set in util_init; however, in crtools.c, util_init is called after cr_service_work is started, so the service worker will never have that bit set.
Steps to reproduce the issue:
Start a new process + mount namespace with a process to dump inside of it
unshare --pid --mount --fork /bin/bash
top # <--- this is the process we want to dump
Now nsenter that namespace and dump top.
Then try to restore that process multiple times simultaneously from two different dump folders; for example, in two different terminals, start a new pid namespace again and restore the process in each.
If the timing is right and both restores run at the same moment, you'll see an error in the log saying the socket address is already bound.
Describe the results you received:
The restore fails with this error:
1606:(01.379765) 3: Error (criu/files.c:1695): Can't bind transport socket /crtools-fd-3-0: Address in use
Describe the results you expected:
The restore is successful and the crtools-fd path is something like /crtools-fd-3-f000058200000002
Additional information you deem important (e.g. issue happens only occasionally):
CRIU logs and information:
CRIU full dump/restore logs:
(01.452241) 3: Restore on-core sigactions for 3
(01.452300) 3: Error (criu/files.c:1695): Can't bind transport socket /crtools-fd-3-0: Address in use
(01.452376) Error (criu/cr-restore.c:2313): Restoring FAILED.
Output of `criu --version`:
I'm using the current criu-dev branch, which I would have expected to report 4.0, but it is still on 3.18. In any case, this is the output:
Version: 3.18
GitID: v3.18-320-gdfb56eed6
Output of `criu check --all`:
Looks good.
Additional environment details:
Please let me know if you need more details or something isn't clear and thanks for the hard work y'all put in making this happen!
When restoring dumps in new mount + pid namespaces where multiple dumps
share the same network namespace, CRIU may fail due to conflicting
unix socket names. This happens because the service worker creates
sockets using a pattern that includes criu_run_id, but util_init()
is called after cr_service_work() starts.
The socket naming pattern "crtools-fd-%d-%d" uses the restore PID
and criu_run_id; however, criu_run_id is always 0 when not
initialized, leading to conflicts when multiple restores run
simultaneously, either in the same CRIU process or across multiple
CRIU processes performing the same operation in different PID
namespaces.
Fix this by:
- Moving util_init() before cr_service_work() starts
- Adding a second util_init() call in the service worker fork
to ensure unique IDs across multiple worker runs
- Making sure that dump and restore operations have util_init() called
early to generate unique socket names
With this fix, socket names always include the namespace ID, preventing
conflicts when multiple processes with the same pid share a network
namespace.
Fixes: checkpoint-restore#2499
Signed-off-by: Lorenzo Fontana <fontanalorenz@gmail.com>