Criu check fails on NI Linux RT #1864

Open
ImreSzebelledi opened this issue May 3, 2022 · 7 comments

Comments

@ImreSzebelledi

Description

Hello!
I am trying to get CRIU to work on a Linux distribution released by National Instruments for an industrial controller (NI-3172), but I am running into difficulties. I have built the kernel with the required flags, but criu check still fails, even when run with root privileges.
Thank you for the help in advance!

Steps to reproduce the issue:

  1. Build https://github.com/ni/linux.git -- branch: nilrt/21.8/5.15
  2. Install criu with dependencies
  3. Run criu check

Output of `criu --version`:

Version: 3.12

Output of `criu check --all`:

Error (criu/libnetlink.c:55): -95 reported by netlink: Operation not supported
Error (criu/net.c:3225): Unable to create a veth pair: -95
Warn  (criu/net.c:3247): NSID isn't reported for network links
Error (criu/arch/x86/kerndat.c:189): Continue after SIGSTOP.. Urr what?

[4]+  Stopped                 criu check --all

Additional environment details:

Kernel version:
Linux NI-IC-3172-01E67A74 5.15.26-rt34-g2360492e22b4 #1 SMP PREEMPT_RT Tue May 3 11:22:29 CEST 2022 x86_64 GNU/Linux

@adrianreber
Member

This is indeed an unusual error. Can you try CRIU 3.16? Not sure that helps, but maybe.

Not sure if CRIU works on the RT kernel. Can you run criu check --all -v4? Can you also try without RT?

@mihalicyn
Member

He-he. Very interesting... failure similar to static/bridge fail from #1862

@mihalicyn
Member

@ImreSzebelledi can you post your kernel build config and lsmod output?

@ImreSzebelledi
Author

ImreSzebelledi commented May 5, 2022

Thank you very much for the suggestions!

Unfortunately, I am having trouble installing version 3.16. This distro has no full package manager (only opkg), so I got 3.12 working by copying the prebuilt files from CentOS 7. I tried to do the same with 3.16 from Ubuntu 22, but I got various errors because it has glibc dependencies that are not available on NI Linux RT. At that point I gave up on copying and started trying to build 3.16 from source on NI Linux RT, but I am still having difficulty building protobuf on it first... sigh. In the meantime, I have attached the kernel config and the lsmod output.
kernelconfig.txt

lsmod:

admin@NI-IC-3172-01E67A74:~# lsmod

Module Size Used by
tmp421 16384 0
g_ether 16384 0
u_ether 24576 1 g_ether
libcomposite 61440 1 g_ether
udc_core 61440 2 u_ether,libcomposite
hid_logitech_hidpp 40960 0
mousedev 20480 0
ipv6 454656 25
hid_logitech_dj 28672 0
x86_pkg_temp_thermal 16384 0
coretemp 16384 0
aesni_intel 376832 0
i2c_i801 28672 0
crypto_simd 16384 1 aesni_intel
i915 1978368 5
i2c_smbus 16384 1 i2c_i801
intel_gtt 20480 1 i915
lpc_ich 28672 0
mfd_core 16384 1 lpc_ich
drm_kms_helper 253952 1 i915
syscopyarea 16384 1 drm_kms_helper
sysfillrect 16384 1 drm_kms_helper
sysimgblt 16384 1 drm_kms_helper
fb_sys_fops 16384 1 drm_kms_helper
ttm 65536 1 i915
agpgart 36864 1 ttm
video 49152 1 i915
drm 471040 7 drm_kms_helper,i915,ttm
backlight 20480 4 video,drm_kms_helper,i915,drm
button 16384 0
igb 176128 0
e1000e 184320 0
i2c_algo_bit 16384 2 igb,i915
admin@NI-IC-3172-01E67A74:~#

@mihalicyn
Member

mihalicyn commented May 5, 2022

# CONFIG_VETH is not set

This is the direct reason for:

Error (criu/libnetlink.c:55): -95 reported by netlink: Operation not supported
Error (criu/net.c:3225): Unable to create a veth pair: -95
Warn  (criu/net.c:3247): NSID isn't reported for network links

VETH support is not required, but I recommend compiling it as a module. It's a small, well-tested, and fully safe module.
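
To check a symbol like this before running criu check, the kernel config text can be scanned directly. Below is a rough sketch, not anything from CRIU: `kconfig_state` is a hypothetical helper name, and it assumes the config is already loaded into a string (for example from the build tree's .config file).

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of CRIU): report how a symbol is set in
 * kernel .config text. Returns 'y' (built in), 'm' (module), or 'n'
 * (marked "is not set", or not mentioned at all).
 */
static char kconfig_state(const char *config_text, const char *symbol)
{
	char set_prefix[128];
	size_t prefix_len;
	const char *line = config_text;

	/* Lines that enable a symbol look like "CONFIG_VETH=m". */
	snprintf(set_prefix, sizeof(set_prefix), "%s=", symbol);
	prefix_len = strlen(set_prefix);

	while (line && *line) {
		if (strncmp(line, set_prefix, prefix_len) == 0) {
			char value = line[prefix_len];

			if (value == 'y' || value == 'm')
				return value;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return 'n';
}
```

With the config attached above, `kconfig_state(text, "CONFIG_VETH")` would return 'n', because the only matching line is the "# CONFIG_VETH is not set" comment.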

Error (criu/arch/x86/kerndat.c:189): Continue after SIGSTOP.. Urr what?

this is really strange:

static int kdat_x86_has_ptrace_fpu_xsave_bug_child(void *arg)
{
	if (ptrace(PTRACE_TRACEME, 0, 0, 0)) {
		pr_perror("%d: ptrace(PTRACE_TRACEME) failed", getpid());
		_exit(1);
	}

	if (kill(getpid(), SIGSTOP))
		pr_perror("%d: failed to kill myself", getpid());

	pr_err("Continue after SIGSTOP.. Urr what?\n");
	_exit(1);
}

Looks like a real kernel issue (likely related to CONFIG_PREEMPT_RT=y).
Honestly, this particular error doesn't prevent CRIU from working.
It would be best to create a minimal reproducer for it and report the issue to the https://github.com/ni/linux.git kernel maintainers.

Possible minimal reproducer:

#define _GNU_SOURCE
#include <sched.h>          /* clone() and the CLONE_* constants */
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define ARRAY_SIZE(x)		(sizeof(x) / sizeof((x)[0]))
#define PAGE_SIZE 4096

/* Child: ask to be traced, then stop itself. It should never run past SIGSTOP. */
static int bug_child(void *arg)
{
	if (ptrace(PTRACE_TRACEME, 0, 0, 0)) {
		printf("%d: ptrace(PTRACE_TRACEME) failed\n", getpid());
		_exit(1);
	}

	if (kill(getpid(), SIGSTOP))
		printf("%d: failed to kill myself\n", getpid());

	printf("Continue after SIGSTOP.. Urr what?\n");
	_exit(1);
}

int main(int argc, char **argv)
{
	/* Child stack; x86-64 expects the stack pointer to be 16-byte aligned. */
	char stack[PAGE_SIZE] __attribute__((aligned(16)));
	int flags = CLONE_VM | CLONE_FILES | CLONE_UNTRACED | SIGCHLD;
	int ret = -1;
	pid_t child;
	int stat;

	/* The stack grows down, so pass the address just past the end of the buffer. */
	child = clone(bug_child, stack + ARRAY_SIZE(stack), flags, 0);
	if (child < 0) {
		printf("%s(): failed to clone()\n", __func__);
		return -1;
	}

	if (waitpid(child, &stat, WUNTRACED) != child) {
		/*
		 * waitpid() may end with ECHILD if SIGCHLD == SIG_IGN,
		 * and the child has stopped already.
		 */
		printf("Failed to wait for %s() test\n", __func__);
		goto out_kill;
	}

	if (!WIFSTOPPED(stat)) {
		printf("Born child is unstoppable! (might be dead)\n");
		goto out_kill;
	}

	ret = 0;

out_kill:
	if (kill(child, SIGKILL))
		printf("Failed to kill my own child\n");
	if (waitpid(child, &stat, 0) < 0)
		printf("Failed wait for a dead child\n");

	return ret;
}

Try compiling this as a separate program with gcc -o checkme checkme.c and running ./checkme.
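
If the reproducer does report a failure, it may help to print what waitpid() actually returned rather than only "unstoppable". A small sketch of that, assuming a hypothetical helper named `describe_wait_status` (not part of CRIU or the reproducer above), built only on the standard `<sys/wait.h>` macros:

```c
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: render a waitpid() status as text so the parent
 * can log what happened to the child. Writes into buf and returns it.
 */
static const char *describe_wait_status(int status, char *buf, size_t len)
{
	if (WIFSTOPPED(status))
		snprintf(buf, len, "stopped by signal %d", WSTOPSIG(status));
	else if (WIFEXITED(status))
		snprintf(buf, len, "exited with code %d", WEXITSTATUS(status));
	else if (WIFSIGNALED(status))
		snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
	else
		snprintf(buf, len, "unknown status 0x%x", status);
	return buf;
}
```

Calling it on `stat` right after the first waitpid() in main() would distinguish the expected case (stopped by SIGSTOP) from a child that ran past the SIGSTOP and called _exit(1), which would show up as "exited with code 1".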

@ImreSzebelledi
Author

I had time today to build the kernel with CONFIG_VETH=y and also managed to build criu 3.16.1 on it.
As you predicted, all the VETH-related errors went away, but the SIGSTOP error persisted. Unfortunately, I won't have access to the hardware over the weekend to try the reproducer, but I will do so on Monday. Thank you so much for all the help!

@github-actions

github-actions bot commented Jun 8, 2022

A friendly reminder that this issue had no activity for 30 days.
