Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

features: add HaveV4ISA probe for checking ISA v4 support in the kernel #1608

Merged
merged 2 commits into from
Dec 12, 2024

Conversation

smagnani96
Copy link
Contributor

This PR introduces the haveV4ISA probe and HaveV4ISA API to check in the running kernel if instructions of the v4 ISA are supported. The upstream commit used as reference is 1f9a1ea821ff ("bpf: Support new sign-extension load insns"). The probes tests the new long jump given by BPF_JMP32 | BPF_JA.

I'm attempting to submit an identical patch to bpftool, so that bpftool feature probe will output also ISAv4 support.

@smagnani96
Copy link
Contributor Author

smagnani96 commented Dec 6, 2024

CI failing on arm64 with bpf_jit: unknown opcode 06.

Edit:
Thanks Dylan to the support in the Slack channel https://cilium.slack.com/archives/C4XCTGYEM/p1733744443171859.
The ubuntu-22.04-arm64 GH runner runs on kernel v6.5, while the ISA v4 instruction have been introduced in kernel v6.6. Therefore, we either need to update the GH runner or run VM tests (similarly as for x86).

@smagnani96 smagnani96 force-pushed the pr/probe-have-v4-isa branch from bb8c36f to f6c9a05 Compare December 9, 2024 12:18
@smagnani96
Copy link
Contributor Author

Ubuntu 24.04 runner for arm64 has been setup for the organization and the CI now works.
Opening for review 🎊

@smagnani96 smagnani96 marked this pull request as ready for review December 9, 2024 15:59
@smagnani96 smagnani96 requested review from rgo3 and a team as code owners December 9, 2024 15:59
Copy link
Collaborator

@ti-mo ti-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI failing on arm64 with bpf_jit: unknown opcode 06.

Edit: Thanks Dylan to the support in the Slack channel https://cilium.slack.com/archives/C4XCTGYEM/p1733744443171859. The ubuntu-22.04-arm64 GH runner runs on kernel v6.5, while the ISA v4 instruction have been introduced in kernel v6.6. Therefore, we either need to update the GH runner or run VM tests (similarly as for x86).

This fixes CI, but not the test. TestHaveV4ISA should return ebpf.ErrNotSupported on kernels <6.6, it looks like probeProgram receives EOPNOTSUP or ENOTSUPP from the verifier instead of the typical EINVAL when hitting unknown instructions on arm64. Please investigate the source of that deviation. This test should automatically skip on arm64 version 6.4 and succeed on 6.8.

@smagnani96 smagnani96 force-pushed the pr/probe-have-v4-isa branch from f6c9a05 to 7aa1e62 Compare December 9, 2024 19:10
@smagnani96
Copy link
Contributor Author

smagnani96 commented Dec 9, 2024

This fixes CI, but not the test. TestHaveV4ISA should return ebpf.ErrNotSupported on kernels <6.6, it looks like probeProgram receives EOPNOTSUP or ENOTSUPP from the verifier instead of the typical EINVAL when hitting unknown instructions on arm64. Please investigate the source of that deviation. This test should automatically skip on arm64 version 6.4 and succeed on 6.8.

Thanks Timo for the feedback 🙏
I verified that in this case the error code returned is sys.ENOTSUPP.

I'm adding an intermediate commit to catch this in https://github.com/cilium/ebpf/blob/main/features/prog.go#L47, what do you think?
I changed just the probe to verify the returned error, and turn it into ebpf.ErrNotSupported in case of a sys.ENOTSUPP. That error is returned on arm64 kernels prior to ISA v4 support (<=v6.6), I was able to reproduce with an EC2 Amazon Linux 2023 (kernel v6.1).

@smagnani96 smagnani96 force-pushed the pr/probe-have-v4-isa branch from 7aa1e62 to 0f37adb Compare December 9, 2024 23:41
@ti-mo
Copy link
Collaborator

ti-mo commented Dec 10, 2024

I'm adding an intermediate commit to catch this in https://github.com/cilium/ebpf/blob/main/features/prog.go#L47, what do you think? I changed just the probe to verify the returned error, and turn it into ebpf.ErrNotSupported in case of a sys.ENOTSUPP. That error is returned on arm64 kernels prior to ISA v4 support (<=v6.6), I was able to reproduce with an EC2 Amazon Linux 2023 (kernel v6.1).

Thanks! How come the behaviour is different between x64 and arm64?

@smagnani96
Copy link
Contributor Author

Thanks! How come the behaviour is different between x64 and arm64?

Can't precisely tell the root source yet, I'm still trying to understand differences between codepath in x86 and arm64.
Thanks to @rgo3 for providing ideas, let me report our so-far analysis here.

On arm64, we can find the bpf_jit: invalid opcode error at build_insn() https://elixir.bootlin.com/linux/v6.4.16/source/arch/arm64/net/bpf_jit_comp.c#L1302-L1307.

default:
		pr_err_once("unknown opcode %02x\n", code);
		return -EINVAL;
	}

That being sad, tracking back the call to build_insns, the returned -EINVAL can get changed into ENOTSUPP here https://elixir.bootlin.com/linux/v6.4.16/source/kernel/bpf/verifier.c#L17544-L17548:

func[i] = bpf_int_jit_compile(func[i]);
if (!func[i]->jited) {
	err = -ENOTSUPP;
	goto out_free;
}

However, on x86, there's a similar same check for unknown BPF instructions https://elixir.bootlin.com/linux/v6.4.16/source/arch/x86/net/bpf_jit_comp.c#L1815-L1823:

default:
	/*
	 * By design x86-64 JIT should support all BPF instructions.
	 * This error will be seen if new instruction was added
	 * to the interpreter, but not to the JIT, or if there is
	 * junk in bpf_prog.
	 */
	pr_err("bpf_jit: unknown opcode %02x\n", insn->code);
	return -EINVAL;
}

So that's why there's still confusion on why the two returned error codes differ (ENOTSUPP on arm64, EINVAL on x86).
The function bpf_int_jit_compile differs between arm64 and x86 both in structure and concept. For instance, I'm seeing that for x86 the JIT does multiple passes (#define MAX_PASSES 20) while for arm64 it doesn't.

@smagnani96
Copy link
Contributor Author

[offtopic]: let me know if we want to drop 1st commit (upgrade ubuntu arm64 runner) in this PR.

@ti-mo
Copy link
Collaborator

ti-mo commented Dec 10, 2024

[offtopic]: let me know if we want to drop 1st commit (upgrade ubuntu arm64 runner) in this PR.

Yes please, let's bump it separately. We can bump all runners in lockstep, also x86.

Copy link
Collaborator

@ti-mo ti-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for investigating! One nit and it's good to go.

features/misc.go Show resolved Hide resolved
@borkmann
Copy link
Member

If JIT fails, it should always return -ENOTSUPP. These are the locations:

The -EINVAL is JIT internal and does not get propagated outside of JIT. JIT will just return un-JITed program back to the verifier/core and then depending on whether interpreter is compiled out or not this gets propagated to the user.

@smagnani96 smagnani96 force-pushed the pr/probe-have-v4-isa branch 2 times, most recently from 28819a1 to 6da9a71 Compare December 11, 2024 08:42
@smagnani96
Copy link
Contributor Author

smagnani96 commented Dec 11, 2024

If JIT fails, it should always return -ENOTSUPP. These are the locations:

The -EINVAL is JIT internal and does not get propagated outside of JIT. JIT will just return un-JITed program back to the verifier/core and then depending on whether interpreter is compiled out or not this gets propagated to the user.

Thanks Daniel. I looked also that codepath yesterday and yet it is strange the following behavior.
Adding a print in our probeProgram https://github.com/cilium/ebpf/blob/main/features/prog.go#L41 leads to the following 🤔

fmt.Println(errors.Is(err, sys.ENOTSUPP),  errors.Is(err, unix.EINVAL))
// on arm64
true false
// on x86
false true

@rgo3
Copy link
Contributor

rgo3 commented Dec 11, 2024

@smagnani96 What is the verifier log when we receive EINVAL on x86? That might help us figuring out where exactly the EINVAL gets produced.

@borkmann
Copy link
Member

@smagnani96 What is the verifier log when we receive EINVAL on x86? That might help us figuring out where exactly the EINVAL gets produced.

This + tracing return codes from the verifier to see where EINVAL gets propagated.

@smagnani96
Copy link
Contributor Author

@smagnani96 What is the verifier log when we receive EINVAL on x86? That might help us figuring out where exactly the EINVAL gets produced.

This + tracing return codes from the verifier to see where EINVAL gets propagated.

Apologize for the mismatch of kernel versions, seems I can't run arm64 VMs with lvh, so I'm using what AWS offers.
Don't fully understand why in this case I see for x86 the same error I was seeing in the dmesg of the arm64 runner #1608 (comment).

However, talking from my little expertise, this seems that x86 doesn't know (yet) that instruction at all (https://elixir.bootlin.com/linux/v6.4.16/source/arch/x86/net/bpf_jit_comp.c#L1815-L1823), while arm64 somehow does.
Might be related to arm64 using fixed-length instruction set?

x86

Linux civil-jennet 6.4.0-060400-generic #202306271339 SMP PREEMPT_DYNAMIC Tue Jun 27 14:26:34 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Error: invalid argument
Log:

func#0 @0
unknown opcode 06
verification time 5 usec
stack depth 0
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

arm64

Linux XXXXXX 6.1.115-126.197.amzn2023.aarch64 #1 SMP Tue Nov 5 17:36:31 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

Error: operation not supported

func#0 @0
0: R1=ctx(off=0,imm=0) R10=fp0
0: (b7) r0 = 0                        ; R0_w=0
1: (15) if r0 == 0x1 goto pc+1
last_idx 1 first_idx 0
regs=1 stack=0 before 0: (b7) r0 = 0
1: R0_w=0
2: (06) if w0 jmp 0x1 goto pc+0
4: (95) exit
verification time 31 usec
stack depth 0
processed 4 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

@borkmann
Copy link
Member

borkmann commented Dec 11, 2024

The error on x86-64 is not from the BPF JIT, but from the verifier instead: https://elixir.bootlin.com/linux/v6.4.16/source/kernel/bpf/verifier.c#L16750

The arm64 one looks like JIT related, did you also check dmesg if there is something printed?

Is there also a 6.1.115-*.amzn2023 kernel for x86? Either the 6.4.0 one had a bug or amzn2023 kernel has seen more backports.

@smagnani96
Copy link
Contributor Author

smagnani96 commented Dec 11, 2024

The error on x86-64 is not from the BPF JIT, but from the verifier instead: https://elixir.bootlin.com/linux/v6.4.16/source/kernel/bpf/verifier.c#L16750

Thank you 🙏

The arm64 one looks like JIT related, did you also check dmesg if there is something printed?

[  223.651188] bpf_jit: unknown opcode 06

Logged only once, repeating the test while also flushing other messages doesn't seem to log this error more times.
(This could be the pr_err_once("unknown opcode %02x\n", code); that in kernel v6.1 should be few lines above, right?)

Is there also a 6.1.115-*.amzn2023 kernel for x86? Either the 6.4.0 one had a bug or amzn2023 kernel has seen more backports.

You're right, amzn2023 has seen more backports 😞

Linux XXXXXXX 6.1.115-126.197.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Nov 5 17:36:57 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Error: nil, program correctly compiled

func#0 @0
0: R1=ctx(off=0,imm=0) R10=fp0
0: (b7) r0 = 0                        ; R0_w=0
1: (15) if r0 == 0x1 goto pc+1
last_idx 1 first_idx 0
regs=1 stack=0 before 0: (b7) r0 = 0
1: R0_w=0
2: (06) if w0 jmp 0x1 goto pc+0
4: (95) exit
verification time 44 usec
stack depth 0
processed 4 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

Incoming test

The problem with lvh in my machine seems to have disappeared (deleted the folder/kernels/images and re-tried from scratch). This means I can soon try this out both for x86 and arm64 from the same kernel version. I'll keep this updated.

smagnani96 and others added 2 commits December 12, 2024 13:49
This commit introduces the haveV4ISA probe features.HaveV4ISA to check
if the kernel supports instructions of the v4 ISA, introduced in Linux
commit 1f9a1ea821ff ("bpf: Support new sign-extension load insns").

The probe tests the new asm.LongJump insn given by `BPF_JMP32 | BPF_JA`.

On arm64 with kernel prior to ISA v4 support (<= v6.6), this would return
sys.ENOTSUPP: handle this case by returning our ebpf.ErrNotSupported
instead.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
With the addition of the v4 ISA probe, we noticed an ENOTSUPP being returned by
BPF_PROG_LOAD on aarch64. Typically, this is an error code internal to the
verifier/JIT, but on aarch64 it seems to bubble up.

Since we cannot currently test older arm kernels, apply this logic to the older
ISA probes as well.

Signed-off-by: Timo Beckers <timo@isovalent.com>
@ti-mo
Copy link
Collaborator

ti-mo commented Dec 12, 2024

@smagnani96 Sorry for sending you on this goose chase, I didn't realize we relied on ENOTSUPP for probing StructOps. It doesn't seem like a good idea to treat ENOTSUPP as conclusive across the board, and I'm not sure about the filtering logic in probeProgram. It doesn't make things clearer.

Since it's just the ISA probes I'm calling into question, let's add explicit corner cases in each of them. I'll push a commit shortly with these changes and merge if green.

@ti-mo ti-mo force-pushed the pr/probe-have-v4-isa branch from 06f8164 to e987b16 Compare December 12, 2024 12:55
@ti-mo ti-mo changed the title add haveV4ISA probe features: add HaveV4ISA probe for checking ISA v4 support in the kernel Dec 12, 2024
@ti-mo ti-mo merged commit 98ede8a into cilium:main Dec 12, 2024
17 checks passed
@smagnani96
Copy link
Contributor Author

Incoming test

The problem with lvh in my machine seems to have disappeared (deleted the folder/kernels/images and re-tried from
scratch). This means I can soon try this out both for x86 and arm64 from the same kernel version. I'll keep this updated.

Tried many kernels for arm64:

  • for some mainlines (ex 6.1-main) I think the patch was backported, thus I'm not receiving error at all;
  • downloaded and built 6.4.y (v6.4.16), getting EINVAL
  • downloaded and built 6.0.y (v6.0.19), getting EINVAL

Ubuntu runner had v6.5.0-1025-azure, the EC2 instance had v6.1.115-126.197.amzn2023.aarch64.

@smagnani96
Copy link
Contributor Author

@smagnani96 Sorry for sending you on this goose chase, I didn't realize we relied on ENOTSUPP for probing StructOps. It doesn't seem like a good idea to treat ENOTSUPP as conclusive across the board, and I'm not sure about the filtering logic in probeProgram. It doesn't make things clearer.
Since it's just the ISA probes I'm calling into question, let's add explicit corner cases in each of them. I'll push a commit shortly with these changes and merge if green.

It's fine, I learned a lot in this issue 😉
Thanks for the feedback again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants