program, btf: improve debuggability when CO-RE or kfunc fixup fails #1402

lmb · 2024-03-27T15:38:16Z

It would be nice if it were clearer to users that a program load failed due to CO-RE poisoning or a missing kfunc. This PR adds two mechanisms:

1. A heuristic which parses the verifier log for well-known strings to replace EINVAL and EACCES errors.
2. Code which adds line info to instructions that have been poisoned by CO-RE.

When implementing option 1 I realised how similar kfunc and CO-RE fixups are, but that they are split across the ebpf and btf packages. Maybe kfuncs should be part of CO-RE?

For option 2, I'm not entirely convinced it's a good idea. There are a bunch of corner cases, for example we currently override any existing line info from the ELF. This means we may lose some information actually!

program: remove fallthrough from error heuristics

We have a bunch of heuristics that try to give a better error when loading a
program fails. It's kind of tricky to decide which order they should be in
and what interactions they may have. Commit 148c76c refactored the check for
missing bpf2bpf call support into a switch with a fallthrough statement. As
a result, we apply all of our EINVAL heuristics to EPERM as well, even
though the initial intent was to only share the bpf2bpf code.

Fix this by pulling the heuristic into its own code block again. The check
is kind of expensive since it iterates all instructions in the worst case so
we also move it to the very end.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>

program: return better errors on CO-RE or kfunc fixup failures

CO-RE relocations and kfunc fixups can both fail for a number of reasons.
However, we can't return an error outright, since a program may be written
in a style that takes this into account.

Instead, we rewrite the BPF instruction with the failing relocation or fixup
to a call instruction to a magic number. This causes the verifier to bail
out if the program does end up trying to execute the
"invalid" instruction.

Add a heuristic which detects these cases from the verifier log and returns
a more intuitive error instead.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>

btf: add a workaround for the "No type found" error

The kernel accepts line info for each instruction in a program, which is
output as part of the verifier log and therefore provides very important
context. The strings for each line info are interned into a BTF blob.

Unfortunately, the kernel refuses to load BTF with a string table but 
without any types. This means it's not possible to add line info to a
program which doesn't use BTF otherwise.

Add a MarshalOption which fixes this behaviour by adding a dummy type if
necessary.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
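
The idea can be modelled with a small, self-contained sketch. All types, the option name, and the "dummy" type below are hypothetical stand-ins; the real encoder and the shape of the real option are not shown here, and `kernelAccepts` models only the single restriction described in this commit message.

```go
package main

import "fmt"

// sketchBlob is a hypothetical stand-in for an encoded BTF blob.
type sketchBlob struct {
	Types   []string
	Strings []string
}

type sketchConfig struct{ addDummyType bool }

// sketchOption follows the functional options pattern.
type sketchOption func(*sketchConfig)

// withDummyTypeFixup enables the workaround (hypothetical name).
func withDummyTypeFixup() sketchOption {
	return func(c *sketchConfig) { c.addDummyType = true }
}

// marshalSketch builds a blob and, if requested, adds a dummy type when the
// blob would otherwise contain strings but no types.
func marshalSketch(types, lineInfoStrings []string, opts ...sketchOption) sketchBlob {
	var cfg sketchConfig
	for _, opt := range opts {
		opt(&cfg)
	}
	b := sketchBlob{
		Types:   append([]string(nil), types...),
		Strings: append([]string(nil), lineInfoStrings...),
	}
	if cfg.addDummyType && len(b.Types) == 0 && len(b.Strings) > 0 {
		b.Types = append(b.Types, "dummy")
	}
	return b
}

// kernelAccepts models only the restriction described above: a string table
// without any types is rejected.
func kernelAccepts(b sketchBlob) bool {
	return !(len(b.Strings) > 0 && len(b.Types) == 0)
}

func main() {
	lines := []string{"instruction poisoned by CO-RE"}
	fmt.Println(kernelAccepts(marshalSketch(nil, lines)))
	fmt.Println(kernelAccepts(marshalSketch(nil, lines, withDummyTypeFixup())))
}
```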

btf: use line info to annotate poisoned CO-RE relocations

Add line info when poisoning an instruction to make the action explicit in
the verifier log. Given a program like:

    ; return bpf_core_enum_value(enum nonexist_enum, NON_EXIST);
    0: LdImmDW dst: r0 imm: 1
    2: Exit

The verifier output now is:

    0: R1=ctx(off=0,imm=0) R10=fp0
    ; instruction poisoned by CO-RE
    0: (18) r10 = 0xbad2310
    frame pointer is read only

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>

dylandreimerink

All seems sane to me

dylandreimerink · 2024-04-02T12:45:24Z

> When implementing option 1 I realised how similar kfunc and CO-RE fixups are, but that they are split across the ebpf and btf packages. Maybe kfuncs should be part of CO-RE?

Hmm, you might be onto something there. It initially feels more like map relocation, but for kfuncs we have to search vmlinux and/or modules, match BTF and use BTF ids, which indeed has a lot of overlap with CO-RE.

> For option 2, I'm not entirely convinced it's a good idea. There are a bunch of corner cases, for example we currently override any existing line info from the ELF. This means we may lose some information actually!

The poison instructions overwrite the original ones, so we were not preserving the line info anyway. We could still consider doing so by appending the old source line to the end, like this: ; instruction poisoned by CO-RE: return bpf_core_enum_value(enum nonexist_enum, NON_EXIST);
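
A minimal sketch of that suggestion, assuming a simplified instruction type: `sketchInsn`, `poisonSketch`, and the `call` rendering are hypothetical, and only the comment text and the magic number come from this PR.

```go
package main

import "fmt"

// sketchInsn is a hypothetical, simplified instruction: an opcode string plus
// the source line that line info would attach to it.
type sketchInsn struct {
	Op     string
	Source string
}

const poisonComment = "instruction poisoned by CO-RE"

// poisonSketch replaces an instruction with a poisoned one. The original
// source line, if any, is appended to the poison comment rather than dropped.
func poisonSketch(orig sketchInsn) sketchInsn {
	src := poisonComment
	if orig.Source != "" {
		src += ": " + orig.Source
	}
	return sketchInsn{Op: fmt.Sprintf("call %#x", 0xbad2310), Source: src}
}

func main() {
	orig := sketchInsn{
		Op:     "LdImmDW dst: r0 imm: 1",
		Source: "return bpf_core_enum_value(enum nonexist_enum, NON_EXIST);",
	}
	p := poisonSketch(orig)
	fmt.Printf("; %s\n%s\n", p.Source, p.Op)
}
```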

lmb · 2024-04-03T08:41:37Z

Changed the PR so that the original Source is retained, thanks!

lmb force-pushed the btf-core-poison-line-info branch 5 times, most recently from c7c7d34 to 44982d3 Compare March 28, 2024 15:36

lmb marked this pull request as ready for review April 2, 2024 08:38

lmb requested review from dylandreimerink and a team as code owners April 2, 2024 08:38

dylandreimerink approved these changes Apr 2, 2024

lmb added 4 commits April 3, 2024 09:40

lmb force-pushed the btf-core-poison-line-info branch from 44982d3 to 125ee42 Compare April 3, 2024 08:40

lmb merged commit 5a7f946 into cilium:main Apr 3, 2024
15 checks passed

lmb deleted the btf-core-poison-line-info branch April 3, 2024 09:43
