Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

program, btf: improve debuggability when CO-RE or kfunc fixup fails #1402

Merged
merged 4 commits into from
Apr 3, 2024

Conversation

lmb
Copy link
Collaborator

@lmb lmb commented Mar 27, 2024

It would be nice if it was clearer to users that a program load failed due to CO-RE poisoning or a missing kfunc. This PR adds to mechanisms:

  1. A heuristic which parses the verifier log for well-known strings to replace EINVAL, EACCES errors.
  2. Code which adds line info to instructions that have been poisoned by CO-RE.

When implementing option 1 I realised how similar kfunc and CO-RE fixups are, but that they are split across ebpf and btf package. Maybe kfuncs should be part of CO-RE?

For option 2, I'm not entirely convinced it's a good idea. There are a bunch of corner cases, for example we currently override any existing line info from the ELF. This means we may lose some information actually!


program: remove fallthrough from error heuristics

We have a bunch of heuristics that try to give a better error when loading a
program fails. It's kind of tricky to decide which order they should be in
and what interactions they may have. Commit 148c76c refactored the check for
missing bpf2bpf call support into a switch with a fallthrough statement. As
a result, we apply all of our EINVAL heuristics to EPERM as well, even
though the initial intent was to only only share the bpf2bpf code.

Fix this by pulling the heuristic into it's own code block again/ The check
is kind of expensive since it iterates all instructions in the worst case so
we also move it to the very end.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>

program: return better errors on CO-RE or kfunc fixup failures

CO-RE relocations and kfunc fixups can both fail for a number of reasons.
However, we can't return an error outright, since a program may be written
in a style that takes this into account.

Instead, we rewrite the BPF instruction with the failing relocation or fixup
to a call instruction to a magic number. This causes the verifier to bail
out if the program does end up trying to execute the
"invalid" instruction.

Add a heuristic which detects these cases from the verifier log and returns
a more intuitive error instead.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>

btf: add a workaround for the "No type found" error

The kernel accepts line info for each instruction in a program, which is
output as part of the verifier log and therefore provides very important
context. The strings for each line info are interned into a BTF blob.

Unfortunately, the kernel refuses to load BTF with a string table but 
without any types. This means it's not possible to add line info to a
program which doesn't use BTF otherwise.

Add a MarshalOption which fixes this behaviour by adding a dummy type if
necessary.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>

btf: use line info to annotate poisoned CO-RE relocations

Add a line info when poisoning an instruction to make the action explicit in
the verifier log. Given a program like:

    ; return bpf_core_enum_value(enum nonexist_enum, NON_EXIST);
   0: LdImmDW dst: r0 imm: 1
   2: Exit

The verifier output now is:

    0: R1=ctx(off=0,imm=0) R10=fp0
   ; instruction poisoned by CO-RE
   0: (18) r10 = 0xbad2310
   frame pointer is read only

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>

@lmb lmb force-pushed the btf-core-poison-line-info branch 5 times, most recently from c7c7d34 to 44982d3 Compare March 28, 2024 15:36
@lmb lmb marked this pull request as ready for review April 2, 2024 08:38
@lmb lmb requested review from dylandreimerink and a team as code owners April 2, 2024 08:38
Copy link
Member

@dylandreimerink dylandreimerink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All seems sane to me

@dylandreimerink
Copy link
Member

When implementing option 1 I realised how similar kfunc and CO-RE fixups are, but that they are split across ebpf and btf package. Maybe kfuncs should be part of CO-RE?

Hmm, you might be onto something there. It initially feels more like map relocation, but for kfuncs we have to search vmlinux and/or modules, match BTF and use BTF ids, which indeed has a lot of overlap with CO-RE.

For option 2, I'm not entirely convinced it's a good idea. There are a bunch of corner cases, for example we currently override any existing line info from the ELF. This means we may lose some information actually!

The poison instructions overwrite the original once, so we were not preserving the line info anyway. We could always consider doing so anyway, adding the old source line to the end like ; instruction poisoned by CO-RE: return bpf_core_enum_value(enum nonexist_enum, NON_EXIST);

lmb added 4 commits April 3, 2024 09:40
We have a bunch of heuristics that try to give a better error when
loading a program fails. It's kind of tricky to decide which order
they should be in and what interactions they may have. Commit
148c76c refactored the check for missing bpf2bpf call support into a
switch with a fallthrough statement. As a result, we apply all of
our EINVAL heuristics to EPERM as well, even though the initial
intent was to only only share the bpf2bpf code.

Fix this by pulling the heuristic into it's own code block again/
The check is kind of expensive since it iterates all instructions
in the worst case so we also move it to the very end.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
CO-RE relocations and kfunc fixups can both fail for a number of
reasons. However, we can't return an error outright, since a program
may be written in a style that takes this into account.

Instead, we rewrite the BPF instruction with the failing relocation
or fixup to a call instruction to a magic number. This causes the
verifier to bail out if the program does end up trying to execute the
"invalid" instruction.

Add a heuristic which detects these cases from the verifier log and
returns a more intuitive error instead.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
The kernel accepts line info for each instruction in a program,
which is output as part of the verifier log and therefore provides
very important context. The strings for each line info are interned
into a BTF blob.

Unfortunately, the kernel refuses to load BTF with a string table but
without any types. This means it's not possible to add line info to
a program which doesn't use BTF otherwise.

Add a MarshalOption which fixes this behaviour by adding a dummy type
if necessary.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Add a line info when poisoning an instruction to make the
action explicit in the verifier log. Given a program like:

    ; return bpf_core_enum_value(enum nonexist_enum, NON_EXIST);
    0: LdImmDW dst: r0 imm: 1
    2: Exit

The verifier output now is:

    0: R1=ctx(off=0,imm=0) R10=fp0
    ; instruction poisoned by CO-RE
    0: (18) r10 = 0xbad2310
    frame pointer is read only

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
@lmb lmb force-pushed the btf-core-poison-line-info branch from 44982d3 to 125ee42 Compare April 3, 2024 08:40
@lmb
Copy link
Collaborator Author

lmb commented Apr 3, 2024

Changed the PR so that original Source is retained, thanks!

@lmb lmb merged commit 5a7f946 into cilium:main Apr 3, 2024
15 checks passed
@lmb lmb deleted the btf-core-poison-line-info branch April 3, 2024 09:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants