Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remediation for verifier regression in kernels ~6.8 and greater #26

Merged
merged 1 commit into from
Apr 29, 2024

Conversation

javierhonduco
Copy link
Owner

Some days ago we put a fix for arm64 kernels as a verifier regression was introduced causing loading errors
commit.

Turns out this is not just affecting arm64, as it just happened it's that my arm64 VM was running a more up to date kernel. This regression affects x86 as well.

This commit changes the remediation to unrolloing the binary search loop and adds a kernel regression test. I haven't measured the performance implication of doing this vs the previous remediation. This is out of the scope of this PR and it's something I will work on later on.

When I have spare cycles I should report this upstream.

The error

292: (bf) r6 = r3                     ; R3_w=scalar(id=52209,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x1869e,var_off=(0x0; 0x1ffff)) R6_w=scalar(id=52209,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x1869e,var_off=(0x0; 0x1ffff))
293: (bf) r3 = r2                     ; R2_w=scalar(id=52208,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=0x1869d,var_off=(0x0; 0x1ffff)) R3_w=scalar(id=52208,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=0x1869d,var_off=(0x0; 0x1ffff))
294: (2d) if r4 > r8 goto pc+1 296: R0_w=scalar() R1=11 R2_w=scalar(id=52208,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=0x1869d,var_off=(0x0; 0x1ffff)) R3_w=scalar(id=52208,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=0x1869d,var_off=(0x0; 0x1ffff)) R4_w=scalar(umin=9,umax=0xfffffffffffffffa) R5_w=scalar() R6_w=scalar(id=52209,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x1869e,var_off=(0x0; 0x1ffff)) R7=scalar(id=52027,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x1869e,var_off=(0x0; 0x1ffff)) R8=scalar(id=24520,umin=8,umax=0xfffffffffffffff9) R9_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=0xff0000,var_off=(0x0; 0xff0000)) R10=fp0 fp-16=mmmmmmmm fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=map_value(map=unwind_tables,ks=8,vs=1400000) fp-56=6 fp-64=scalar(id=52027,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x1869e,var_off=(0x0; 0x1ffff)) fp-72=map_value(map=heap,ks=4,vs=1080) fp-80=map_value(map=heap,ks=4,vs=1080) fp-88=map_value(map=exec_mappings,ks=16,vs=32) fp-96=mmmmmmmm fp-104=ctx() fp-112=mmmmmmmm fp-120=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) fp-128=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff)) fp-136=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff))
; if (table->rows[mid].pc <= pc) {
296: (2d) if r4 > r8 goto pc-83 214: R0_w=scalar() R1=11 R2_w=scalar(id=52208,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=0x1869d,var_off=(0x0; 0x1ffff)) R3_w=scalar(id=52208,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=0x1869d,var_off=(0x0; 0x1ffff)) R4_w=scalar(umin=9,umax=0xfffffffffffffffa) R5_w=scalar() R6_w=scalar(id=52209,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x1869e,var_off=(0x0; 0x1ffff)) R7=scalar(id=52027,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x1869e,var_off=(0x0; 0x1ffff)) R8=scalar(id=24520,umin=8,umax=0xfffffffffffffff9) R9_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=0xff0000,var_off=(0x0; 0xff0000)) R10=fp0 fp-16=mmmmmmmm fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=map_value(map=unwind_tables,ks=8,vs=1400000) fp-56=6 fp-64=scalar(id=52027,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x1869e,var_off=(0x0; 0x1ffff)) fp-72=map_value(map=heap,ks=4,vs=1080) fp-80=map_value(map=heap,ks=4,vs=1080) fp-88=map_value(map=exec_mappings,ks=16,vs=32) fp-96=mmmmmmmm fp-104=ctx() fp-112=mmmmmmmm fp-120=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) fp-128=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff)) fp-136=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff))
; LOG("========== left %llu right %llu (shard index %d)", left, right,
214: (18) r9 = 0xbadfadbadfadbad      ; R9_w=0xbadfadbadfadbad
; for (int i = 0; i < MAX_BINARY_SEARCH_DEPTH; i++) {
216: (07) r1 += -1                    ; R1_w=10
217: (bf) r2 = r1                     ; R1_w=10 R2_w=10
218: (67) r2 <<= 32
BPF program is too large. Processed 1000001 insn
processed 1000001 insns (limit 1000000) max_states_per_insn 29 total_states 18306 peak_states 2761 mark_read 106
-- END PROG LOAD LOG --
libbpf: prog 'dwarf_unwind': failed to load: -7
libbpf: failed to load object 'profiler_bpf'
libbpf: failed to load BPF skeleton 'profiler_bpf': -7
thread 'main' panicked at src/profiler.rs:159:36:
load skel: Error: Argument list too long (os error 7)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Test Plan

Added a regression test.

Some days ago we put a fix for arm64 kernels as a verifier regression
was introduced causing loading errors
[commit](a8094d1).

Turns out this is not just affecting arm64, as it just happened it's
that my arm64 VM was running a more up to date kernel. This regression
affects x86 as well.

This commit changes the remediation to unrolloing the binary search loop
and adds a kernel regression test. I haven't measured the performance
implication of doing this vs the previous remediation. This is out of
the scope of this PR and it's something I will work on later on.

When I have spare cycles I should report this upstream.

The error
=========

```
292: (bf) r6 = r3                     ; R3_w=scalar(id=52209,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x1869e,var_off=(0x0; 0x1ffff)) R6_w=scalar(id=52209,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x1869e,var_off=(0x0; 0x1ffff))
293: (bf) r3 = r2                     ; R2_w=scalar(id=52208,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=0x1869d,var_off=(0x0; 0x1ffff)) R3_w=scalar(id=52208,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=0x1869d,var_off=(0x0; 0x1ffff))
294: (2d) if r4 > r8 goto pc+1 296: R0_w=scalar() R1=11 R2_w=scalar(id=52208,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=0x1869d,var_off=(0x0; 0x1ffff)) R3_w=scalar(id=52208,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=0x1869d,var_off=(0x0; 0x1ffff)) R4_w=scalar(umin=9,umax=0xfffffffffffffffa) R5_w=scalar() R6_w=scalar(id=52209,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x1869e,var_off=(0x0; 0x1ffff)) R7=scalar(id=52027,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x1869e,var_off=(0x0; 0x1ffff)) R8=scalar(id=24520,umin=8,umax=0xfffffffffffffff9) R9_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=0xff0000,var_off=(0x0; 0xff0000)) R10=fp0 fp-16=mmmmmmmm fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=map_value(map=unwind_tables,ks=8,vs=1400000) fp-56=6 fp-64=scalar(id=52027,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x1869e,var_off=(0x0; 0x1ffff)) fp-72=map_value(map=heap,ks=4,vs=1080) fp-80=map_value(map=heap,ks=4,vs=1080) fp-88=map_value(map=exec_mappings,ks=16,vs=32) fp-96=mmmmmmmm fp-104=ctx() fp-112=mmmmmmmm fp-120=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) fp-128=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff)) fp-136=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff))
; if (table->rows[mid].pc <= pc) {
296: (2d) if r4 > r8 goto pc-83 214: R0_w=scalar() R1=11 R2_w=scalar(id=52208,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=0x1869d,var_off=(0x0; 0x1ffff)) R3_w=scalar(id=52208,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=0x1869d,var_off=(0x0; 0x1ffff)) R4_w=scalar(umin=9,umax=0xfffffffffffffffa) R5_w=scalar() R6_w=scalar(id=52209,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x1869e,var_off=(0x0; 0x1ffff)) R7=scalar(id=52027,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x1869e,var_off=(0x0; 0x1ffff)) R8=scalar(id=24520,umin=8,umax=0xfffffffffffffff9) R9_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=0xff0000,var_off=(0x0; 0xff0000)) R10=fp0 fp-16=mmmmmmmm fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=map_value(map=unwind_tables,ks=8,vs=1400000) fp-56=6 fp-64=scalar(id=52027,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x1869e,var_off=(0x0; 0x1ffff)) fp-72=map_value(map=heap,ks=4,vs=1080) fp-80=map_value(map=heap,ks=4,vs=1080) fp-88=map_value(map=exec_mappings,ks=16,vs=32) fp-96=mmmmmmmm fp-104=ctx() fp-112=mmmmmmmm fp-120=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) fp-128=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff)) fp-136=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff))
; LOG("========== left %llu right %llu (shard index %d)", left, right,
214: (18) r9 = 0xbadfadbadfadbad      ; R9_w=0xbadfadbadfadbad
; for (int i = 0; i < MAX_BINARY_SEARCH_DEPTH; i++) {
216: (07) r1 += -1                    ; R1_w=10
217: (bf) r2 = r1                     ; R1_w=10 R2_w=10
218: (67) r2 <<= 32
BPF program is too large. Processed 1000001 insn
processed 1000001 insns (limit 1000000) max_states_per_insn 29 total_states 18306 peak_states 2761 mark_read 106
-- END PROG LOAD LOG --
libbpf: prog 'dwarf_unwind': failed to load: -7
libbpf: failed to load object 'profiler_bpf'
libbpf: failed to load BPF skeleton 'profiler_bpf': -7
thread 'main' panicked at src/profiler.rs:159:36:
load skel: Error: Argument list too long (os error 7)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

Test Plan
=========

Added a regression test.
@javierhonduco javierhonduco merged commit a932078 into main Apr 29, 2024
6 checks passed
@javierhonduco javierhonduco deleted the fix-kernels-6.8 branch April 29, 2024 11:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant