Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce builtin to access percpu kernel data #3596

Merged
merged 3 commits into from
Nov 22, 2024

Conversation

viktormalik
Copy link
Contributor

Introduce new builtin percpu_kaddr to access kernel percpu variables. It leverages the bpf_per_cpu_ptr helper which accepts a pointer to a percpu ksym and a CPU number. The ksym is an extern variable with a BTF entry matching the BTF of the corresponding kernel variable. Since we now use libbpf to do the loading, it is sufficient to emit a global variable declaration with a proper BTF (debug info) and libbpf will take care of the rest.

The helper has two forms:

percpu_kaddr("symbol_name")
percpu_kaddr("symbol_name", N)

where N is the CPU number. The former variant retrieves the value for the current CPU.

For instance, this allows to access the process_counts percpu variable:

$ bpftrace -e 'BEGIN { printf("%d processes running on CPU %d\n", *percpu_kaddr("process_counts"), cpu); exit() }'
Attaching 1 probe...
88 processes running on CPU 1

A tricky part is that bpf_per_cpu_ptr may return NULL if the supplied CPU number is higher than the number of the CPUs. The BPF program should perform a NULL-check on the returned value, otherwise it is rejected by the verifier. In practice, this only happens if pointer arithmetics is used (i.e. a struct field is accessed). Since it is quite complex to detect a missing NULL-check in bpftrace, we instead let verifier do it and just display the potential verifier error in a nicer manner:

$ bpftrace -e 'BEGIN { printf("%d\n", ((struct rq *)percpu_kaddr("runqueues"))->nr_running); exit() }'
Attaching 1 probe...
stdin:1:24-64: ERROR: helper bpf_per_cpu_ptr: result needs to be null-checked before accessing fields
BEGIN { printf("%d\n", ((struct rq *)percpu_kaddr("runqueues"))->nr_running); exit() }
                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Resolves #1837.

Checklist
  • Language changes are updated in man/adoc/bpftrace.adoc
  • User-visible and non-trivial changes updated in CHANGELOG.md
  • The new behaviour is covered by tests

When emitting debug info for global variables, emit real types instead
of types that are internally used in codegen. This mainly applies to:
- pointers which are internally represented as integers
- structs which are internally represented as byte arrays

Similarly to kernel functions, we will need to emit debug info for
global variables that matches their BTF in the kernel so we need the
actual types.
When emitting debug info for structs (which will be translated into
BTF), the name *must not* contain the "struct" (or "union") prefix,
otherwise our emitted debug info will not match the kernel BTF (which
doesn't use the prefix) and the programs where the match is required
will be rejected.
src/btf.cpp Fixed Show fixed Hide fixed
@ajor
Copy link
Member

ajor commented Nov 19, 2024

Since I don't generally like adding new builtins, I've got to ask: what are the benefits of this over reading from the kernel PERCPU data manually, a la: https://github.com/sched-ext/scx/blob/912d6e01c1538cb5ec7e6363332caef66f7765ba/scheds/rust/scx_layered/integration/layer_llc.bt#L22-L27

No bpftrace changes required and with user defined functions, the above approach could be wrapped in a simple stdlib function to make it easier to use.

@viktormalik
Copy link
Contributor Author

Since I don't generally like adding new builtins, I've got to ask: what are the benefits of this over reading from the kernel PERCPU data manually, a la: https://github.com/sched-ext/scx/blob/912d6e01c1538cb5ec7e6363332caef66f7765ba/scheds/rust/scx_layered/integration/layer_llc.bt#L22-L27

No bpftrace changes requires and with user defined functions, the above approach could be wrapped in a simple stdlib function to make it easier to use.

Mostly convenience, at the moment. It will take a while until we get user-defined functions and a standard library and writing this manually is much more code.

On top of that, we get some verifier checks for free, e.g. the mentioned check that users cannot do pointer arithmetic on the returned value without a prior null-check. For stdlib bpftrace function, we'd have to define those checks on our own.

That being said, I understand you reluctance to add new helpers but I feel like the standard library is still quite some time away and, in the meantime, we'll need to address some real-world use-cases which require new builtins. I'm wondering if we could establish a plan that once the standard library is in place, we will remove some (potentially many) builtin functions in favor of stdlib functions? That would allow us to add builtins for the time being but we would also be transparent about the fact that they will go away from the language eventually.

@ajor
Copy link
Member

ajor commented Nov 19, 2024

I'm wondering if we could establish a plan that once the standard library is in place, we will remove some (potentially many) builtin functions in favor of stdlib functions? That would allow us to add builtins for the time being but we would also be transparent about the fact that they will go away from the language eventually.

I can get behind this 👍

@jordalgo
Copy link
Contributor

I'm wondering if we could establish a plan that once the standard library is in place, we will remove some (potentially many) builtin functions in favor of stdlib functions?

Is it possible for this to be seamless for users? Maybe we could detect what "builtin" functions are being used and automatically include them from the standard library (if there are no conflicts with other user-defined functions)

@viktormalik
Copy link
Contributor Author

I'm wondering if we could establish a plan that once the standard library is in place, we will remove some (potentially many) builtin functions in favor of stdlib functions?

Is it possible for this to be seamless for users? Maybe we could detect what "builtin" functions are being used and automatically include them from the standard library (if there are no conflicts with other user-defined functions)

Yeah, something like that. It very much depends on the include model that we end up with but I'm sure we'll be able to find a solution which is transparent for users.

Comment on lines +2144 to +2160
CallInst *IRBuilderBPF::CreatePerCpuPtr(Value *var,
Value *cpu,
const location &loc)
{
// void *bpf_per_cpu_ptr(const void *percpu_ptr, u32 cpu)
// Return:
// A pointer pointing to the kernel percpu variable on
// cpu, or NULL, if cpu is invalid.
FunctionType *percpuptr_func_type = FunctionType::get(
GET_PTR_TY(), { GET_PTR_TY(), getInt64Ty() }, false);
return CreateHelperCall(libbpf::BPF_FUNC_per_cpu_ptr,
percpuptr_func_type,
{ var, cpu },
"per_cpu_ptr",
&loc);
}

Copy link
Member

@danobi danobi Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the bpf_this_cpu_ptr() variant can never return null according to docs. Given that user needs to do null checks if accessing fields, probably worthwhile to use the this variant when possible. So less null checks necessary. WDYT?

https://github.com/torvalds/linux/blob/43fb83c17ba2d63dfb798f0be7453ed55ca3f9c2/tools/include/uapi/linux/bpf.h#L4937-L4947

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, absolutely, I missed that one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can bake this into a semantic_analyser check. It's a bit confusing to users that they have to null check one variant and not the other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not easy, I think. First, I wouldn't put it to semantic analyser but rather to a separate pass. And there, you'd have to introduce a context-sensitive analysis - you'd have to check that all field accesses of the pointer returned by the helper are done within a context where the pointer is not equal to NULL. I can't see a simple way of doing that.

Alternatively, we could inject the check ourselves into the AST or in codegen. But then, what would be the value for the other branch? Zero? That's a valid value for some fields so we would basically give users incorrect values.

The good thing here is that bpftrace will immediately inform you about the missing check so users don't need to remember anything, just add the check when the tool requires it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The good thing here is that bpftrace will immediately inform you about the missing check so users don't need to remember anything, just add the check when the tool requires it.

Yeah that's true. Maybe just update the adoc to say you don't have to null check when you are not supplying the cpu in the second arg.

Copy link
Member

@danobi danobi Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like something we can improve on in the verifier. If the CPU number is statically known and in the range of valid CPUs, verifier should be able to let user omit null check. cc @4ast

Comment on lines 4728 to 4730
std::string err;
auto type = bpftrace_.btf_->get_var_type(var_name);
if (type.IsNoneTy()) {
throw FatalUserException("Could not resolve variable \"" + var_name +
"\" from BTF");
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be done in semantic analyser so there's highlighting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test missing symbol also?

man/adoc/bpftrace.adoc Outdated Show resolved Hide resolved
man/adoc/bpftrace.adoc Outdated Show resolved Hide resolved
src/ast/dibuilderbpf.cpp Outdated Show resolved Hide resolved
man/adoc/bpftrace.adoc Outdated Show resolved Hide resolved
man/adoc/bpftrace.adoc Outdated Show resolved Hide resolved
man/adoc/bpftrace.adoc Outdated Show resolved Hide resolved
src/ast/irbuilderbpf.cpp Outdated Show resolved Hide resolved
src/ast/passes/codegen_llvm.cpp Show resolved Hide resolved
@@ -1366,6 +1366,24 @@ TEST(semantic_analyser, call_kaddr)
test("kprobe:f { kaddr(123); }", 1);
}

TEST(semantic_analyser, call_percpu_kaddr)
{
test("kprobe:f { percpu_kaddr(\"process_counts\"); }");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is process_counts from host? I don't see it in btf data source

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's from the host. It exists on every Linux system but it's still probably better to have it in data_source.

There exist "percpu" global variables in the kernel which contain a
distinct value for each CPU. In BPF, access to these variables is done
via the bpf_per_cpu_ptr and bpf_this_cpu_ptr helpers. Both accept a
pointer to a percpu ksym and the former also accepts a CPU number.  The
ksym is an extern variable with a BTF entry matching the BTF of the
corresponding kernel variable. Since we now use libbpf to do the
loading, it is sufficient to emit a global variable declaration with a
proper BTF and libbpf will take care of the rest.

Introduce new bpftrace builtin percpu_kaddr to access the percpu data.
The helper has two forms:

    percpu_kaddr("symbol_name")      <- uses bpf_this_cpu_ptr
    percpu_kaddr("symbol_name", N)   <- uses bpf_per_cpu_ptr

where N is the CPU number. The former variant retrieves the value for
the current CPU.

A tricky part is that bpf_per_cpu_ptr may return NULL if the supplied
CPU number is higher than the number of the CPUs. The BPF program should
perform a NULL-check on the returned value, otherwise it is rejected by
the verifier. In practice, this only happens if pointer arithmetics is
used (i.e. a struct field is accessed). Since it is quite complex to
detect a missing NULL-check in bpftrace, we instead let verifier do it
and just display the potential verifier error in a nicer manner.

The check if the global variable exists is done in semantic analyser to
get better error highlighting. Therefore, for testing the new builtin in
semantic analyser tests, we need to add a symbol (process_counts) into
tests/data/data_source.c to get it into our mock BTF. The problem here
is that pahole places only percpu variables into BTF (and none other) so
the symbol must be in the ".data..percpu" section. To do that, we need
to make data_source.o a relocatable file (using compiler's -c option),
otherwise the linker would put the symbol back to ".data". This in turn
breaks DWARF generation which seems to need a linked binary so we link
data_source.o into data_source during the DWARF generation step.
@viktormalik
Copy link
Contributor Author

All comments should be addressed. Adding process_counts to data_source was quite a fight as pahole puts only percpu variables into BTF. So, I had to put the variable to the .data..percpu section and do some tweaks to building BTF and DWARF data in CMake (BTF needs relocatable file to preserve the custom section while DWARF needs executable file to be able to strip DWARF data into a separate file).

Copy link
Contributor

@jordalgo jordalgo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Nice work!

Copy link
Member

@danobi danobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yay, great to finally have this!

@danobi danobi merged commit c44c1d9 into bpftrace:master Nov 22, 2024
18 checks passed
@viktormalik viktormalik deleted the percpu-kaddr branch December 11, 2024 12:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support accessing percpu kernel data
4 participants