Introduce builtin to get percpu kernel data
There exist "percpu" global variables in the kernel which contain a
distinct value for each CPU. In BPF, access to these variables is done
via the bpf_per_cpu_ptr and bpf_this_cpu_ptr helpers. Both accept a
pointer to a percpu ksym and the former also accepts a CPU number.  The
ksym is an extern variable with a BTF entry matching the BTF of the
corresponding kernel variable. Since we now use libbpf to do the
loading, it is sufficient to emit a global variable declaration with a
proper BTF and libbpf will take care of the rest.

Introduce new bpftrace builtin percpu_kaddr to access the percpu data.
The helper has two forms:

    percpu_kaddr("symbol_name")      <- uses bpf_this_cpu_ptr
    percpu_kaddr("symbol_name", N)   <- uses bpf_per_cpu_ptr

where N is the CPU number. The former variant retrieves the value for
the current CPU.

A tricky part is that bpf_per_cpu_ptr may return NULL if the supplied
CPU number is invalid (i.e. not lower than the number of CPUs). The BPF
program should perform a NULL-check on the returned value, otherwise it
is rejected by the verifier. In practice, this only happens if pointer
arithmetic is used (i.e. a struct field is accessed). Since it is quite
complex to detect a missing NULL-check in bpftrace, we instead let the
verifier do it and just display the potential verifier error in a nicer
manner.
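
The NULL-check requirement can be pictured with a small user-space model. This is only an analogy written in C++, not kernel or BPF code: `PerCpuVar`, `per_cpu_ptr`, `this_cpu_ptr` and `read_or` are hypothetical names that mimic the documented behaviour of the helpers (a valid pointer for a valid CPU number, NULL otherwise).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical user-space stand-in for a percpu kernel variable:
// one slot per CPU. Real percpu data lives in the kernel.
struct PerCpuVar {
  std::vector<std::int64_t> slots;
};

// Mimics bpf_per_cpu_ptr: returns nullptr when `cpu` is not a valid index.
inline const std::int64_t *per_cpu_ptr(const PerCpuVar &var, std::size_t cpu)
{
  if (cpu >= var.slots.size())
    return nullptr;
  return &var.slots[cpu];
}

// Mimics bpf_this_cpu_ptr: the caller's own CPU is assumed to be valid,
// so the result is never nullptr.
inline const std::int64_t *this_cpu_ptr(const PerCpuVar &var,
                                        std::size_t current_cpu)
{
  return &var.slots[current_cpu];
}

// Reads the value for `cpu`, or `fallback` if the CPU number is invalid.
// The nullptr test plays the role of the NULL-check that the BPF program
// has to perform on the result of the two-argument form.
inline std::int64_t read_or(const PerCpuVar &var,
                            std::size_t cpu,
                            std::int64_t fallback)
{
  const std::int64_t *p = per_cpu_ptr(var, cpu);
  return p != nullptr ? *p : fallback;
}
```

In this model, asking for a CPU index equal to the number of slots already yields nullptr, which is why the check is needed whenever the CPU number is supplied by the script.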

The check if the global variable exists is done in semantic analyser to
get better error highlighting. Therefore, for testing the new builtin in
semantic analyser tests, we need to add a symbol (process_counts) into
tests/data/data_source.c to get it into our mock BTF. The problem here
is that pahole places only percpu variables into BTF (and none other) so
the symbol must be in the ".data..percpu" section. To do that, we need
to make data_source.o a relocatable file (using compiler's -c option),
otherwise the linker would put the symbol back to ".data". This in turn
breaks DWARF generation which seems to need a linked binary so we link
data_source.o into data_source during the DWARF generation step.
viktormalik committed Nov 22, 2024
1 parent 14f1e5c commit efb61f6
Showing 19 changed files with 373 additions and 8 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -48,6 +48,8 @@ and this project adheres to
- [#3547](https://github.com/bpftrace/bpftrace/pull/3547)
- Add `symbol_source` config to source uprobe locations from either DWARF or the Symbol Table
- [#3504](https://github.com/bpftrace/bpftrace/pull/3504/)
- Introduce builtin to access percpu kernel data
- [#3596](https://github.com/bpftrace/bpftrace/pull/3596/)
#### Changed
- Merge output into `stdout` when `-lv`
- [#3383](https://github.com/bpftrace/bpftrace/pull/3383)
37 changes: 37 additions & 0 deletions man/adoc/bpftrace.adoc
@@ -1364,6 +1364,10 @@ Tracing block I/O sizes > 0 bytes
| Return full path
| Sync

| <<functions-percpu-kaddr, `percpu_kaddr(const string name [, int cpu])`>>
| Resolve percpu kernel symbol name
| Sync

| <<functions-print, `print(...)`>>
| Print a non-map value with default formatting
| Async
@@ -1846,6 +1850,39 @@ the path will be clamped by `size` otherwise `BPFTRACE_MAX_STRLEN` is used.

This function can only be used by functions that are allowed to, these functions are contained in the `btf_allowlist_d_path` set in the kernel.

[#functions-percpu-kaddr]
=== percpu_kaddr

.variants
* `void *percpu_kaddr(const string name)`
* `void *percpu_kaddr(const string name, int cpu)`

*sync*

Get the address of the percpu kernel symbol `name` for CPU `cpu`. When `cpu` is
omitted, the current CPU is used.

----
interval:s:1 {
$proc_cnt = percpu_kaddr("process_counts");
printf("%d processes are running on CPU %d\n", *$proc_cnt, cpu);
}
----

The second variant may return NULL if `cpu` is not a valid CPU number, i.e. it
is not lower than the number of available CPUs. Therefore, it is necessary to
perform a NULL-check on the result when accessing fields of the pointed
structure, otherwise the BPF program will be rejected by the verifier.

----
interval:s:1 {
$runqueues = (struct rq *)percpu_kaddr("runqueues", 0);
if ($runqueues != 0) { // The check is mandatory here
print($runqueues->nr_running);
}
}
----

[#functions-print]
=== print

10 changes: 6 additions & 4 deletions src/ast/dibuilderbpf.cpp
@@ -213,10 +213,12 @@ DIType *DIBuilderBPF::GetType(const SizedType &stype, bool emit_codegen_types)
{
if (!emit_codegen_types && stype.IsRecordTy()) {
std::string name = stype.GetName();
-if (name.find("struct ") == 0)
-name = name.substr(std::string("struct ").length());
-else if (name.find("union ") == 0)
-name = name.substr(std::string("union ").length());
+static constexpr std::string struct_prefix = "struct ";
+static constexpr std::string union_prefix = "union ";
+if (name.find(struct_prefix) == 0)
+name = name.substr(struct_prefix.length());
+else if (name.find(union_prefix) == 0)
+name = name.substr(union_prefix.length());

return createStructType(file,
name,
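
The prefix stripping in this hunk can also be written against `std::string_view`, which avoids depending on C++20 `constexpr std::string` support in the standard library. A minimal sketch, assuming a free function named `strip_record_prefix` (hypothetical, not part of bpftrace):

```cpp
#include <initializer_list>
#include <string>
#include <string_view>

// Hypothetical helper: drops a leading "struct " or "union " from a record
// type name and returns the rest; other names are returned unchanged.
inline std::string strip_record_prefix(std::string_view name)
{
  constexpr std::string_view struct_prefix = "struct ";
  constexpr std::string_view union_prefix = "union ";

  for (std::string_view prefix : { struct_prefix, union_prefix }) {
    // substr clamps the count, so names shorter than the prefix are safe.
    if (name.substr(0, prefix.size()) == prefix)
      return std::string(name.substr(prefix.size()));
  }
  return std::string(name);
}
```

With this sketch, `strip_record_prefix("struct rq")` yields `rq`, while a name such as `structure` is left untouched because the trailing space is part of the prefix.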
33 changes: 33 additions & 0 deletions src/ast/irbuilderbpf.cpp
@@ -2141,6 +2141,39 @@ CallInst *IRBuilderBPF::CreateGetFuncIp(Value *ctx, const location &loc)
&loc);
}

CallInst *IRBuilderBPF::CreatePerCpuPtr(Value *var,
Value *cpu,
const location &loc)
{
// void *bpf_per_cpu_ptr(const void *percpu_ptr, u32 cpu)
// Return:
// A pointer pointing to the kernel percpu variable on
// cpu, or NULL, if cpu is invalid.
FunctionType *percpuptr_func_type = FunctionType::get(
GET_PTR_TY(), { GET_PTR_TY(), getInt64Ty() }, false);
return CreateHelperCall(libbpf::BPF_FUNC_per_cpu_ptr,
percpuptr_func_type,
{ var, cpu },
"per_cpu_ptr",
&loc);
}

CallInst *IRBuilderBPF::CreateThisCpuPtr(Value *var, const location &loc)
{
// void *bpf_this_cpu_ptr(const void *percpu_ptr)
// Return:
// A pointer pointing to the kernel percpu variable on
// this cpu. Never NULL.
FunctionType *percpuptr_func_type = FunctionType::get(GET_PTR_TY(),
{ GET_PTR_TY() },
false);
return CreateHelperCall(libbpf::BPF_FUNC_this_cpu_ptr,
percpuptr_func_type,
{ var },
"this_cpu_ptr",
&loc);
}

void IRBuilderBPF::CreateGetCurrentComm(Value *ctx,
AllocaInst *buf,
size_t size,
2 changes: 2 additions & 0 deletions src/ast/irbuilderbpf.h
@@ -161,6 +161,8 @@ class IRBuilderBPF : public IRBuilder<> {
StackType stack_type,
const location &loc);
CallInst *CreateGetFuncIp(Value *ctx, const location &loc);
CallInst *CreatePerCpuPtr(Value *var, Value *cpu, const location &loc);
CallInst *CreateThisCpuPtr(Value *var, const location &loc);
CallInst *CreateGetJoinMap(BasicBlock *failure_callback, const location &loc);
CallInst *CreateGetStackScratchMap(StackType stack_type,
BasicBlock *failure_callback,
42 changes: 42 additions & 0 deletions src/ast/passes/codegen_llvm.cpp
@@ -927,6 +927,18 @@ void CodegenLLVM::visit(Call &call)
if (!addr)
throw FatalUserException("Failed to resolve kernel symbol: " + name);
expr_ = b_.getInt64(addr);
} else if (call.func == "percpu_kaddr") {
auto name = bpftrace_.get_string_literal(call.vargs.at(0));
auto var = b_.CreatePointerCast(DeclareKernelVar(name), b_.GET_PTR_TY());
Value *percpu_ptr;
if (call.vargs.size() == 1) {
percpu_ptr = b_.CreateThisCpuPtr(var, call.loc);
} else {
auto scoped_del = accept(call.vargs.at(1));
Value *cpu = expr_;
percpu_ptr = b_.CreatePerCpuPtr(var, cpu, call.loc);
}
expr_ = b_.CreatePtrToInt(percpu_ptr, b_.getInt64Ty());
} else if (call.func == "uaddr") {
auto name = bpftrace_.get_string_literal(call.vargs.at(0));
struct symbol sym = {};
@@ -4697,4 +4709,34 @@ CallInst *CodegenLLVM::CreateKernelFuncCall(Kfunc kfunc,
return b_.createCall(func->getFunctionType(), func, args, name);
}

/// This should emit
///
/// @var_name = external global ..., section ".ksyms", !dbg !...
///
/// with proper debug info entry.
///
/// The variable type is retrieved from kernel BTF.
///
/// If the variable declaration is already in the module, just return it.
///
GlobalVariable *CodegenLLVM::DeclareKernelVar(const std::string &var_name)
{
if (auto *sym = module_->getGlobalVariable(var_name))
return sym;

std::string err;
auto type = bpftrace_.btf_->get_var_type(var_name);
assert(!type.IsNoneTy()); // already checked in semantic analyser

auto var = llvm::dyn_cast<GlobalVariable>(
module_->getOrInsertGlobal(var_name, b_.GetType(type)));
var->setSection(".ksyms");
var->setLinkage(llvm::GlobalValue::ExternalLinkage);

auto var_debug = debug_.createGlobalVariable(var_name, type);
var->addDebugInfo(var_debug);

return var;
}

} // namespace bpftrace::ast
2 changes: 2 additions & 0 deletions src/ast/passes/codegen_llvm.h
@@ -278,6 +278,8 @@ class CodegenLLVM : public Visitor {
ArrayRef<Value *> args,
const Twine &name);

GlobalVariable *DeclareKernelVar(const std::string &name);

Node *root_ = nullptr;

BPFtrace &bpftrace_;
14 changes: 14 additions & 0 deletions src/ast/passes/semantic_analyser.cpp
@@ -1102,6 +1102,20 @@ void SemanticAnalyser::visit(Call &call)
}
call.type = CreateUInt64();
call.type.SetAS(AddrSpace::kernel);
} else if (call.func == "percpu_kaddr") {
if (check_varargs(call, 1, 2)) {
check_arg(call, Type::string, 0, true);
if (call.vargs.size() == 2)
check_arg(call, Type::integer, 1, false);

auto symbol = bpftrace_.get_string_literal(call.vargs.at(0));
if (bpftrace_.btf_->get_var_type(symbol).IsNoneTy()) {
LOG(ERROR, call.loc, err_)
<< "Could not resolve variable \"" << symbol << "\" from BTF";
}
}
call.type = CreateUInt64();
call.type.SetAS(AddrSpace::kernel);
} else if (call.func == "uaddr") {
auto probe = get_probe(call.loc, call.func);
if (probe == nullptr)
6 changes: 5 additions & 1 deletion src/bpfbytecode.cpp
@@ -220,11 +220,15 @@ void BpfBytecode::load_progs(const RequiredResources &resources,
// failures when the verifier log is non-empty.
std::string_view log(log_bufs[name].data());
if (!log.empty()) {
-// This should be the only error that may occur here and does not imply
+// These should be the only errors that may occur here which do not imply
// a bpftrace bug so throw immediately with a proper error message.
maybe_throw_helper_verifier_error(log,
"helper call is not allowed in probe",
" not allowed in probe");
+maybe_throw_helper_verifier_error(
+log,
+"pointer arithmetic on ptr_or_null_ prohibited, null-check it first",
+": result needs to be null-checked before accessing fields");

std::stringstream errmsg;
errmsg << "Error loading BPF program for " << name << ".";
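
The idea of turning a known verifier message into a friendlier error can be sketched as follows. This is a simplified illustration, not the actual `maybe_throw_helper_verifier_error` implementation: `friendly_verifier_error` and its `what` parameter are hypothetical, and only the two message pairs are taken from the hunk above.

```cpp
#include <array>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical sketch: maps a raw verifier log to a user-facing message.
// `what` names the offending call (e.g. a helper or builtin); how the real
// code derives that name is not shown in this hunk.
inline std::optional<std::string> friendly_verifier_error(std::string_view log,
                                                          std::string_view what)
{
  // (verifier message, suffix of the friendly message), copied from the diff.
  static constexpr std::array<std::pair<std::string_view, std::string_view>, 2>
      known = { {
          { "helper call is not allowed in probe", " not allowed in probe" },
          { "pointer arithmetic on ptr_or_null_ prohibited, null-check it first",
            ": result needs to be null-checked before accessing fields" },
      } };

  for (const auto &[pattern, suffix] : known) {
    if (log.find(pattern) != std::string_view::npos)
      return std::string(what) + std::string(suffix);
  }
  return std::nullopt; // unknown error: the caller falls back to the full log
}
```

Keeping the known messages in a table means that supporting another benign verifier error is a one-line addition, which mirrors how this commit adds a second call next to the existing one.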
13 changes: 13 additions & 0 deletions src/btf.cpp
@@ -875,4 +875,17 @@ SizedType BTF::get_stype(const std::string &type_name)
return CreateNone();
}

SizedType BTF::get_var_type(const std::string &var_name)
{
auto var_id = find_id(var_name, BTF_KIND_VAR);
if (!var_id.btf)
return CreateNone();

const struct btf_type *t = btf__type_by_id(var_id.btf, var_id.id);
if (!t)
return CreateNone();

return get_stype(BTFId{ .btf = var_id.btf, .id = t->type });
}

} // namespace bpftrace
1 change: 1 addition & 0 deletions src/btf.h
@@ -67,6 +67,7 @@ class BTF {
std::string type_of(const std::string& name, const std::string& field);
std::string type_of(const BTFId& type_id, const std::string& field);
SizedType get_stype(const std::string& type_name);
SizedType get_var_type(const std::string& var_name);

std::set<std::string> get_all_structs() const;
std::unique_ptr<std::istream> get_all_funcs() const;
2 changes: 1 addition & 1 deletion src/lexer.l
@@ -38,7 +38,7 @@ vspace [\n\r]
space {hspace}|{vspace}
path :(\\.|[_\-\./a-zA-Z0-9#+\*])+
builtin arg[0-9]|args|cgroup|comm|cpid|numaid|cpu|ctx|curtask|elapsed|func|gid|pid|probe|rand|retval|sarg[0-9]|tid|uid|username|jiffies
-call avg|buf|cat|cgroupid|clear|count|delete|exit|hist|join|kaddr|kptr|ksym|len|lhist|macaddr|max|min|ntop|override|print|printf|cgroup_path|reg|signal|stats|str|strerror|strftime|strncmp|strcontains|sum|system|time|uaddr|uptr|usym|zero|path|unwatch|bswap|skboutput|pton|debugf|has_key
+call avg|buf|cat|cgroupid|clear|count|delete|exit|hist|join|kaddr|kptr|ksym|len|lhist|macaddr|max|min|ntop|override|print|printf|cgroup_path|reg|signal|stats|str|strerror|strftime|strncmp|strcontains|sum|system|time|uaddr|uptr|usym|zero|path|unwatch|bswap|skboutput|pton|debugf|has_key|percpu_kaddr

int_type bool|(u)?int(8|16|32|64)
builtin_type void|(u)?(min|max|sum|count|avg|stats)_t|probe_t|username|lhist_t|hist_t|usym_t|ksym_t|timestamp|macaddr_t|cgroup_path_t|strerror_t|kstack_t|ustack_t
19 changes: 19 additions & 0 deletions tests/codegen/call_percpu_kaddr.cpp
@@ -0,0 +1,19 @@
#include "common.h"

namespace bpftrace {
namespace test {
namespace codegen {

TEST(codegen, call_percpu_kaddr)
{
test("BEGIN { percpu_kaddr(\"process_counts\", 0); }", NAME);
}

TEST(codegen, call_percpu_kaddr_this_cpu)
{
test("BEGIN { percpu_kaddr(\"process_counts\"); }", NAME);
}

} // namespace codegen
} // namespace test
} // namespace bpftrace
76 changes: 76 additions & 0 deletions tests/codegen/llvm/call_percpu_kaddr.ll
@@ -0,0 +1,76 @@
; ModuleID = 'bpftrace'
source_filename = "bpftrace"
target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "bpf-pc-linux"

%"struct map_t" = type { ptr, ptr }
%"struct map_t.0" = type { ptr, ptr, ptr, ptr }

@LICENSE = global [4 x i8] c"GPL\00", section "license"
@ringbuf = dso_local global %"struct map_t" zeroinitializer, section ".maps", !dbg !0
@event_loss_counter = dso_local global %"struct map_t.0" zeroinitializer, section ".maps", !dbg !16
@process_counts = external global i64, section ".ksyms", !dbg !36

; Function Attrs: nounwind
declare i64 @llvm.bpf.pseudo(i64 %0, i64 %1) #0

define i64 @BEGIN_1(ptr %0) section "s_BEGIN_1" !dbg !41 {
entry:
%per_cpu_ptr = call ptr inttoptr (i64 153 to ptr)(ptr @process_counts, i64 0)
%1 = ptrtoint ptr %per_cpu_ptr to i64
ret i64 0
}

attributes #0 = { nounwind }

!llvm.dbg.cu = !{!38}
!llvm.module.flags = !{!40}

!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
!1 = distinct !DIGlobalVariable(name: "ringbuf", linkageName: "global", scope: !2, file: !2, type: !3, isLocal: false, isDefinition: true)
!2 = !DIFile(filename: "bpftrace.bpf.o", directory: ".")
!3 = !DICompositeType(tag: DW_TAG_structure_type, scope: !2, file: !2, size: 128, elements: !4)
!4 = !{!5, !11}
!5 = !DIDerivedType(tag: DW_TAG_member, name: "type", scope: !2, file: !2, baseType: !6, size: 64)
!6 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !7, size: 64)
!7 = !DICompositeType(tag: DW_TAG_array_type, baseType: !8, size: 864, elements: !9)
!8 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!9 = !{!10}
!10 = !DISubrange(count: 27, lowerBound: 0)
!11 = !DIDerivedType(tag: DW_TAG_member, name: "max_entries", scope: !2, file: !2, baseType: !12, size: 64, offset: 64)
!12 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !13, size: 64)
!13 = !DICompositeType(tag: DW_TAG_array_type, baseType: !8, size: 8388608, elements: !14)
!14 = !{!15}
!15 = !DISubrange(count: 262144, lowerBound: 0)
!16 = !DIGlobalVariableExpression(var: !17, expr: !DIExpression())
!17 = distinct !DIGlobalVariable(name: "event_loss_counter", linkageName: "global", scope: !2, file: !2, type: !18, isLocal: false, isDefinition: true)
!18 = !DICompositeType(tag: DW_TAG_structure_type, scope: !2, file: !2, size: 256, elements: !19)
!19 = !{!20, !25, !30, !33}
!20 = !DIDerivedType(tag: DW_TAG_member, name: "type", scope: !2, file: !2, baseType: !21, size: 64)
!21 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !22, size: 64)
!22 = !DICompositeType(tag: DW_TAG_array_type, baseType: !8, size: 64, elements: !23)
!23 = !{!24}
!24 = !DISubrange(count: 2, lowerBound: 0)
!25 = !DIDerivedType(tag: DW_TAG_member, name: "max_entries", scope: !2, file: !2, baseType: !26, size: 64, offset: 64)
!26 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !27, size: 64)
!27 = !DICompositeType(tag: DW_TAG_array_type, baseType: !8, size: 32, elements: !28)
!28 = !{!29}
!29 = !DISubrange(count: 1, lowerBound: 0)
!30 = !DIDerivedType(tag: DW_TAG_member, name: "key", scope: !2, file: !2, baseType: !31, size: 64, offset: 128)
!31 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !32, size: 64)
!32 = !DIBasicType(name: "int32", size: 32, encoding: DW_ATE_signed)
!33 = !DIDerivedType(tag: DW_TAG_member, name: "value", scope: !2, file: !2, baseType: !34, size: 64, offset: 192)
!34 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !35, size: 64)
!35 = !DIBasicType(name: "int64", size: 64, encoding: DW_ATE_signed)
!36 = !DIGlobalVariableExpression(var: !37, expr: !DIExpression())
!37 = distinct !DIGlobalVariable(name: "process_counts", linkageName: "global", scope: !2, file: !2, type: !35, isLocal: false, isDefinition: true)
!38 = distinct !DICompileUnit(language: DW_LANG_C, file: !2, producer: "bpftrace", isOptimized: false, runtimeVersion: 0, emissionKind: LineTablesOnly, globals: !39)
!39 = !{!0, !16, !36}
!40 = !{i32 2, !"Debug Info Version", i32 3}
!41 = distinct !DISubprogram(name: "BEGIN_1", linkageName: "BEGIN_1", scope: !2, file: !2, type: !42, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !38, retainedNodes: !46)
!42 = !DISubroutineType(types: !43)
!43 = !{!35, !44}
!44 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !45, size: 64)
!45 = !DIBasicType(name: "int8", size: 8, encoding: DW_ATE_signed)
!46 = !{!47}
!47 = !DILocalVariable(name: "ctx", arg: 1, scope: !41, file: !2, type: !44)