hist: log2 histograms with finer granularity #2831

Merged 1 commit on Dec 26, 2023

@luigirizzo (Contributor) commented Nov 17, 2023

Allow a second optional argument in hist(n, k) to map each power of 2 into 2^k buckets, thus creating a logarithmic scale with finer granularity and modest runtime overhead (a couple of shifts and add/mask in addition to the original algorithm).

Allowed values of k are 0..5, with 0 as default for backward compatibility.

The implementation follows my earlier code in https://github.com/luigirizzo/lr-cstats

Example below:

$ sudo src/bpftrace -e 'kfunc:tick_do_update_jiffies64 { @ = hist((nsecs & 0xff),2); }'
Attaching 2 probes...
@:
[0]                    4 |@                                                   |
[1]                    1 |                                                    |
[2]                    3 |@                                                   |
[3]                    2 |                                                    |
[4]                    3 |@                                                   |
[5]                    0 |                                                    |
[6]                    3 |@                                                   |
[7]                    2 |                                                    |
[8, 10)                5 |@                                                   |
[10, 12)               7 |@@                                                  |
[12, 14)               5 |@                                                   |
[14, 16)               6 |@@                                                  |
[16, 20)              11 |@@@                                                 |
[20, 24)              14 |@@@@                                                |
[24, 28)              20 |@@@@@@                                              |
[28, 32)              13 |@@@@                                                |
[32, 40)              40 |@@@@@@@@@@@@@                                       |
[40, 48)              38 |@@@@@@@@@@@@@                                       |
[48, 56)              35 |@@@@@@@@@@@                                         |
[56, 64)              29 |@@@@@@@@@                                           |
[64, 80)              72 |@@@@@@@@@@@@@@@@@@@@@@@@                            |
[80, 96)              64 |@@@@@@@@@@@@@@@@@@@@@                               |
[96, 112)             61 |@@@@@@@@@@@@@@@@@@@@                                |
[112, 128)            67 |@@@@@@@@@@@@@@@@@@@@@@                              |
[128, 160)           124 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@          |
[160, 192)           130 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        |
[192, 224)           124 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@          |
[224, 256)           152 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
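
The bucket boundaries in the output above can be reproduced with a short sketch. This is only an illustration in Python, not the code bpftrace generates: the function names `bucket_index` and `bucket_bounds` are made up here, negative inputs are not handled, and the real implementation may number its buckets differently.

```python
def bucket_index(n: int, k: int) -> int:
    """Map a non-negative integer n to a bucket, with 2**k buckets per power of 2."""
    if n < 0 or not 0 <= k <= 5:
        raise ValueError("sketch handles n >= 0 and k in 0..5 only")
    if n < (1 << (k + 1)):
        return n                      # small values get one bucket each
    msb = n.bit_length() - 1          # position of the highest set bit
    shift = msb - k                   # low bits discarded inside this power of 2
    return (shift << k) + (n >> shift)


def bucket_bounds(index: int, k: int) -> tuple[int, int]:
    """Return the half-open range [low, high) covered by a bucket (inverse of bucket_index)."""
    if index < (1 << (k + 1)):
        return index, index + 1
    shift = (index >> k) - 1
    mantissa = (index & ((1 << k) - 1)) + (1 << k)
    return mantissa << shift, (mantissa + 1) << shift


if __name__ == "__main__":
    # Print the distinct bucket labels for the values 0..255 with k=2.
    seen = []
    for n in range(256):
        bounds = bucket_bounds(bucket_index(n, 2), 2)
        if bounds not in seen:
            seen.append(bounds)
    for low, high in seen:
        print(f"[{low}]" if high == low + 1 else f"[{low}, {high})")
```

The extra work compared with a plain log2 is the shift, the mask and the add, which is consistent with the "couple of shifts and add/mask" mentioned in the description.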

Performance:
most of the cost of hist() in bpftrace is in the bpf hash lookup,
and the cost of index calculation is negligible.
One way to measure the overall cost is the following

sudo taskset -c 1 src/bpftrace -e 'i:us:1 { $t = nsecs; @A = hist($t); $t = nsecs - $t; @ = lhist($t, 0, 5000, 100);} '

and on my AMD 5800 most of the samples are in the 900-1000ns range;
my estimate for index computation (from lr-cstats) is in the 10-20ns range.

Remember to check /proc/sys/kernel/perf_ to make sure that the code
can generate a sufficient number of samples per second.

Checklist
  • Language changes are updated in man/adoc/bpftrace.adoc and if needed in docs/reference_guide.md
  • User-visible and non-trivial changes updated in CHANGELOG.md
  • The new behaviour is covered by tests

@danobi (Member) left a comment

I didn't look too closely at the mathy bits yet. Mostly tried to leave some high-level comments. This PR also needs tests (see https://github.com/iovisor/bpftrace/blob/master/docs/developers.md#tests for some hints).

Overall, this feature makes sense to me. I'm taking it on faith that the 2^k restriction (versus taking any old integer) is to make the bit twiddling work efficiently.

docs/reference_guide.md (resolved)
src/ast/passes/codegen_llvm.cpp (outdated, resolved)
src/ast/passes/semantic_analyser.cpp (resolved)
src/ast/passes/semantic_analyser.cpp (resolved)

@danobi (Member) commented Nov 18, 2023

CI failures look legit. There should be some hints about debugging those in the developers.md doc I linked as well

src/output.cpp (outdated, resolved)

@luigirizzo (Contributor, Author) commented:

> CI failures look legit. There should be some hints about debugging those in the developers.md doc I linked as well

I have fixed the JSON format ones and the function prototype formatting.
The remaining failure should be in the code generation for hist(), which I have not updated yet.

@luigirizzo (Contributor, Author) commented Nov 19, 2023

> I didn't look too closely at the mathy bits yet. Mostly tried to leave some high-level comments. This PR also needs tests (see https://github.com/iovisor/bpftrace/blob/master/docs/developers.md#tests for some hints)

Two questions here (these are orthogonal, so feel free to ignore them):

  1. There seems to be no helper to compare the output of a named test with a file, so instead of
    NAME foo
    PROG ...
    EXPECT_FILE runtime/output/foo.xyz

    every test repeats the sequence
    RUN {{BPFTRACE}} args ... | python3 -c 'import sys,json; print(json.load(sys.stdin) == json.load(open("runtime/outputs/foo.xyz")))'
    EXPECT ^True$

    Is there any plan to add this kind of helper to the runner engine?
  2. Unless I am mistaken, I see no tests for the plain output format of hist/lhist etc. Shall I focus on JSON output only?
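
The python3 one-liner in question 1 can be read as the following stand-alone helper. It is only a sketch of what such a comparison does; the function name `json_output_matches` is hypothetical, and nothing here describes how the bpftrace test runner is actually implemented.

```python
import json


def json_output_matches(actual_text: str, expected_path: str) -> bool:
    """Compare program output with a reference file as parsed JSON.

    Parsing both sides first makes the check insensitive to whitespace
    and to the order of keys inside JSON objects.
    """
    with open(expected_path, encoding="utf-8") as f:
        expected = json.load(f)
    return json.loads(actual_text) == expected
```

Like the one-liner, this assumes the program prints a single JSON document; output made of several JSON lines would need to be parsed line by line instead.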

@luigirizzo (Contributor, Author) commented:

> I didn't look too closely at the mathy bits yet. Mostly tried to leave some high-level comments. This PR also needs tests (see https://github.com/iovisor/bpftrace/blob/master/docs/developers.md#tests for some hints)

Added a few tests in (semantic_analyser, call_hist) and one in runtime/json-output (histogram-finegrain), plus updated codegen for call_hist.

@luigirizzo (Contributor, Author) commented:

> CI failures look legit. There should be some hints about debugging those in the developers.md doc I linked as well

They should be fixed now.

docs/reference_guide.md (resolved)
tests/semantic_analyser.cpp (resolved)
src/ast/passes/semantic_analyser.cpp (resolved)
if (!check_varargs(call, 1, 2))
  return;
if (call.vargs->size() == 1) {
  call.vargs->push_back(new Integer(0, call.loc)); // default bits is 0

Member

I don't think we should be modifying the AST in semantic analysis. I see one other occurrence in semantic analysis where we inject a cast, but I think that's bad too. Ideally mutable passes should derive from Mutator visitor so it's explicit.

In this case, I think it'd be better to teach the other passes to use a default value of 0 where appropriate

src/output.cpp (outdated, resolved)
src/output.cpp (outdated, resolved)
src/ast/passes/codegen_llvm.cpp (outdated, resolved)

@danobi (Member) commented Nov 21, 2023

> I didn't look too closely at the mathy bits yet. Mostly tried to leave some high-level comments. This PR also needs tests (see https://github.com/iovisor/bpftrace/blob/master/docs/developers.md#tests for some hints)
>
> two questions here: (but these are orthogonal, so feel free to ignore them...)
>
> 1. there seems to be no helper to compare the output of a named test with a file, so instead of
>    ```
>    NAME foo
>    PROG ...
>    EXPECT_FILE runtime/output/foo.xyz
>    ```
>    every test repeats the sequence
>    ```
>    RUN {{BPFTRACE}} args ... | python3 -c 'import sys,json; print(json.load(sys.stdin) == json.load(open("runtime/outputs/foo.xyz")))'
>    EXPECT ^True$
>    ```
>    Is there any plan to add this kind of helpers in the runner engine ?

Yeah, that would be a good addition. I filed #2841. Feel free to take it if you're interested. Otherwise I will probably pick it up at some point.

> 2. unless I am mistaken I see no tests for the plain output format of hist/lhist etc. Shall I focus on json output only ?

The JSON output happened to be easier to test. In the past we've used regexes to match the text output, but that's messy and we often got it wrong. It might be useful in this case to check in some "gold" files and do a regular text compare. It might also be useful to have an EXPECT_TEXT directive, just like the EXPECT_JSON directive suggested above.

Manual testing would be ok for now. I'll file a ticket for EXPECT_TEXT and refer to this PR.

@luigirizzo luigirizzo force-pushed the hist-finegrain branch 3 times, most recently from 03791d7 to 3d94abe Compare November 23, 2023 14:44
@luigirizzo (Contributor, Author) commented:

> Allow a second optional argument in hist(n, k) to map each power of 2 into 2^k buckets, thus creating a logarithmic scale with finer granularity and modest runtime overhead (a couple of shifts and add/mask in addition to the original algorithm).

Anything else left to do here?

@danobi (Member) commented Nov 24, 2023

Code needs a clang-format (https://github.com/iovisor/bpftrace/blob/master/docs/developers.md#code-style). Also would be good to use the new EXPECT_JSON for the runtime test.

Other than that, was waiting for enough coffee to hit the bloodstream to take a look at the algorithm again.

@ajor (Member) commented Nov 24, 2023

Thanks for the PR! Agree with @danobi that this feature looks reasonable. I just want to make sure before merging that it won't slow down histogram calculations for the default case (k=0).

The LLVM IR in the tests looks unoptimised so we'd probably need to look at the generated BPF bytecode to be sure. I can have a check next week if no one gets there before me.

@danobi (Member) left a comment

other than the outstanding comments (clang-format, semantic analysis ast modification, stack allocation lifetime tracking, EXPECT_JSON, and Alastair's todo), LGTM.

I'm kinda on the fence about the semantic analysis ast modification thing now. Probably fine the way it is.

Comment on lines 3109 to +3126
Value *n_alloc = b_.CreateAllocaBPF(CreateUInt64());
b_.CreateStore(arg, n_alloc);
Value *result = b_.CreateAllocaBPF(CreateUInt64());
b_.CreateStore(b_.getInt64(0), result);
b_.CreateStore(log2_func->arg_begin(), n_alloc);
Value *k_alloc = b_.CreateAllocaBPF(CreateUInt64());
Member

Missing b_.CreateLifetimeEnd for n_alloc and k_alloc

src/ast/passes/codegen_llvm.cpp (outdated, resolved)

@luigirizzo (Contributor, Author) commented:

> Thanks for the PR! Agree with @danobi that this feature looks reasonable. I just want to make sure before merging that it won't slow down histogram calculations for the default case (k=0).
>
> The LLVM IR in the tests looks unoptimised so we'd probably need to look at the generated BPF bytecode to be sure. I can have a check next week if no one gets there before me.

I don't know if you have any better way to measure index computation (other than modifying the code generator to collect timestamps around it) but I suspect most of the time is spent in looking up the bpf hash entry, rather than computing the index.

For measuring the time I use the following:

sudo taskset -c 1 /tmp/bpftrace0 -e 'i:us:1 { $t = nsecs; @a = hist($t); $t = nsecs - $t; @ = lhist($t, 0, 5000, 100);} ' 

and on my Beelink SER5 I see most entries in the [900, 1000) slot with both the old and new code.

For reference, the same (native) code in github.com:luigirizzo/lr-cstats takes less than 20ns.

On the topic:

  • it may be preferable to hook the code on some event firing at high frequency but I don't have any good idea (perhaps an external process spinning on a syscall and hooking the call?)

  • the rate at which this generates events depends on /proc/sys/kernel/perf*, e.g. after installing a couple of perf-related packages I got the following

    /proc/sys/kernel/perf_cpu_time_max_percent:25
    /proc/sys/kernel/perf_event_max_contexts_per_stack:8
    /proc/sys/kernel/perf_event_max_sample_rate:39750 
    /proc/sys/kernel/perf_event_max_stack:127
    /proc/sys/kernel/perf_event_mlock_kb:516
    /proc/sys/kernel/perf_event_paranoid:-1
    

    and was stuck at some 6K events/s. After bumping perf_event_max_sample_rate to some 1 million, I was able to run the above loop at 100K samples/s at most, even with small intervals. /proc/self/timerslack did not seem to matter.

  • if you make the probes less frequent (say 1ms) and allow the CPU to go into some deep sleep state, you will see values more scattered (in my case bands are some 500ns apart, around 900, 1400, 1800, 2300, 2700) possibly due to various memory and tlb misses
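
As a rough illustration of the second point, the sketch below parses a `path:value` dump like the one above and compares `perf_event_max_sample_rate` with the rate a probe asks for. The helper names are hypothetical, and treating this single setting as a ceiling is a simplifying assumption: in the experiment described above the achieved rate stayed well below the configured value, so other factors clearly matter too.

```python
def parse_sysctl_dump(text: str) -> dict[str, int]:
    """Parse lines of the form '/proc/sys/kernel/<name>:<value>' into {name: value}."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        path, _, value = line.rpartition(":")
        settings[path.rsplit("/", 1)[-1]] = int(value)
    return settings


def sample_rate_headroom(settings: dict[str, int], wanted_per_sec: int) -> int:
    """Positive result: the configured limit exceeds the wanted rate."""
    return settings["perf_event_max_sample_rate"] - wanted_per_sec


dump = """
/proc/sys/kernel/perf_cpu_time_max_percent:25
/proc/sys/kernel/perf_event_max_sample_rate:39750
/proc/sys/kernel/perf_event_paranoid:-1
"""
settings = parse_sysctl_dump(dump)
# An i:us:1 probe asks for up to 1,000,000 events per second.
print(sample_rate_headroom(settings, 1_000_000))
```

A negative result, as with the values quoted above, suggests the setting is worth raising before drawing conclusions from the measured histogram.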

@luigirizzo (Contributor, Author) commented:

> other than the outstanding comments (clang-format, semantic analysis ast modification, stack allocation lifetime tracking, EXPECT_JSON, and Alastair's todo), LGTM.
>
> I'm kinda on the fence about the semantic analysis ast modification thing now. Probably fine the way it is.

I think I am done with the modifications from my side (see comment on why I believe lifetime annotations are useless in this case).

@danobi (Member) commented Nov 29, 2023

> I think I am done with the modifications from my side (see comment on why I believe lifetime annotations are useless in this case).

I don't see the comment anywhere - forgot to send?

@luigirizzo (Contributor, Author) commented:

> > I think I am done with the modifications from my side (see comment on why I believe lifetime annotations are useless in this case).
>
> I don't see the comment anywhere - forgot to send?

This one on line 3109 for codegen_llvm.cpp ?

I have looked up the lifetime annotations https://llvm.org/docs/LangRef.html#object-lifetime and I believe that for this function it is much better to leave it to the optimizer.

As an experiment I also tried to access the two call arguments directly from the registers instead of allocating arguments on the stack:

Value *n = log2_func->arg_begin();
Value *k = log2_func->arg_begin() + 1;
(leaving it to the compiler whether or not to spill variables to the stack).
The unoptimized IR code is slightly better with these changes, but the optimized code is exactly the same in both cases.

@danobi (Member) commented Nov 29, 2023

Hmm, I see // l -= k; on line 3109.

But in any case, I think you're right. log2() is a separate function. We can leave it to compiler in this case.

@danobi (Member) left a comment

I'm getting a build warning:

[21/86] Building CXX object src/CMakeFiles/runtime.dir/output.cpp.o
In file included from /usr/include/c++/13.2.1/cassert:44,
                 from /home/dxu/dev/bpftrace/src/log.h:3,
                 from /home/dxu/dev/bpftrace/src/output.cpp:3:
/home/dxu/dev/bpftrace/src/output.cpp: In static member function ‘static std::string bpftrace::TextOutput::hist_index_label(int, int)’:
/home/dxu/dev/bpftrace/src/output.cpp:53:16: warning: comparison of integer expressions of different signedness: ‘int’ and ‘const uint32_t’ {aka ‘const unsigned int’} [-Wsign-compare]
   53 |   assert(index >= n); // Smaller indexes are converted directly.

No big deal - I'll fix it up if we want to merge as is

@danobi (Member) commented Dec 1, 2023

@ajor did you get a chance to take a look? I suspect this will add a few more insns to the hot path. But ALU ops are quite cheap (basically free since it's all in registers) so I'm personally not too worried

@luigirizzo (Contributor, Author) commented:

> @ajor did you get a chance to take a look? I suspect this will add a few more insns to the hot path. But ALU ops are quite cheap (basically free since it's all in registers) so I'm personally not too worried

FWIW (see other comments and updated description) I believe the hash lookup is probably 20-50 times more expensive than the index computation. This test

sudo taskset -c 1 src/bpftrace -e 'i:us:1 { $t = nsecs; @A = hist($t); $t = nsecs - $t; @ = lhist($t, 0, 5000, 100);} '

reports 900-1000ns for most samples with both old and new code.

In fact, it would be great if we could specialize the hist() and lhist() to use a bpf array instead of a map when there are no other keys.

@ajor (Member) left a comment

Sorry for the delay. LLVM does optimise the code well and the only extra instructions in the default case are some extra bit twiddling for the higher bits:

  hist.is_not_zero.i:                               ; preds = %hist.is_not_less_than_zero.i
+   %a = icmp ugt i64 %input, 4294967295
+   %b = select i1 %a, i64 32, i64 0
+   %c = lshr i64 %input, %b
    %d = icmp ugt i64 %c, 65535
    %e = select i1 %d, i64 16, i64 0
    %f = lshr i64 %c, %e
    %g = icmp ugt i64 %f, 255
    %h = select i1 %g, i64 8, i64 0
    %i = lshr i64 %f, %h
    %j = icmp ugt i64 %i, 15
    %k = select i1 %j, i64 4, i64 0
    %l = lshr i64 %i, %k
    %m = icmp ugt i64 %l, 3
    %n = select i1 %m, i64 2, i64 0
    %o = lshr i64 %l, %n
    %p = icmp ugt i64 %o, 1
    %q = zext i1 %p to i64
+   %r = or i64 %b, %e
    %s = or i64 %r, %h
    %t = or i64 %s, %k
    %u = or i64 %t, 2
    %v = add nuw nsw i64 %u, %n
    %w = or i64 %v, %q
    br label %log2.exit

So no major performance penalty. Definitely agree that we should switch histograms over to array maps in the future!

What's the reason that this algorithm deals with more bits now though?
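
For readers who do not want to trace the IR by hand, here is a line-by-line transcription into Python. It assumes a non-zero unsigned 64-bit input (the block sits on the `is_not_zero` path), and the variable names simply follow the IR registers; the function name `log2_block` is made up. For such inputs the result works out to floor(log2(input)) + 2. The reason for the offset of 2 is not stated in this thread, so no interpretation of it is offered here.

```python
def log2_block(value: int) -> int:
    """Mirror the hist.is_not_zero.i block for 0 < value < 2**64."""
    assert 0 < value < 1 << 64
    b = 32 if value > 0xFFFFFFFF else 0   # %a, %b: lines marked + in the IR
    c = value >> b                        # %c: also marked +
    e = 16 if c > 0xFFFF else 0
    f = c >> e
    h = 8 if f > 0xFF else 0
    i = f >> h
    k = 4 if i > 0xF else 0
    l = i >> k
    n = 2 if l > 3 else 0
    o = l >> n
    q = 1 if o > 1 else 0
    u = b | e | h | k | 2                 # %r, %s, %t, %u
    v = u + n                             # %v
    return v | q                          # %w
```

Each step is a compare, a select and a shift, so the whole block is a branch-free binary search for the highest set bit.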

Comment on lines +138 to +143
int old, bits = static_cast<Integer *>(call.vargs->at(1))->n;
if (r.find(call.map->ident) != r.end() &&
(old = r[call.map->ident]) != bits)
{
LOG(ERROR, call.loc, err_)
<< "Different bits in a single hist, had " << old << " now " << bits;
Member

Suggested change
-  int old, bits = static_cast<Integer *>(call.vargs->at(1))->n;
-  if (r.find(call.map->ident) != r.end() &&
-      (old = r[call.map->ident]) != bits)
-  {
-    LOG(ERROR, call.loc, err_)
-        << "Different bits in a single hist, had " << old << " now " << bits;
+  int bits = static_cast<Integer *>(call.vargs->at(1))->n;
+  if (r.find(call.map->ident) != r.end() &&
+      (r[call.map->ident]) != bits)
+  {
+    LOG(ERROR, call.loc, err_)
+        << "Different bits in a single hist, had " << r[call.map->ident] << " now " << bits;

The multiple declarations and assignment/comparison on single lines feel too clever for me. Since this is just an error path there's no need to optimise it anyway, so let's just read the map twice to simplify the code.

@luigirizzo (Contributor, Author) commented:

> What's the reason that this algorithm deals with more bits now though?

Do you mean, what is the motivation? A power-of-2 scale is too coarse to evaluate, for example, changes that cause less than a 2x performance difference. With the finer granularity we can see differences down to 3-5% without compromising on the dynamic range (e.g. we can differentiate 1000ns from 1100ns and still identify tails in the ms range).
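
The resolution claim can be checked with a little arithmetic. Going by the example output in the PR description, each range [2^e, 2^(e+1)) is cut into 2^k equal-width buckets, so a bucket's width relative to its own lower bound runs from 1/2^k (first bucket) down to 1/(2^(k+1) - 1) (last bucket). The sketch below, with a made-up helper name, tabulates this for the allowed values of k.

```python
from fractions import Fraction


def relative_width_range(k: int) -> tuple[Fraction, Fraction]:
    """Widest and narrowest bucket width, relative to the bucket's lower bound."""
    buckets = 1 << k
    widest = Fraction(1, buckets)             # first bucket of each power of 2
    narrowest = Fraction(1, 2 * buckets - 1)  # last bucket of each power of 2
    return widest, narrowest


for k in range(6):
    widest, narrowest = relative_width_range(k)
    print(f"k={k}: {float(narrowest):.1%} to {float(widest):.1%}")
```

With k=5 the buckets are between 1/63 and 1/32 of their lower bound (roughly 1.6% to 3.1%), and with k=4 between 1/31 and 1/16 (roughly 3.2% to 6.25%), which is in line with the 3-5% figure quoted above.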

@danobi (Member) commented Dec 15, 2023

I mentioned in commit msg in 006e68d that we were underutilizing the u64 by a lot (~54 bits unused IIUC). So yes, this uses more bits. Not sure if that addresses the question.

@danobi (Member) commented Dec 15, 2023

I'd prefer if we merge this soon-ish rather than let it drag out and potentially eat merge conflicts. I can follow up with the requested readability fix.

I'll push the button sometime this weekend if there are no more comments.

@danobi danobi merged commit 1c5fac1 into bpftrace:master Dec 26, 2023
18 checks passed
danobi added a commit to danobi/bpftrace that referenced this pull request Dec 26, 2023
Fixes up some follow-on comments from [0].

Also fixes a compile time warning:

[25/85] Building CXX object src/CMakeFiles/runtime.dir/output.cpp.o
In file included from /usr/include/c++/13.2.1/cassert:44,
                 from /home/dxu/dev/bpftrace/src/log.h:3,
                 from /home/dxu/dev/bpftrace/src/output.cpp:3:
/home/dxu/dev/bpftrace/src/output.cpp: In static member function ‘static std::string bpftrace::TextOutput::hist_index_label(int, int)’:
/home/dxu/dev/bpftrace/src/output.cpp:53:16: warning: comparison of integer expressions of different signedness: ‘int’ and ‘const uint32_t’ {aka ‘const unsigned int’} [-Wsign-compare]
   53 |   assert(index >= n); // Smaller indexes are converted directly.
      |          ~~~~~~^~~~

[0]: bpftrace#2831 (comment)
@danobi danobi mentioned this pull request Dec 26, 2023
3 tasks
danobi added a commit to danobi/bpftrace that referenced this pull request Jan 6, 2024
viktormalik pushed a commit that referenced this pull request Jan 8, 2024
@janca-ucdavis commented May 30, 2024

Hi - I just found out about this feature. I've used partial-power log tables for many years to get fine grained histograms, and I see something here that seems relevant. I thought about filing a bug but this seems like more of a design decision.

The buckets here don't appear to be log-sized. Rather, they're fixed width within each 2^x - 2^(x+1) set. This gives "sizes" that change abruptly at the border: four buckets that are 32-wide, then four that are 64-wide, and so on (32, 32, 32, 32, 64).

Basically, what's the rationale behind the decision to bucket them like this and not as described in #1120? If a fast bitwise partial-power log is needed, I have a polished one that I'm happy to offer - we use it here inside of bpftrace probes.
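
To make the difference concrete, the sketch below compares the equal-width edges seen in this PR's example output with geometrically spaced edges, which is one possible reading of "log-sized" buckets. Both helper names are hypothetical, and the geometric variant is not taken from #1120 or from any bpftrace code.

```python
def linear_edges(e: int, k: int) -> list[int]:
    """Edges of 2**k equal-width buckets covering [2**e, 2**(e+1)); needs e >= k."""
    step = 1 << (e - k)
    return [(1 << e) + j * step for j in range((1 << k) + 1)]


def geometric_edges(e: int, k: int) -> list[float]:
    """Edges of 2**k buckets whose widths grow by a constant factor."""
    count = 1 << k
    return [2.0 ** (e + j / count) for j in range(count + 1)]


def widths(edges):
    """Differences between consecutive edges."""
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


# Equal-width buckets: the width is constant inside a power of 2 and doubles at the border.
print(widths(linear_edges(7, 2)), widths(linear_edges(8, 2)))
# Geometric buckets: every width is larger than the previous one by the same factor.
print([round(w, 1) for w in widths(geometric_edges(7, 2))])
```

The equal-width form needs only shifts and masks, while the geometric form needs a fractional logarithm or a lookup table, which may be part of the trade-off being asked about.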
