-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert hsb_to_rgb to use Integers Instead of Floats to Improve Performance #926
base: main
Are you sure you want to change the base?
Conversation
Some values with new formatting are off by 1
Without benchmarking, it's impossible to say if this code is faster. It looks like you're doing more integer operations than the old code did float operations, and it's also possible the compiler is using very large integer types to avoid overflow which might not perform as well. Floating point math on ARM processors is also probably faster than you'd expect if you're used to AVR, though my experience is mostly with more powerful ARM SoCs that can run Linux, so maybe that isn't true of lower power ones. I couldn't find any built-in utilities for general benchmarking in Zephyr, but it's pretty easy to write your own benchmarking code, for example: // Taken from https://github.com/google/benchmark
#define DO_NOT_OPTIMIZE(value) \
asm volatile("" : : "r,m"(value) : "memory");
// Increase this number until each benchmark takes long enough to get good data
#define ITERATIONS 1000
void benchmark_old(void) {
const int64_t start_ticks = k_uptime_ticks();
for (int i = 0; i < ITERATIONS; i++) {
for (uint16_t hue = 0; hue < 360; hue++) {
for (uint8_t sat = 0; sat < 101; sat++) {
for (uint8_t bri = 0; bri < 101; bri++) {
struct zmk_led_hsb hsb = {hue, sat, bri};
struct led_rgb rgb = hsb_to_rgb_old(hsb);
DO_NOT_OPTIMIZE(rgb);
}
}
}
}
const int64_t elapsed = k_uptime_ticks() - start_ticks;
LOG_INF("Old function: %d ticks", elapsed);
}
void benchmark_new(void) {
const int64_t start_ticks = k_uptime_ticks();
// same loop but with hsb_to_rgb_new()
const int64_t elapsed = k_uptime_ticks() - start_ticks;
LOG_INF("New function: %d ticks", elapsed);
}
// Ideally you run this in a standalone app that isn't doing any interrupt handling,
// but if that's too hard to set up, just run it several times and make sure you're
// getting consistent numbers.
void run_benchmark(void) {
benchmark_old();
benchmark_new();
} If you do run a benchmark, make sure you have Maybe even run it with and without the FPU enabled to see how much of a difference that makes. In the (probably unlikely) scenario that we support some boards that don't have an FPU, the float version is faster when the FPU is enabled but the integer version is faster when disabled, and we expect this function to get called frequently in some future RGB animation code, then we might actually want both versions of the function so we can pick the faster one based on |
I like that idea for sure. I'm not sure I can manage a standalone app unless there's a template I could work from. Maybe I could set up a branch where the benchmark is run every time I reset the board? I'm a little concerned that running the benchmark at startup could be different than running it after the board's been going a while. Maybe I'm overthinking it though. |
You could maybe use a delayed work queue item to run the benchmark after a delay so it isn't affected by other init code. |
Hello! I've implemented this change suggested in #65 (RGB Underglow Improvements). As the title suggests, this converts the function hsb_to_rgb to use Integers instead of Floats in the hope that it improves performance. Overall, I think the change is mostly self-explanatory, but there are a few points I want to dive into.
There is a slight increase of complexity in this version. In the previous version, the variable v could be reused in the calculation of p, q, and t. Due to rounding issues introduced by integers, this is no longer possible as the division by BRT_MAX must be performed after all of the multiplication is performed. While this step isn't strictly necessary, it prevents a large number of inconvenient off-by-one rounding errors that would otherwise be introduced.
On the topic of rounding errors, even with my best efforts, a few did slip through the cracks. That said, there are very few. I wrote a script to compare all values produced with the old function to the values produced with the new function. Here are all of the differences:
The most any single result differs by is only 1, so the impact is pretty small. That said, I wasn't able to find a way around these. Still, of the 3,672,360 possible combinations, only these 7 were different. (See the footnote at the bottom for the script I used to generate this.)
That brings me to another point that may be a bit weird. I said 3,672,360 combinations instead of 3,600,000 which might be what you expect. This is because I realized that in the current implementation, saturation and brightness take the range of 0 to 100 inclusive, meaning that there are 101 possible values for each of those. Don't know if it's desirable to change this, but I decided to leave it be for now.
I was a little concerned about overflow with the large multiplications happening, but that doesn't seem like an issue when I ran the comparison script.
Lastly, I cannot confirm if this actually improves the performance. Odds are, that will be somewhat dependent on the hardware used, but I can't imagine floating point math being more efficient that often. Still, it probably does mean some testing would be in order to determine that for sure.
And that's pretty much it. If you have any feedback, please let me know. Thanks!
Footnote: Code used to compare old and new algorithms (hsb_to_rgb_old and hsb_to_rgb_new are not included here)