Convert hsb_to_rgb to use Integers Instead of Floats to Improve Performance #926

coltontcrowe · 2021-08-29T10:14:02Z

Hello! I've implemented this change suggested in #65 (RGB Underglow Improvements). As the title suggests, this converts the function hsb_to_rgb to use Integers instead of Floats in the hope that it improves performance. Overall, I think the change is mostly self-explanatory, but there are a few points I want to dive into.

There is a slight increase of complexity in this version. In the previous version, the variable v could be reused in the calculation of p, q, and t. Due to rounding issues introduced by integers, this is no longer possible as the division by BRT_MAX must be performed after all of the multiplication is performed. While this step isn't strictly necessary, it prevents a large number of inconvenient off-by-one rounding errors that would otherwise be introduced.
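
To make the rounding point concrete, here is a minimal sketch of one p-style term, v * (1 - s), computed both ways. The function names and the 0 to 100 and 0 to 255 scales are assumptions chosen for illustration; this is not the code from the PR.

```c
#include <assert.h>
#include <stdint.h>

#define SAT_FULL 100 /* assumed full-scale saturation */
#define BRT_FULL 100 /* assumed full-scale brightness */
#define RGB_FULL 255

/* Divide early: v is truncated first and then reused, so the error compounds. */
static uint8_t p_divide_early(uint8_t sat, uint8_t bri) {
    assert(sat <= SAT_FULL && bri <= BRT_FULL);
    uint32_t v = ((uint32_t)bri * RGB_FULL) / BRT_FULL;
    return (uint8_t)((v * (uint32_t)(SAT_FULL - sat)) / SAT_FULL);
}

/* Divide late: every multiplication happens before one truncating division. */
static uint8_t p_divide_late(uint8_t sat, uint8_t bri) {
    assert(sat <= SAT_FULL && bri <= BRT_FULL);
    uint32_t num = (uint32_t)bri * RGB_FULL * (uint32_t)(SAT_FULL - sat);
    return (uint8_t)(num / (uint32_t)(BRT_FULL * SAT_FULL));
}
```

With sat = 1 and bri = 49 the exact value is about 123.7; p_divide_early returns 122 while p_divide_late returns 123.
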
On the topic of rounding errors, even with my best efforts, a few did slip through the cracks. That said, there are very few. I wrote a script to compare all values produced with the old function to the values produced with the new function. Here are all of the differences:

H	S	B	Old R	New R	Equal	Old G	New G	Equal	Old B	New B	Equal
213	099	055	002	002	✔️	064	063	❌	138	138	✔️
221	067	071	060	060	✔️	098	097	❌	179	179	✔️
239	061	079	078	078	✔️	081	080	❌	199	199	✔️
239	083	097	043	043	✔️	047	046	❌	244	244	✔️
261	075	049	064	063	❌	031	031	✔️	123	123	✔️
261	075	098	128	127	❌	063	063	✔️	247	247	✔️
327	099	090	227	227	✔️	004	004	✔️	126	127	❌

The most any single result differs by is only 1, so the impact is pretty small. That said, I wasn't able to find a way around these. Still, of the 3,672,360 possible combinations, only these 7 were different. (See the footnote at the bottom for the script I used to generate this.)

That brings me to another point that may be a bit weird. I said 3,672,360 combinations instead of 3,600,000 which might be what you expect. This is because I realized that in the current implementation, saturation and brightness take the range of 0 to 100 inclusive, meaning that there are 101 possible values for each of those. Don't know if it's desirable to change this, but I decided to leave it be for now.
I was a little concerned about overflow with the large multiplications happening, but that doesn't seem like an issue when I ran the comparison script.
Lastly, I cannot confirm if this actually improves the performance. Odds are, that will be somewhat dependent on the hardware used, but I can't imagine floating point math being more efficient that often. Still, it probably does mean some testing would be in order to determine that for sure.
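
The overflow concern above can also be checked by brute force rather than by inspection. The sketch below assumes an intermediate product of the form bri * 255 * (100 - sat), which is a guess at the kind of expression involved and not taken from the PR. It also recomputes the number of input combinations. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define HUE_COUNT 360 /* hue 0..359 */
#define SAT_COUNT 101 /* saturation 0..100 inclusive */
#define BRT_COUNT 101 /* brightness 0..100 inclusive */

/* Number of distinct (h, s, b) inputs. */
static uint32_t input_combinations(void) {
    return (uint32_t)HUE_COUNT * SAT_COUNT * BRT_COUNT;
}

/* Largest value of the assumed product, computed in 64 bits so the
 * check itself cannot overflow. */
static uint64_t max_intermediate(void) {
    uint64_t max = 0;
    for (uint32_t sat = 0; sat < SAT_COUNT; sat++) {
        for (uint32_t bri = 0; bri < BRT_COUNT; bri++) {
            uint64_t value = (uint64_t)bri * 255u * (100u - sat);
            if (value > max) {
                max = value;
            }
        }
    }
    /* Fail loudly if a 32-bit intermediate would not be enough. */
    assert(max <= UINT32_MAX);
    return max;
}
```

Under that assumption the peak is 2,550,000 (bri = 100, sat = 0), which would overflow a 16-bit integer but fits comfortably in 32 bits. input_combinations() returns 3,672,360.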

And that's pretty much it. If you have any feedback, please let me know. Thanks!

Footnote: Code used to compare old and new algorithms (hsb_to_rgb_old and hsb_to_rgb_new are not included here)

#include <sys/param.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>

#define HUE_MAX 360
#define SAT_MAX 101
#define BRT_MAX 101
#define RGB_MAX 255
#define NUM_SEG 6
#define DEG_SEG (HUE_MAX / NUM_SEG)

struct led_rgb {
    uint8_t r;
    uint8_t g;
    uint8_t b;
};

struct zmk_led_hsb {
    uint16_t h;
    uint8_t s;
    uint8_t b;
};

// DECLARATION
static struct led_rgb hsb_to_rgb_old(struct zmk_led_hsb hsb);
static struct led_rgb hsb_to_rgb_new(struct zmk_led_hsb hsb);

// DEFINITION
int main(void)
{
    FILE *outfile = fopen("output.md", "w");
    if (!outfile) {
        fprintf(stderr, "unable to open file for printing\n\n");
        return 1; // bail out instead of writing to a NULL stream
    }

    fprintf(outfile, "| H | S | B | | Old R | New R | Equal | | Old G | New G | Equal | | Old B | New B| Equal |\n");
    fprintf(outfile, "|---|---|---|-|-------|-------|-------|-|-------|-------|-------|-|-------|------|-------|\n");
    uint32_t num_total = 0;
    uint32_t num_correct = 0;
    uint8_t largest_err = 0;


    for (uint16_t hue = 0; hue < 360; hue+=1) {
        for (uint8_t sat = 0; sat < 101; sat+=1) {
            for (uint8_t bri = 0; bri < 101; bri+=1) {
                struct zmk_led_hsb in_hsb = {hue, sat, bri};

                struct led_rgb old_rgb = hsb_to_rgb_old(in_hsb);
                struct led_rgb new_rgb = hsb_to_rgb_new(in_hsb);
                
                char *eq_r = (old_rgb.r == new_rgb.r) ? ":heavy_check_mark:" : ":x:";
                char *eq_g = (old_rgb.g == new_rgb.g) ? ":heavy_check_mark:" : ":x:";
                char *eq_b = (old_rgb.b == new_rgb.b) ? ":heavy_check_mark:" : ":x:";

                num_total += 1;
                if ((old_rgb.r == new_rgb.r) && (old_rgb.g == new_rgb.g) && (old_rgb.b == new_rgb.b)) {
                    num_correct += 1;
                } else {
                  // Widen to int before subtracting so the difference cannot wrap
                  uint8_t err_r = abs((int)old_rgb.r - (int)new_rgb.r);
                  uint8_t err_g = abs((int)old_rgb.g - (int)new_rgb.g);
                  uint8_t err_b = abs((int)old_rgb.b - (int)new_rgb.b);

                  largest_err = MAX(largest_err, err_r);
                  largest_err = MAX(largest_err, err_g);
                  largest_err = MAX(largest_err, err_b);

                  fprintf(outfile, "| %.3d | %.3d | %.3d | | %.3d | %.3d | %-18s | | %.3d | %.3d | %-18s | | %.3d | %.3d | %-18s |\n",
                          in_hsb.h, in_hsb.s, in_hsb.b,
                          old_rgb.r, new_rgb.r, eq_r,
                          old_rgb.g, new_rgb.g, eq_g,
                          old_rgb.b, new_rgb.b, eq_b);
                }
            }
        }
    }

    fprintf(outfile, "\n(%u/%u) = %.4f%% correct\n", (unsigned)num_correct, (unsigned)num_total, 100*(float)num_correct/num_total);
    fprintf(outfile, "largest error = %d\n", largest_err);
    fclose(outfile);
    return 0;
}
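
Since the two conversion functions are left out of the script, anyone wanting to try it needs stand-ins. Below is a minimal integer-only sketch of the standard sextant-based HSB to RGB conversion. It is a hypothetical illustration, not the hsb_to_rgb_new from this PR, and it truncates rather than rounds, so its results will not necessarily match the table above. The struct and function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

struct rgb8 { uint8_t r, g, b; }; /* hypothetical stand-in for struct led_rgb */

/* h in 0..359, s and b in 0..100. Each channel is computed with a
 * single truncating division performed after all multiplications. */
static struct rgb8 hsb_to_rgb_sketch(uint16_t h, uint8_t s, uint8_t b) {
    assert(h < 360 && s <= 100 && b <= 100);

    uint32_t region = h / 60u;  /* which sextant of the colour wheel */
    uint32_t f = h % 60u;       /* position inside that sextant, 0..59 */
    uint32_t scale = 255u * b;  /* at most 25500, so products stay below 2^32 */

    uint8_t v = (uint8_t)(scale / 100u);
    uint8_t p = (uint8_t)(scale * (100u - s) / 10000u);
    uint8_t q = (uint8_t)(scale * (6000u - s * f) / 600000u);
    uint8_t t = (uint8_t)(scale * (6000u - s * (60u - f)) / 600000u);

    struct rgb8 out;
    switch (region) {
    case 0:  out.r = v; out.g = t; out.b = p; break;
    case 1:  out.r = q; out.g = v; out.b = p; break;
    case 2:  out.r = p; out.g = v; out.b = t; break;
    case 3:  out.r = p; out.g = q; out.b = v; break;
    case 4:  out.r = t; out.g = p; out.b = v; break;
    default: out.r = v; out.g = p; out.b = q; break;
    }
    return out;
}
```

For example, hsb_to_rgb_sketch(60, 100, 100) gives (255, 255, 0) and any input with brightness 0 gives (0, 0, 0).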


joelspadin · 2021-09-12T17:15:38Z

Without benchmarking, it's impossible to say if this code is faster. It looks like you're doing more integer operations than the old code did float operations, and it's also possible the compiler is using very large integer types to avoid overflow which might not perform as well. Floating point math on ARM processors is also probably faster than you'd expect if you're used to AVR, though my experience is mostly with more powerful ARM SoCs that can run Linux, so maybe that isn't true of lower power ones.

I couldn't find any built-in utilities for general benchmarking in Zephyr, but it's pretty easy to write your own benchmarking code, for example:

// Taken from https://github.com/google/benchmark
#define DO_NOT_OPTIMIZE(value) \
    asm volatile("" : : "r,m"(value) : "memory");

// Increase this number until each benchmark takes long enough to get good data
#define ITERATIONS 1000

void benchmark_old(void) {
  const int64_t start_ticks = k_uptime_ticks();

  for (int i = 0; i < ITERATIONS; i++) {
    for (uint16_t hue = 0; hue < 360; hue++) {
      for (uint8_t sat = 0; sat < 101; sat++) {
        for (uint8_t bri = 0; bri < 101; bri++) {
          struct zmk_led_hsb hsb = {hue, sat, bri};
          struct led_rgb rgb = hsb_to_rgb_old(hsb);
          DO_NOT_OPTIMIZE(rgb);
        }
      }
    }
  }

  const int64_t elapsed = k_uptime_ticks() - start_ticks;
  // elapsed is 64-bit; cast so the argument matches %d (assumes the count fits in an int)
  LOG_INF("Old function: %d ticks", (int)elapsed);
}

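void benchmark_new(void) {
  const int64_t start_ticks = k_uptime_ticks();

  // Same loop as benchmark_old(), but calling hsb_to_rgb_new()
  for (int i = 0; i < ITERATIONS; i++) {
    for (uint16_t hue = 0; hue < 360; hue++) {
      for (uint8_t sat = 0; sat < 101; sat++) {
        for (uint8_t bri = 0; bri < 101; bri++) {
          struct zmk_led_hsb hsb = {hue, sat, bri};
          struct led_rgb rgb = hsb_to_rgb_new(hsb);
          DO_NOT_OPTIMIZE(rgb);
        }
      }
    }
  }

  const int64_t elapsed = k_uptime_ticks() - start_ticks;
  // elapsed is 64-bit; cast so the argument matches %d (assumes the count fits in an int)
  LOG_INF("New function: %d ticks", (int)elapsed);
}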

// Ideally you run this in a standalone app that isn't doing any interrupt handling,
// but if that's too hard to set up, just run it several times and make sure you're
// getting consistent numbers.
void run_benchmark(void) {
  benchmark_old();
  benchmark_new();
}

If you do run a benchmark, make sure you have CONFIG_FPU enabled, as it looks like it is currently not being enabled on some boards like nice!nano.

Maybe even run it with and without the FPU enabled to see how much of a difference that makes. It's probably unlikely, but suppose all of the following turn out to be true: we support some boards that don't have an FPU, the float version is faster when the FPU is enabled while the integer version is faster when it is disabled, and we expect this function to get called frequently in some future RGB animation code. In that case we might actually want both versions of the function so we can pick the faster one based on CONFIG_FPU.
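
Picking an implementation based on CONFIG_FPU could be done at compile time. The sketch below shows the idea on a single, simplified step (scaling brightness from 0..100 to 0..255) rather than the full conversion. The function name is hypothetical, and the sketch relies only on an enabled Kconfig option being visible to C code as a defined CONFIG_ macro.

```c
#include <assert.h>
#include <stdint.h>

/* Scale brightness from 0..100 to 0..255 (hypothetical helper for illustration). */
static uint8_t scale_brightness(uint8_t bri) {
    assert(bri <= 100);
#if defined(CONFIG_FPU)
    /* FPU enabled: take the float path. */
    return (uint8_t)((float)bri / 100.0f * 255.0f);
#else
    /* FPU disabled or absent: stay in integer math. */
    return (uint8_t)((uint32_t)bri * 255u / 100u);
#endif
}
```

Built on a host where CONFIG_FPU is not defined, this compiles the integer path. Whether the float path really is the faster one with the FPU on is exactly what the benchmark would need to show.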

coltontcrowe · 2021-09-18T04:05:13Z

I like that idea for sure. I'm not sure I can manage a standalone app unless there's a template I could work from. Maybe I could set up a branch where the benchmark is run every time I reset the board? I'm a little concerned that running the benchmark at startup could be different than running it after the board's been going a while. Maybe I'm overthinking it though.

joelspadin · 2021-09-18T16:27:05Z

You could maybe use a delayed work queue item to run the benchmark after a delay so it isn't affected by other init code.

coltontcrowe and others added 3 commits August 21, 2021 04:05

perf: use integers to convert hsb to rgb

541350e

Some values with new formatting are off by 1

fix: clean up most rounding errors

63c60b6

Merge branch 'zmkfirmware:main' into main

60c7a08

coltontcrowe added 2 commits October 4, 2021 02:56

feat: allow configuring fpu for rgb conversion

7dc8c8f

Merge branch 'main' of github.com:coltontcrowe/zmk

c46e573

joelspadin mentioned this pull request Oct 22, 2021

feat(underglow): simplify calculations #979

Open

ice9js mentioned this pull request Jan 30, 2022

Feature: RGB Animation Driver API #1046

Open
