New function ImHashAdd() to create a new ID instead of (ID + int). #6140

rodrigorc · 2023-02-02T23:15:41Z

In issue #4612 there is a comment about a hash collision in the ShowDemoWindow, in the "Synced Tables" section. As it happens there is a collision, but it is not a because of a random chance, but because of how CRC32 works and a faulty assumption in the code, that this PR tries to fix.

The issue is in imgui_tables.cpp:336 that creates a new ID by taking the old ID and adding a number.

const ImGuiID instance_id = id + instance_no;

That looks innocent enough, the hash is different. But later, it does:

PushID(instance_no * 3 + column_n);

That basically translates to PushID(0), PushID(1), PushID(2), PushID(3)... that in turn will call ImHashData(&n, sizeof(int), instance_id);.

Now, let's see what happens when id ends with an hexadecimal 0x1. On one hand we call ImHasData(&0, sizeof(int), 0xXYZ1). That if we look into the CRC32 code will do this line 4 times, being *data == 0 in all of them:

crc = (crc >> 8) ^ crc32_lut[(crc & 0xFF) ^ *data++]

Sometime later instance_no=1 and column_n=0 it will call ImHashData(&3, sizeof(int), 0xXYZ2) that will call the same crc code once with *data==3 and three times with *data==0.

But if we compare the first computation of CRC in both cases:

/*n=0*/ crc = (0xXYZ1 >> 8) ^ crc32_lut[(0xXYZ1 & 0xFF) ^ 0];
/*n=3*/ crc = (0xXYZ2 >> 8) ^ crc32_lut[(0xXYZ2 & 0xFF) ^ 3];

Simplifying a bit:

/*n=0*/ crc = 0xXY ^ crc32_lut[0xZ1];
/*n=3*/ crc = 0xXY ^ crc32_lut[0xZ2 ^ 3];

But 0xZ2 ^ 3 == 0xZ1!! And the next 3 input bytes are identical, and both functions will return the same hash!

This is not actually a weakness in CRC32 as it is not an intended use. Changing the seed using an arbitrary arithmetic operation may raise unwanted patterns in the output of the hash. The proper way to derive a new seed is to always run the CRC kernel.

So I wrote a small ImHashAdd() that does the moral equivalent to an addition, actually calling ImHashData.

I reviewed the whole Dear ImGui codebase and I only found this instance of adding a number to an ID. But maybe I skipped some, or maybe third-party code does it willy-nilly. This is why in addition to this PR I opened thos other one #6138 recommending changing the hash algorithm to a more sophisticated one.

I'm not sure that is worth it, but other option to avoid unwanted arithmetic to IDs would be to typedef it not to be an integer, a new-type pattern:

typedef unsigned int        ImGuiID_;
struct ImGuiID { ImGuiID_ id; };

Then ImHashAdd() could be converted into ImGuiID::operator+(int) and everybody would be happy.

ocornut · 2023-02-03T17:02:49Z

Thank you for the careful amount of details.

I reviewed the whole Dear ImGui codebase and I only found this instance of adding a number to an ID

It's actually used in two other occasions: for the ID of columns in the Table code (TableGetColumnResizeID() function) and in the legacy Columns API (in EndColumns(): const ImGuiID column_id = columns->ID + ImGuiID(n);.

The reason for that addition was to ensure that a majority of tables (which don't have multiple instances) could be accessed with an obvious "universal identifier" in the Dear ImGui Test Engine (e.g. "//Window/Table/Item" referring to the first instance), so I wanted the first instance ID to be unaltered. So your change as-is breaks this assumption.

Whereas my logic breaks is that in turns we are unable to access subsequent instances of a same Table with an easy/known string identifier, but only through a small helper (e.g. TableGetHeaderID() in https://github.com/ocornut/imgui_test_engine/blob/main/imgui_test_engine/imgui_te_utils.cpp#L1158 which amusingly deals with two modifications here).

What we can do is leave the table instance id unaltered for the first instance, and altered for subsequent instances.
We need to make the alteration unlikely to collide with users own stack manipulations, which was one property of the raw addition, so e.g. "//Window/Table/$$2/Item" (where $$2 stands for PushID(2)) for third instance is not great as it would likely collide with user calling PushID(row) for each own. Instead should go with a standardized syntax e.g. //Window/Table/##Instance2/Item.
I'll increase IMGUI_VERSION_NUM and we need to patch test engine accordingly to support old and new logic.
And then I don't think we really need a ImHashAdd() we can simply use ImHashData() and PushID() normally.

I'll be working on that.

… table. Storing instance id for convenience. (#6140) TableGetColumnResizeID() are still using an incorrect table, but having only one-level left tends to cancel things out.

…d "table_multi_instances" test with extra bits. Ref ocornut/imgui#6140

ocornut · 2023-02-03T19:07:05Z

Pushed fix f799a29 + amended tests and updated test engine for this ocornut/imgui_test_engine@27c3f05

Also simplified the PushID() in TableHeadersRow().
Also simplified the code in TableGetColumnResizeID() which is still doing an addition but since they don't stack each other anymore (as the first one was removed) it fixes the immediate issue. (We could store per-column/per-instance column resize border id or compute based on some hash, but right I'm looking at solving some issue with name-resolving in the Stack Tool which will affect it, so not pushing it further)

Thanks!

rodrigorc · 2023-02-03T20:48:39Z

That is a great solution, but this lines make me a bit wary:

char instance_desc[12];
int instance_desc_len = ImFormatString(instance_desc, IM_ARRAYSIZE(instance_desc), "##Instance%d", instance_no);
instance_id = GetIDWithSeed(instance_desc, instance_desc + instance_desc_len, id);

##Instance plus the NUL byte are already 11 chars, so there is only space for 1 digit: if instance_no==10 that will hash "##Instance1" and... collision!

The obvious solution is to increase that array size to 16, of course.

But just for the sake of reviving my old code, it could also be a new override:

ImGuiID       GetIDWithSeed(int n, ImGuiID seed);

And then we need no string format:

instance_id = GetIDWithSeed(instance_no, GetIDWithSeed("##Instance", NULL, id));

…#6140)

ocornut · 2023-02-03T21:52:12Z

You are absolutely right. Pushed d6ea56d.
Thanks and sorry for the overlook (renamed that field without increasing buffer size)
As a test engine path it becomes:
"table/##Instances/$$1"

New function ImHashAdd() to create a new ID instead of (ID + int).

416f90c

ocornut added tables/columns label/id and id stack implicit identifiers, pushid(), id stack labels Feb 2, 2023

ocornut added a commit to ocornut/imgui_test_engine that referenced this pull request Feb 3, 2023

TestEngine: Update/fixes for Tables related changes + TestSuite: amen…

27c3f05

…d "table_multi_instances" test with extra bits. Ref ocornut/imgui#6140

ocornut closed this Feb 3, 2023

ocornut added a commit that referenced this pull request Feb 3, 2023

Tables: amend f799a29 with a better solution + fix potential overflow (…

d6ea56d

…#6140)

ocornut added a commit that referenced this pull request Feb 3, 2023

Tables: amend f799a29 with a better solution + fix potential overflow (…

c807192

…#6140)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New function ImHashAdd() to create a new ID instead of (ID + int). #6140

New function ImHashAdd() to create a new ID instead of (ID + int). #6140

rodrigorc commented Feb 2, 2023

ocornut commented Feb 3, 2023 •

edited

Loading

ocornut commented Feb 3, 2023

rodrigorc commented Feb 3, 2023

ocornut commented Feb 3, 2023

New function ImHashAdd() to create a new ID instead of (ID + int). #6140

New function ImHashAdd() to create a new ID instead of (ID + int). #6140

Conversation

rodrigorc commented Feb 2, 2023

ocornut commented Feb 3, 2023 • edited Loading

ocornut commented Feb 3, 2023

rodrigorc commented Feb 3, 2023

ocornut commented Feb 3, 2023

ocornut commented Feb 3, 2023 •

edited

Loading