Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New function ImHashAdd() to create a new ID instead of (ID + int). #6140

Closed
wants to merge 1 commit into from

Conversation

rodrigorc
Copy link
Contributor

In issue #4612 there is a comment about a hash collision in the ShowDemoWindow, in the "Synced Tables" section. As it happens there is a collision, but it is not a because of a random chance, but because of how CRC32 works and a faulty assumption in the code, that this PR tries to fix.

The issue is in imgui_tables.cpp:336 that creates a new ID by taking the old ID and adding a number.

const ImGuiID instance_id = id + instance_no;

That looks innocent enough, the hash is different. But later, it does:

PushID(instance_no * 3 + column_n);

That basically translates to PushID(0), PushID(1), PushID(2), PushID(3)... that in turn will call ImHashData(&n, sizeof(int), instance_id);.

Now, let's see what happens when id ends with an hexadecimal 0x1. On one hand we call ImHasData(&0, sizeof(int), 0xXYZ1). That if we look into the CRC32 code will do this line 4 times, being *data == 0 in all of them:

crc = (crc >> 8) ^ crc32_lut[(crc & 0xFF) ^ *data++]

Sometime later instance_no=1 and column_n=0 it will call ImHashData(&3, sizeof(int), 0xXYZ2) that will call the same crc code once with *data==3 and three times with *data==0.

But if we compare the first computation of CRC in both cases:

/*n=0*/ crc = (0xXYZ1 >> 8) ^ crc32_lut[(0xXYZ1 & 0xFF) ^ 0];
/*n=3*/ crc = (0xXYZ2 >> 8) ^ crc32_lut[(0xXYZ2 & 0xFF) ^ 3];

Simplifying a bit:

/*n=0*/ crc = 0xXY ^ crc32_lut[0xZ1];
/*n=3*/ crc = 0xXY ^ crc32_lut[0xZ2 ^ 3];

But 0xZ2 ^ 3 == 0xZ1!! And the next 3 input bytes are identical, and both functions will return the same hash!

This is not actually a weakness in CRC32 as it is not an intended use. Changing the seed using an arbitrary arithmetic operation may raise unwanted patterns in the output of the hash. The proper way to derive a new seed is to always run the CRC kernel.

So I wrote a small ImHashAdd() that does the moral equivalent to an addition, actually calling ImHashData.

I reviewed the whole Dear ImGui codebase and I only found this instance of adding a number to an ID. But maybe I skipped some, or maybe third-party code does it willy-nilly. This is why in addition to this PR I opened thos other one #6138 recommending changing the hash algorithm to a more sophisticated one.

I'm not sure that is worth it, but other option to avoid unwanted arithmetic to IDs would be to typedef it not to be an integer, a new-type pattern:

typedef unsigned int        ImGuiID_;
struct ImGuiID { ImGuiID_ id; };

Then ImHashAdd() could be converted into ImGuiID::operator+(int) and everybody would be happy.

@ocornut ocornut added tables/columns label/id and id stack implicit identifiers, pushid(), id stack labels Feb 2, 2023
@ocornut
Copy link
Owner

ocornut commented Feb 3, 2023

Thank you for the careful amount of details.

I reviewed the whole Dear ImGui codebase and I only found this instance of adding a number to an ID

It's actually used in two other occasions: for the ID of columns in the Table code (TableGetColumnResizeID() function) and in the legacy Columns API (in EndColumns(): const ImGuiID column_id = columns->ID + ImGuiID(n);.

The reason for that addition was to ensure that a majority of tables (which don't have multiple instances) could be accessed with an obvious "universal identifier" in the Dear ImGui Test Engine (e.g. "//Window/Table/Item" referring to the first instance), so I wanted the first instance ID to be unaltered. So your change as-is breaks this assumption.

Whereas my logic breaks is that in turns we are unable to access subsequent instances of a same Table with an easy/known string identifier, but only through a small helper (e.g. TableGetHeaderID() in https://github.com/ocornut/imgui_test_engine/blob/main/imgui_test_engine/imgui_te_utils.cpp#L1158 which amusingly deals with two modifications here).

  • What we can do is leave the table instance id unaltered for the first instance, and altered for subsequent instances.
  • We need to make the alteration unlikely to collide with users own stack manipulations, which was one property of the raw addition, so e.g. "//Window/Table/$$2/Item" (where $$2 stands for PushID(2)) for third instance is not great as it would likely collide with user calling PushID(row) for each own. Instead should go with a standardized syntax e.g. //Window/Table/##Instance2/Item.
  • I'll increase IMGUI_VERSION_NUM and we need to patch test engine accordingly to support old and new logic.
  • And then I don't think we really need a ImHashAdd() we can simply use ImHashData() and PushID() normally.

I'll be working on that.

ocornut added a commit that referenced this pull request Feb 3, 2023
… table. Storing instance id for convenience. (#6140)

TableGetColumnResizeID() are still using an incorrect table, but having only one-level left tends to cancel things out.
ocornut added a commit to ocornut/imgui_test_engine that referenced this pull request Feb 3, 2023
@ocornut
Copy link
Owner

ocornut commented Feb 3, 2023

Pushed fix f799a29 + amended tests and updated test engine for this ocornut/imgui_test_engine@27c3f05

Also simplified the PushID() in TableHeadersRow().
Also simplified the code in TableGetColumnResizeID() which is still doing an addition but since they don't stack each other anymore (as the first one was removed) it fixes the immediate issue. (We could store per-column/per-instance column resize border id or compute based on some hash, but right I'm looking at solving some issue with name-resolving in the Stack Tool which will affect it, so not pushing it further)

Thanks!

@ocornut ocornut closed this Feb 3, 2023
@rodrigorc
Copy link
Contributor Author

That is a great solution, but this lines make me a bit wary:

char instance_desc[12];
int instance_desc_len = ImFormatString(instance_desc, IM_ARRAYSIZE(instance_desc), "##Instance%d", instance_no);
instance_id = GetIDWithSeed(instance_desc, instance_desc + instance_desc_len, id);

##Instance plus the NUL byte are already 11 chars, so there is only space for 1 digit: if instance_no==10 that will hash "##Instance1" and... collision!

The obvious solution is to increase that array size to 16, of course.

But just for the sake of reviving my old code, it could also be a new override:

ImGuiID       GetIDWithSeed(int n, ImGuiID seed);

And then we need no string format:

instance_id = GetIDWithSeed(instance_no, GetIDWithSeed("##Instance", NULL, id));

@ocornut
Copy link
Owner

ocornut commented Feb 3, 2023

You are absolutely right. Pushed d6ea56d.
Thanks and sorry for the overlook (renamed that field without increasing buffer size)
As a test engine path it becomes:
"table/##Instances/$$1"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
label/id and id stack implicit identifiers, pushid(), id stack tables/columns
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants