Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix 22294 & 22262 #2151

Merged
merged 1 commit into from
May 19, 2020
Merged

fix 22294 & 22262 #2151

merged 1 commit into from
May 19, 2020

Conversation

Cyan4973
Copy link
Contributor

@Cyan4973 Cyan4973 commented May 19, 2020

Issue 22294, firing assert(offset_1 <= current +1) within compress_fast_extDict(),
and issue 22262, firing assert(offset_1 <= dictAndPrefixLength) within compress_fast_dictMatchState(),
are actually a consequence of a new behavior enabled within ldm, that has become possible since the new --patch-from capability (and therefore not present in earlier versions) :

  • Previously, ldm would not load any dictionary when in multithreading mode
    • This is still the case when using the "one-ingestion" strategy
  • This has been updated as part of the --patch-from capability
    • Loading the dictionary into the ldm is necessary in order to catch long-distance correlations
    • Note that only the "streaming" mode has been updated, as it's the only mode relevant for --patch-from
  • However, there is a subtle twist : this patch makes the ldm ingest the full dictionary as a content, irrespective of being a "raw" dictionary (only content) or a "full" dictionary (header and entropy tables)
    • As a consequence, when the dictionary is "full", the ldm loads entropy tables as "content", and may be able to find matches into them (just as a matter of random luck)
    • Once the ldm has found such a match, it passes the following literals section to the regular match finder (compress_fast() in this case), where the previous offset is passed as repeat code
    • the previous offset now leads beyond the dictionary content, which was properly loaded into the regular match finder. As a consequence, the offset underflows the index.
    • this is caught by the assert(). Without the assert(), the resulting pointer is invalid, and result in a segfault.

This PR makes the minimum to control the damage :
Now ldm only loads the dictionary if it is labelled a "raw" dictionary, as it is the only case which matters for --patch-from. If the dictionary is labelled "auto" or "full", it is not loaded at all.
This was the behavior of ldm before the --patch-from mode.
It also avoids expanding the scope and creating new scenarios, that would have to be fuzzed.

This PR doesn't have a test case attached.
It's a bit tricky to generate, as it requires a "full" dictionary, with a matching pattern directly in the encoded entropy section of the header (which looks like random bytes), so it's unclear how to generate this case intentionally.

But we can add the known cases as "golden files" to our regression corpus.

@Cyan4973 Cyan4973 changed the title fix 22294 fix 22294 & 22262 May 19, 2020
@terrelln
Copy link
Contributor

Just to verify, this fixes both offset_1 bugs, right?

@Cyan4973
Copy link
Contributor Author

Right

@Cyan4973 Cyan4973 merged commit fdc56ba into dev May 19, 2020
@Cyan4973 Cyan4973 deleted the fix22294 branch November 19, 2020 01:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants