Use typed choice sequence in database #4241

tybug · 2025-01-12T19:54:21Z

The database now stores each entry as (sort_key_ir(data.ir_nodes), data.choices), which lets us sort entries by shrink ordering. If deserialization fails, ir_from_bytes returns None; the caller is responsible for handling that case, usually by clearing the bad entry from the db.
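
A minimal sketch of the load contract described above, assuming a toy JSON serialization and a plain dict of sets standing in for the database. The names choices_to_bytes, choices_from_bytes, and load_valid_entries are hypothetical and are not the functions in this PR; the sketch only shows that a failed deserialization yields None and that the caller drops the bad entry.

```python
import json
from typing import Optional


def choices_to_bytes(choices) -> bytes:
    # Hypothetical toy format: a JSON list encoded as UTF-8.
    return json.dumps(list(choices)).encode("utf-8")


def choices_from_bytes(buffer: bytes) -> Optional[tuple]:
    # Return None instead of raising when the buffer cannot be decoded.
    try:
        decoded = json.loads(buffer.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if not isinstance(decoded, list):
        return None
    return tuple(decoded)


def load_valid_entries(db: dict, key: bytes) -> list:
    # The caller owns cleanup: entries that fail to deserialize are removed.
    entries = db.setdefault(key, set())
    valid = []
    # sorted() copies the set, so discarding while looping is safe.
    for raw in sorted(entries, key=lambda b: (len(b), b)):
        choices = choices_from_bytes(raw)
        if choices is None:
            entries.discard(raw)
        else:
            valid.append(choices)
    return valid


good = choices_to_bytes([1, True])
db = {b"test-key": {good, b"\xff not json"}}
loaded = load_valid_entries(db, b"test-key")
```

After the call, only the entry that round-trips remains stored under the key.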

I think it's worth storing the choice complexity in the db, for better sorting. The case of interest is when serialization is large but the complexity is low. This happens mainly via min_size= or shrink_towards. I'm not certain how big of a problem this is - probably most issues require storing some (small + high complexity) or (large + low complexity) value, then changing the test case to add or remove a min_size, changing the relative complexity baseline. I think the worst this would manifest as is us repeatedly trying clearly-suboptimal examples from the database, or rejecting possible-shrinks from the secondary corpus as "too complex" when they aren't. We'd still like to avoid these though.
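
To make the mismatch concrete, here is a toy sketch. It assumes a made-up big-endian integer serialization and a made-up complexity measure (distance from shrink_towards); neither is Hypothesis's real serialization format or sort key.

```python
def shortlex(s):
    return (len(s), s)


def toy_serialize(value: int) -> bytes:
    # Hypothetical format: minimal-length big-endian bytes, non-negative ints only.
    return value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")


def toy_complexity(value: int, shrink_towards: int) -> int:
    # Hypothetical measure: how far the value sits from the shrink target.
    return abs(value - shrink_towards)


shrink_towards = 1000
candidates = [1000, 3]

# Ordering by serialized bytes prefers the short encoding of 3.
by_bytes = sorted(candidates, key=lambda v: shortlex(toy_serialize(v)))

# Ordering by complexity prefers 1000, because it equals the shrink target.
by_complexity = sorted(candidates, key=lambda v: toy_complexity(v, shrink_towards))
```

In this toy setup the two orderings disagree, which is the (large + low complexity) case described above.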

Also bundles c0f8979, which is an independent cleanup commit.

tybug · 2025-01-13T20:46:57Z

I've addressed a readthedocs brownout-soon-to-be-blackout (ci, blog) here as well since it was so simple. If this pull doesn't get merged next then we can cherrypick that over to whatever does.

Zac-HD · 2025-01-14T08:14:41Z

> I think it's worth storing the choice complexity in the db, for better sorting. The case of interest is when serialization is large but the complexity is low. This happens mainly via min_size= or shrink_towards. I'm not certain how big of a problem this is - probably most issues require storing some (small + high complexity) or (large + low complexity) value, then changing the test case to add or remove a min_size, changing the relative complexity baseline. I think the worst this would manifest as is us repeatedly trying clearly-suboptimal examples from the database, or rejecting possible-shrinks from the secondary corpus as "too complex" when they aren't. We'd still like to avoid these though.

I'm not convinced that storing choice complexity is worth the complication to our code and almost doubling the size of database entries - shortlex over our serialization format is already usually pretty close to shrink ordering in reasonable cases. If we don't have some reason to think it helps a lot, I'd strongly prefer to drop it.

PR otherwise looks ready to merge 👍

tybug · 2025-01-14T17:23:34Z

ok, I'm good with that! I think there were a few db test failures when I tried, but they were in the ~slightly bad category, not the ~moderately bad category. We can come back to this pull if we see bad regressions as a result.

hypothesis-python/src/hypothesis/internal/conjecture/choice.py

hypothesis-python/tests/conjecture/test_ir.py

Zac-HD · 2025-01-16T01:36:17Z

hypothesis-python/src/hypothesis/internal/conjecture/engine.py

+def shortlex(s):
+    return (len(s), s)

seems ridiculous that we didn't have this defined anywhere before, but apparently not!

tybug · 2025-01-16

Not by this name, but this is really just sort_key renamed!
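
For reference, a small usage sketch of the helper from the diff: shortlex orders first by length and then lexicographically, which differs from plain bytes comparison. The sample byte strings are made up for illustration.

```python
def shortlex(s):
    return (len(s), s)


entries = [b"\x01\x00", b"\x02", b""]

# Plain comparison is purely lexicographic, so b"\x01\x00" sorts before b"\x02".
plain = sorted(entries)

# Shortlex puts shorter entries first and breaks ties lexicographically.
short_first = sorted(entries, key=shortlex)
```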

Zac-HD · 2025-01-16T01:40:08Z

hypothesis-python/RELEASE.rst

Given that we already document that the database might be lost when you update Hypothesis, this feels like it spends too much time explaining what we mean. I'd rather give a relatively brief description like the first paragraph, then focus on what users should do: do not rely on the database across versions, consider using @example(), and ensure that any environments with a shared DB (e.g. CI and local) use the same version where possible.
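
As a sketch of the @example() suggestion: pinning an input in the test source means it is replayed on every run, regardless of what the database holds or which Hypothesis version wrote it. The test body and the pinned input below are made up for illustration.

```python
from hypothesis import example, given, strategies as st


@given(st.lists(st.integers()))
@example([3, 1, 2])  # hypothetical input we never want to lose
def test_sorting_is_idempotent(xs):
    once = sorted(xs)
    assert sorted(once) == once
```

Explicit examples run before any generated ones, so the pinned case does not depend on database state.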

tybug requested review from DRMacIver and Zac-HD as code owners January 12, 2025 19:54

tybug mentioned this pull request Jan 12, 2025

Add and use BytestringProvider in fuzz_one_input #4221

Merged

tybug force-pushed the db-choices branch from e05a1bf to ada8e0e Compare January 13, 2025 00:36

tybug mentioned this pull request Jan 13, 2025

Migrate our core representation to the typed choice sequence #3921

Open

1 task

move more functions to choice.py

3356b67

tybug force-pushed the db-choices branch from 8d62f38 to 2ba33f2 Compare January 13, 2025 20:44

tybug mentioned this pull request Jan 14, 2025

Implement second-order jack-knife lower bound on number of branches Zac-HD/hypofuzz#44

Open

tybug added 2 commits January 15, 2025 01:14

handle bool/float cases correctly in choice_key

5af0549

address rtd brownout

6383100

tybug force-pushed the db-choices branch from 2ba33f2 to 6454044 Compare January 15, 2025 06:17

tybug commented Jan 15, 2025 on hypothesis-python/src/hypothesis/internal/conjecture/choice.py (outdated, resolved)

use typed choice sequence in the database

2f0981b

tybug force-pushed the db-choices branch from e7a2a3e to 2f0981b Compare January 15, 2025 06:27

Zac-HD approved these changes Jan 16, 2025

tybug force-pushed the db-choices branch from 4eeb38e to 3b2da74 Compare January 16, 2025 05:22

add more tests, reword release notes

714bc5d

tybug force-pushed the db-choices branch from 3b2da74 to 714bc5d Compare January 16, 2025 05:22

tybug enabled auto-merge January 16, 2025 05:23

tybug merged commit cca3c71 into HypothesisWorks:master Jan 16, 2025
47 checks passed

tybug deleted the db-choices branch January 16, 2025 05:41
