-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use typed choice sequence in database #4241
Conversation
I'm not convinced that storing choice complexity this is worth the complication to our code and almost doubling the size of database entries - shortlex over our serialization format is already usually pretty close to shrink ordering in reasonable cases. If we don't have some reason to think it helps a lot, I'd strongly prefer to drop it. PR otherwise looks ready to merge 👍 |
ok, I'm good with that! I think there were a few db test failures when I tried, but they were in the ~slightly bad category, not the ~moderately bad category. We can come back to this pull if we see bad regressions as a result. |
def shortlex(s): | ||
return (len(s), s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems ridiculous that we didn't have this defined anywhere before, but apparently not!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not by this name, but this is really just sort_key
renamed!
hypothesis-python/RELEASE.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we already document that the database might be lost when you update Hypothesis, it feels to me like this is spending too much time explaining what we mean; I'd rather give a relatively brief description like the first paragraph and then focus instead on what users should do (ie: do not rely on the database across versions, consider using @example()
, ensure that any environments with a shared DB (eg CI and local) use the same version where possible.)
The database now stores entries as
(sort_key_ir(data.ir_nodes), data.choices)
, letting us sort entries by shrink ordering. Ifir_from_bytes
deserialization fails, return None. Caller is responsible for handling this case (usually by clearing the bad entry from the db).I think it's worth storing the choice complexity in the db, for better sorting. The case of interest is when serialization is large but the complexity is low. This happens mainly via
min_size=
orshrink_towards
. I'm not certain how big of a problem this is - probably most issues require storing some (small + high complexity) or (large + low complexity) value, then changing the test case to add or remove a min_size, changing the relative complexity baseline. I think the worst this would manifest as is us repeatedly trying clearly-suboptimal examples from the database, or rejecting possible-shrinks from the secondary corpus as "too complex" when they aren't. We'd still like to avoid these though.Also bundles c0f8979, which is an independent cleanup commit.