
Increase variation in Unicode character generation #1621

Closed
Zac-HD wants to merge 2 commits from the character-generation branch


Zac-HD (Member) commented Oct 5, 2018

This patch substantially increases the variety of examples from the characters() strategy. Instead of drawing a code point directly, characters() now selects a Unicode category and then a code point within that category. However, the shrink target is unchanged: the choice of category 'shrinks open', which allows code point-wise minimisation during shrinking.
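A minimal sketch of the two-stage draw described above, written in plain Python with the standard-library `unicodedata` module rather than Hypothesis internals. The function names, the restriction to the Basic Multilingual Plane (`max_codepoint=0xFFFF`, to keep the table cheap to build), the uniform choice of category, and the use of a first choice of 0 as the "open" branch are all assumptions for illustration, not the patch's actual code.

```python
import random
import unicodedata
from collections import defaultdict


def build_category_table(max_codepoint=0xFFFF):
    """Map each Unicode general category (e.g. 'Lu', 'Nd', 'So') to the
    ascending list of code points in that category, up to max_codepoint."""
    table = defaultdict(list)
    for cp in range(max_codepoint + 1):
        table[unicodedata.category(chr(cp))].append(cp)
    return dict(table)


def draw_character(rng, table, excluded=("Cs",)):
    """Generation: choose a category uniformly, then a code point uniformly
    within it, so small categories are drawn far more often than their share
    of the code space would suggest. Surrogates ('Cs') are excluded by
    default because they cannot be encoded as UTF-8."""
    categories = sorted(c for c in table if c not in excluded)
    category = rng.choice(categories)
    return chr(rng.choice(table[category]))


def draw_from_choices(choices, table, max_codepoint=0xFFFF):
    """Decode a character from two non-negative integers, standing in for the
    recorded choices a shrinker minimises (hypothetical encoding). A first
    choice of 0 is the 'open' branch: no category restriction, so the second
    choice is a raw code point and shrinking it towards 0 minimises code
    point by code point."""
    first, second = choices
    if first == 0:
        return chr(second % (max_codepoint + 1))
    categories = sorted(table)
    members = table[categories[(first - 1) % len(categories)]]
    return chr(members[second % len(members)])


table = build_category_table()
rng = random.Random(0)
sample = [draw_character(rng, table) for _ in range(5)]
```

With uniform-by-code-point sampling, a large share of the BMP falls in category `Lo` (CJK ideographs, Hangul syllables), so tiny categories such as `Zl` or `Me` almost never appear; choosing the category first gives each category equal weight. In this sketch an all-zero choice sequence decodes to the same character as a plain code point draw of 0, which is the sense in which the shrink target stays unchanged.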

I've extracted the unrelated cleanups and refactoring as #1660; it would be great if that could be reviewed and merged soon, so I can rebase on it.

Closes #1401 and closes #341.

Zac-HD added the enhancement label (it's not broken, but we want it to be better) on Oct 5, 2018
Zac-HD force-pushed the character-generation branch 7 times, most recently from b678f58 to f4f4635, October 6, 2018 05:12
Zac-HD force-pushed the character-generation branch from f4f4635 to fb7b08b, October 6, 2018 10:49
Zac-HD force-pushed the character-generation branch 2 times, most recently from dad21c7 to 33c6223, October 11, 2018 12:43
Zac-HD changed the title "Increase variation in generated characters and change shrink order" to "Increase variation in Unicode character generation" on Oct 11, 2018
Zac-HD (Member, Author) commented Oct 11, 2018

OK! I've tidied this up and ensured that the increased variety in generation does not change shrinking. Ready for another review 😄

Zac-HD force-pushed the character-generation branch from 33c6223 to 0ed071f, October 23, 2018 09:23
Zac-HD force-pushed the character-generation branch 2 times, most recently from b03bfaa to 79f85fa, October 25, 2018 01:50
Zac-HD force-pushed the character-generation branch 4 times, most recently from 38284e6 to aa894b5, October 25, 2018 12:21
Zac-HD (Member, Author) commented Oct 27, 2018

Hmm. After digging into this, it looks like it's not fully broken; instead, some tests don't reliably shrink to a fixpoint within the 500-shrink limit. @DRMacIver, any idea what I should do about that?

When passed an alphabet of characters (rather than a strategy), we can do much better for shrinking than simply sampling from it: delegate to characters() where possible, or otherwise sort the alphabet before sampling.
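A minimal sketch of the alphabet handling described above, in plain Python. The function names are hypothetical, and treating "delegation is possible" as "the distinct characters form one contiguous code point range" is an assumption: it is one case a code point-range generator such as characters() can clearly cover, not necessarily the patch's exact rule. The sorting step relies on the index-based draw shrinking towards 0, so the simplest example is the lowest code point rather than whatever the user listed first.

```python
def plan_alphabet(alphabet):
    """Decide how to generate from a fixed alphabet (a string or iterable of
    single characters). Returns ("range", lo, hi) if the distinct characters
    form one contiguous block of code points, which a code point-range
    generator could handle directly; otherwise ("sampled", chars) with the
    distinct characters sorted by code point."""
    chars = sorted(set(alphabet))  # single-char strings sort by code point
    if not chars:
        raise ValueError("alphabet must contain at least one character")
    lo, hi = ord(chars[0]), ord(chars[-1])
    if hi - lo + 1 == len(chars):
        return ("range", lo, hi)
    return ("sampled", chars)


def draw_from_alphabet(chars, index):
    """Pick by index from the sorted alphabet. Because a shrinker drives the
    index towards 0, the minimal result is the lowest code point."""
    return chars[index % len(chars)]


kind, chars = plan_alphabet("zxa")
first = draw_from_alphabet(chars, 0)
```

Sorting costs one pass over the alphabet up front, but it makes the shrink order depend only on the set of characters, not on the order they were written in.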
Zac-HD force-pushed the character-generation branch from 396e56c to f1f951d, December 8, 2018 11:24
Zac-HD (Member, Author) commented Dec 8, 2018

I'm going to close this pull request for now: while the approach is promising, there is also something of a clash between the generation and shrinking steps, and I simply don't want to work on the frustrating last ~5% of the problem at the moment.

Zac-HD (Member, Author) commented Jul 9, 2019

Rebased and (somewhat) updated version: master...Zac-HD:weird-text. Shrinking isn't quite right at the moment, but everything else seems to be working.

Labels: enhancement (it's not broken, but we want it to be better)
Development: successfully merging this pull request may close these issues:

Weighting of emoji in text strategies?
Chance of generating unnormalized unicode string is vanishingly low
3 participants