Better heuristics for generating weird unicode strings #3127
Labels: enhancement, internals
Generating strings that find every possible bug in a program is hard, even at the codepoint-by-codepoint level as in #1401. Worse, many bugs are only triggered by sequences of codepoints (e.g. combining characters, emoji composition) or by even more structured strings such as XSS attacks.
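To make the "sequences of codepoints" point concrete, here is a small stdlib-only demonstration (plain Python, not Hypothesis code) of two sequence-level pitfalls: a combining accent and an emoji ZWJ sequence. Each is harmless codepoint by codepoint, but breaks naive reversal or truncation.

```python
import unicodedata

# "é" written two ways: one precomposed codepoint vs. "e" + COMBINING ACUTE ACCENT.
precomposed = "\u00e9"
decomposed = "e\u0301"

# They render identically but compare unequal until normalized.
assert precomposed != decomposed
assert unicodedata.normalize("NFC", decomposed) == precomposed

# Codepoint-level length disagrees with what a reader sees as one character.
assert len(precomposed) == 1 and len(decomposed) == 2

# Reversing by codepoint detaches the accent: the combining mark now comes first.
reversed_text = decomposed[::-1]
assert reversed_text[0] == "\u0301"

# Emoji ZWJ sequence (man, ZWJ, woman, ZWJ, girl): one visible glyph, five codepoints.
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"
assert len(family) == 5

# Naive truncation can leave a dangling ZERO WIDTH JOINER.
assert family[:2].endswith("\u200d")
```

A generator that picks each codepoint independently will rarely produce exactly these adjacent pairs, which is the motivation for seeding generation with known-weird sequences.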
Eventually, I would like to 'make our own luck' by teaching `text()` to pick from a list of known-weird strings (or templates for weird things) and then shrink the result as if we had randomly generated that sequence of codepoints. This is already on the wishlist in #3086, at which point it's mostly a matter of vendoring e.g. https://github.com/minimaxir/big-list-of-naughty-strings and whatever else we can think of based on e.g. Text Rendering Hates You, Text Editing Hates You Too, and so on (ligatures, RTL/LTR/TTB text directions, mixed-direction text, emoji modifiers, EICAR test string, ...).
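The idea of "pick a known-weird string, then shrink it as if it were random" can be sketched as a toy in plain Python. This is not how Hypothesis works internally (its real generation and shrinking operate on its own internal representation); it only illustrates the design. All names here (`WEIRD_STRINGS`, `draw_codepoints`, `shrink`, `is_failing`, `find_and_shrink`) are hypothetical, the random-codepoint range is a simplification, and the shrinker only deletes codepoints. The key property is that a drawn string is always returned as a plain list of codepoints, so the shrinker cannot tell whether it came from the weird list or from uniform draws.

```python
import random
import unicodedata
from typing import Callable, List, Optional

# Hypothetical seed list; a real version might vendor big-list-of-naughty-strings.
WEIRD_STRINGS = [
    "e\u0301",                                     # combining acute accent
    "\U0001F468\u200d\U0001F469\u200d\U0001F467",  # emoji ZWJ sequence
    "\u202eabc",                                   # RIGHT-TO-LEFT OVERRIDE
    "<script>alert(1)</script>",                   # XSS-style payload
]


def draw_codepoints(rng: random.Random, max_len: int = 10, p_weird: float = 0.2) -> List[int]:
    """Draw a string as a list of codepoints, sometimes seeded from WEIRD_STRINGS."""
    if rng.random() < p_weird:
        return [ord(c) for c in rng.choice(WEIRD_STRINGS)]
    n = rng.randint(0, max_len)
    # Simplification: uniform over a BMP range that stops below the surrogates.
    return [rng.randint(0x20, 0xD7FF) for _ in range(n)]


def shrink(codepoints: List[int], is_failing: Callable[[str], bool]) -> List[int]:
    """Greedily delete single codepoints while the test still fails."""
    current = list(codepoints)
    i = 0
    while i < len(current):
        candidate = current[:i] + current[i + 1:]
        if is_failing("".join(map(chr, candidate))):
            current = candidate  # keep the deletion and retry the same index
        else:
            i += 1
    return current


def is_failing(s: str) -> bool:
    """Hypothetical buggy property: reversing by codepoint must not yield
    a string that starts with a combining mark."""
    r = s[::-1]
    return bool(r) and unicodedata.combining(r[0]) != 0


def find_and_shrink(seed: int = 0, tries: int = 1000) -> Optional[List[int]]:
    """Search for a failing input, then shrink it at the codepoint level."""
    rng = random.Random(seed)
    for _ in range(tries):
        cps = draw_codepoints(rng)
        if is_failing("".join(map(chr, cps))):
            return shrink(cps, is_failing)
    return None
```

For example, `shrink([ord("x"), ord("e"), 0x301], is_failing)` returns `[0x301]`: a lone combining mark, regardless of whether the original came from the weird list or from random draws.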