From 280d3e7c51e18e2735588452a67abd493638e536 Mon Sep 17 00:00:00 2001 From: "David R. MacIver" Date: Wed, 31 Jan 2018 10:56:40 +0000 Subject: [PATCH 01/17] Add a guide for working on internals --- guides/internals.rst | 321 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 321 insertions(+) create mode 100644 guides/internals.rst diff --git a/guides/internals.rst b/guides/internals.rst new file mode 100644 index 0000000000..d9dd5e8314 --- /dev/null +++ b/guides/internals.rst @@ -0,0 +1,321 @@ +=================================== +How to Work on Hypothesis Internals +=================================== + +This is a guide to how to work on Hypothesis internals, +with a particular focus on helping people who are new to it. +Right now it is very rudimentary and is intended primarily for people who are +looking to get started writing shrink passes as part of our `current outreach +program to get more people doing that `_, +but it will expand over time. + +------------------------ +Bird's Eye View Concepts +------------------------ + +The core engine of Hypothesis is called Conjecture. + +The "fundamental idea" of Conjecture is that you can represent an arbitrary +randomized test case as a string of bytes, which are basically intended as the +underlying entropy of some pseudo-random number generator (PRNG). +Whenever you want to do something "random" you just read the next bytes and +do what they tell you to do. By manipulating these bytes, we can achieve +more interesting effects than pure randomness would allow us to do, while +retaining the power and ease of use of random testing. + +The idea of shrinking in particular is that once we have this representation, +we can shrink arbitrary test cases based on it. We try to produce a string that +is *shortlex minimal*. What this means is that it has the shortest possible +length and among those strings of minimal length is lexicographically (i.e. 
the +normal order on strings - find the first byte at which they differ and use that +to decide) smallest. + +Ideally we could think of the shrinker is a generic function that takes a +string satisfying some predicate and returns the shortlex minimal string that +also satisfies it. +This is wrong on several levels: The first is that we only succeed in approximating +such a minimal string. The second is that we are only interested in minimizing +things where the predicate goes through the Hypothesis API, which lets us track +a lot of info about how the data is used and use that to guide the process. + +We then use a number of different transformations of the string to try and +reduce our input. These vary from principled general transformations to shameless +hacks that special case something we need to work well. We try to aim for mostly +the former, but the nice thing about this model is that the underlying representation +is fully general and we are free to try whatever we want and it will never result +in us doing the wrong thing, so hacks are only a problem to the degree that they +result in messy code and fragile heuristics, they're never a correctness issue, +so if we can't make something work without such a hack it's not a big deal. + +One such example of a hack is the handling of floating point numbers. There are +a couple of lexicographic shrinks that are always valid but only really make +sense for our particular encoding of floats. We simply detect when we're working +on something that is of the right size to be a float and apply those transformations. +Worst case scenario it's not a float and they don't work, and we've run a few +extra test cases. + +-------------------------- +Useful Files to Know About +-------------------------- + +The code associated with Conjecture lives in +`src/hypothesis/internal/conjecture `_. +There are a number of files in there, +but the most important ones are ``engine.py`` and ``data.py``. 
+``data.py`` defines the core type that is used to represent test cases, +and ``engine.py`` contains the main driver for deciding what test cases to run. + +There is also ``minimizer.py``, which contains a general purpose lexicographic +minimizer. This is responsible for taking some byte string and a predicate over +byte strings and producing a string of the same length which is lexicographically +smaller. Unlike the shrinker in general, this *is* supposed to work on arbitrary +predicates and doesn't know anything about the testing API. We typically apply +this to subsets of the bytes for a test input with a predicate that knows how +to integrate those subsets into a larger test. This is the part of the code +that means we can do things like replacing an integer with a smaller one. + +------- +Testing +------- + +The Hypothesis test suite is rather large, but there are a couple of areas in +particular that are useful to know about when making engine changes. + +The first is `tests/cover/test_conjecture_engine.py `_, +which is a set of unit tests designed to put the engine into particular scenarios to exercise specific behaviours, +with a goal of achieving 100% coverage on it in isolation (though it currently does not quite achieve that for some specific edge cases. +We may fix and enforce this later). + +The other set of tests that are worth knowing about are the quality tests, +in `tests/quality `_. +These assert specific hard to satisfy properties about the examples that Hypothesis finds - +either their existence, or something about the final shrunk result. + +To run a specific test file manually, you can use pytest. I usually use the +following invocation: + +.. code-block:: + + python -m pytest tests/cover/test_conjecture_engine.py + +You will need to have Hypothesis installed locally to run these. 
I recommend a +virtualenv where you have run ``pip install -e .``, which installs all the +dependencies and puts your ``src`` directory in the path of installed packages +so that edits you make are automatically pipped up. + +Useful arguments you can add to pytest are `` -n 0``, which will disable build +parallelism (I find that on my local laptop the startup time is too high to be +worth it when running single files, so I usually do this), and `` -kfoo` where +foo is some substring common to the set of tests you want to run (you can also +use composite expressions here. e.g. `` -k'foo and not bar'`` will run anything +containing foo that doesn't also contain bar). + +----------------------- +Engine Design Specifics +----------------------- + +There are a couple of code patterns that are mostly peculiar to Conjecture that +you may not have encountered before and are worth being aware of. + +~~~~~~~~~~~~~~~~~~~~ +Search State Objects +~~~~~~~~~~~~~~~~~~~~ + +There are a number of cases where we find ourselves with a user-provided function +(where the "user" might still be something that is entirely our code) and we +want to pass a whole bunch of different examples to it in order to achieve some +result. Currently this includes each of the main engine, the Shrinker (in +engine.py) and the minimizer, but there are likely to be more in future. + +We typically organise such things in terms of an object that you create with +the function and possibly an initial argument that stores these on self and +has some ``run`` or similar method. They then run for a while, repeatedly +calling the function they were given. + +Generally speaking they do not call the function directly, but instead wrap +calls to it. This allows them to implement a certain amount of decision caching, +e.g. avoiding trying the same shrink twice, but also gives us a place where we +can update metadata about the search process.
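As an illustration of the search-state pattern just described, here is a minimal Python sketch. ``ToyShrinker``, ``test`` and ``shortlex_key`` are hypothetical names invented for this example, not Hypothesis's actual ``Shrinker`` API, and the sketch assumes test cases are plain ``bytes``. Every predicate call goes through the ``test`` wrapper, which caches results, counts calls as metadata, and keeps ``self.current`` pointing at the best (shortlex-smallest) passing example seen so far.

```python
def shortlex_key(buffer):
    """Sort key: shorter buffers first, ties broken lexicographically."""
    return (len(buffer), buffer)


class ToyShrinker:
    """Hypothetical search-state object illustrating the pattern above.

    Not Hypothesis's real Shrinker: it only tries single-byte deletions.
    """

    def __init__(self, predicate, initial):
        initial = bytes(initial)
        if not predicate(initial):
            raise ValueError("the initial example must satisfy the predicate")
        self.predicate = predicate    # the "user-provided" function
        self.current = initial        # best passing example seen so far
        self.cache = {initial: True}  # buffer -> cached predicate result
        self.calls = 0                # metadata: predicate calls made via test()

    def test(self, buffer):
        """Wrap every predicate call: consult the cache, update self.current."""
        buffer = bytes(buffer)
        if buffer not in self.cache:
            self.calls += 1
            result = bool(self.predicate(buffer))
            self.cache[buffer] = result
            if result and shortlex_key(buffer) < shortlex_key(self.current):
                self.current = buffer
        return self.cache[buffer]

    def run(self):
        """Try deleting single bytes until a full pass makes no progress."""
        changed = True
        while changed:
            changed = False
            i = 0
            while i < len(self.current):
                candidate = self.current[:i] + self.current[i + 1:]
                if self.test(candidate):
                    # test() made the shorter candidate the new target, so
                    # position i now holds the byte that used to follow it.
                    changed = True
                else:
                    i += 1
        return self.current
```

For example, ``ToyShrinker(lambda b: sum(b) >= 10, bytes([3, 4, 5, 6])).run()`` returns ``bytes([5, 6])`` after four predicate calls; the second pass over ``[5, 6]`` is answered entirely from the cache. Deletion alone cannot reach the shortlex-smaller ``bytes([10])``, which is one reason to combine several kinds of pass.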
+ +~~~~~~~~~~~ +Weird Loops +~~~~~~~~~~~ + +The loops inside a lot of the engine look very strange and unidiomatic. For +example: + +.. code-block:: python + + i = 0 + while i < len(self.intervals): + u, v = self.intervals[i] + if not self.incorporate_new_buffer( + self.shrink_target.buffer[:u] + self.shrink_target.buffer[v:] + ): + i += 1 + + +The more natural way to write this in Python would of course be: + +.. code-block:: python + + for u, v in self.intervals: + self.incorporate_new_buffer( + self.shrink_target.buffer[:u] + self.shrink_target.buffer[v:] + ) + +This way of writing the loop would be *entirely wrong*. + +Every time `incorporate_new_buffer` succeeds, it changes the shape of the +current shrink target. This consequently changes the shape of intervals, both +its particular values and its current length - on each loop iteration the loop +might stop either because ``i`` increases or because ``len(self.intervals)`` +decreases. + +An additional quirk is that we only increment ``i`` on failure. The reason for +this is that if we successfully deleted the current interval then the interval +in position ``i`` has been replaced with something else, which is probably the +next thing we would have tried deleting if we hadn't succeeded (or something +like it), so we don't want to advance past it. + +------------ +The Shrinker +------------ + +The shrinking part of Hypothesis is organised into a single class called ``Shrinker`` +that lives in engine.py. + +Its job is to take an initial ``ConjectureData`` object and some predicate that +it satisfies, and to try to produce a simpler ``ConjectureData`` object that +also satisfies that predicate. + +~~~~~~~~~~~~~~ +Search Process +~~~~~~~~~~~~~~ + +The search process mostly happens in the ``shrink`` method. It is split into +two parts: ``greedy_shrink`` and ``escape_local_minimum``. 
The former is a +greedy algorithm, meaning that it will only ever call the predicate with values +that are strictly smaller than our current best. This mostly works very well, +but sometimes it gets stuck. So what we do is after we have run that we try +restarting the process from something like our final state but a bit fuzzed and +run the greedy shrink again. We keep doing this as long as it results in a +smaller value than our previous best. + +The greedy shrinker is where almost all of the work happens. It is organised +into a large number of search passes, and is designed to run until all of those +passes fail to make any improvements. + +~~~~~~~~~~~~~ +Search Passes +~~~~~~~~~~~~~ + +Search passes are just methods on the ``Shrinker`` class in engine.py. They are +designed to take the current shrink target and try a number of things that might +be sensible shrinks of it. + +Typically the design of a search pass is that it should always try to run to +completion rather than exiting as soon as it's found something good, but that +it shouldn't retry things that are too like stuff it has already tried just +because something worked. So for example in the above loop, we try deleting +each interval (these roughly correspond to regions of the input that are +responsible for some particular value or small number of adjacent values). +When we succeed, we keep going and try deleting more intervals, but we don't +try to delete any intervals before the current index. + +The reason for this is that retrying things from the beginning might work but +probably won't. Thus if we restarted every time we made a change we would end +up doing a lot of useless work. Additionally, they are *more* likely to work +after other shrink passes have run because frequently other changes are likely +to unlock changes in the current pass that were previously impossible. e.g. 
+when we reorder some examples we might make a big region deletable that +previously contained something critical to the relevant behaviour of the test +but is now just noise. + +Because the shrinker runs in a big loop, if we've made progress the shrink pass +will always be run again (assuming we don't hit some limit that terminates the +shrink early, but by making the shrinker better we try to ensure that that +never happens). +This means that we will always get an opportunity to start again later if we +made progress, and if we didn't make progress we've tried everything anyway. + + +~~~~~~~~~~~~~~~~~~~~~~~ +Expensive Shrink Passes +~~~~~~~~~~~~~~~~~~~~~~~ + +We have a bunch of search passes that are considered "expensive". Typically +this means "quadratic or worse complexity". When shrinking we initially don't +run these, and the first time that we get to the end of our main passes and +have failed to make the input any smaller, we then turn them on. + +This allows the shrinker to switch from a good but slightly timid mode while its +input is large into a more aggressive DELETE ALL THE THINGS mode once that stops +working. By that point ideally we've made our input small enough that quadratic +complexity is acceptable. + +We turn these on once and then they stay on. The reason for this is to avoid a +"flip-flopping" scenario where an expensive pass unlocks one trivial change that +the cheap passes can find and then they get stuck again and have to do an extra +useless run through the passes to prove that. + +~~~~~~~~~~~~~~~~~~~~~~ +Adaptive Shrink Passes +~~~~~~~~~~~~~~~~~~~~~~ + +A useful trick that some of the shrink passes use is to try a thing and if it +doesn't work take a look at what the test function did to guess *why* it didn't +work and try to repair that. + +Two examples of such passes are ``zero_draws`` and the various passes that try to +minimize individual blocks lexicographically.
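The following toy Python sketch shows the adaptive idea in the style of ``zero_draws``: replace the bytes of one draw with zeroes and, if that fails because the draw now consumes a different number of bytes (misaligning every later draw), retry with a run of zeroes matching the size the draw actually used. It is a simplification under stated assumptions, not the real pass: ``run_test`` and ``zero_draw`` are hypothetical names, the "test" is a made-up function that draws a short list and then a flag byte, and draws are tracked as plain ``(start, end)`` spans rather than through ``ConjectureData``.

```python
def run_test(buffer):
    """Hypothetical test function. It draws a list whose length is the first
    byte mod 4, followed by that many element bytes, then one 'flag' byte.
    The test fails (is interesting) when the flag is 7.

    Returns (interesting, spans), where spans[k] is the (start, end) byte
    range consumed by draw k. Running out of bytes counts as not interesting.
    """
    spans = []
    if len(buffer) < 1:
        return False, spans
    length = buffer[0] % 4
    end = 1 + length
    if end > len(buffer):
        return False, spans
    spans.append((0, end))
    if end >= len(buffer):
        return False, spans
    spans.append((end, end + 1))
    return buffer[end] == 7, spans


def zero_draw(buffer, k):
    """Try replacing draw k with zero bytes, repairing a size change once."""
    interesting, spans = run_test(buffer)
    assert interesting, "start from a failing example"
    start, end = spans[k]
    attempt = buffer[:start] + bytes(end - start) + buffer[end:]
    ok, new_spans = run_test(attempt)
    if ok:
        return attempt
    # Why did it fail? If draw k consumed a different number of bytes this
    # time, later draws read misaligned bytes. Retry with a zero run of the
    # size that draw k actually used in the failed attempt.
    if len(new_spans) > k:
        new_start, new_end = new_spans[k]
        if new_end - new_start != end - start:
            repaired = buffer[:start] + bytes(new_end - new_start) + buffer[end:]
            if run_test(repaired)[0]:
                return repaired
    return buffer
```

Starting from ``bytes([2, 9, 9, 7])``, naively zeroing the list draw gives ``bytes([0, 0, 0, 7])``, which reads an empty list and then takes the flag from the wrong position. The repaired attempt ``bytes([0, 7])`` keeps the flag aligned and still fails the test, so ``zero_draw(bytes([2, 9, 9, 7]), 0)`` returns it.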
+ +What happens in ``zero_draws`` is that we try replacing the region corresponding +to a draw with all zero bytes. If that doesn't work, we check if that was because +of changing the size of the example (e.g. doing that with a list will make the +list much shorter) and messing up the byte stream after that point. If this +was what happened then we try again with a sequence of zeroes that corresponds +to the size of the draw call in the version we tried that didn't work. + +The logic for what we do with block minimization is in ``try_shrinking_blocks``. +When it tries shrinking a block and it doesn't work, it checks if the size +changed. If it does then it tries deleting the number of bytes that were lost +immediately after the shrunk block to see if it helps. + + +-------------- +Playing Around +-------------- + +I often find that it is informative to watch the shrink process in action using +Hypothesis's verbosity settings. This can give you an idea of what the format +of your data is, and how the shrink process transforms it. + +In particular, it is often useful to run a test with the flag `` -s`` to tell it +not to hide output and the environment variable ``HYPOTHESIS_VERBOSITY_LEVEL=debug``. +This will give you a very detailed log of what the testing process is running, +along with information about what passes in the shrinker are running and how +they transform it. + +--------------- +Getting Started +--------------- + +The best way of getting started on working on the engine is to work on the +shrinker. This is because it has the most well defined problems, the best +documented code among the engine, and it's generally fun to work on. + +If you have not already done so, check out `Issue #1093 `_, +which collates a number of other issues about shrink quality that are good starting +points for people. + +The best place to get started thus is to take a look at those linked issues and +just jump in and try things! Find one that you think sounds fun.
Note that some +of them suggest not doing these as your first foray into the shrinker, as some +are harder than others. + +*Please* ask questions if you have any - either the main issue for general +purpose questions or specific issues for questions about a particular problem - +if you get stuck or if anything doesn't make sense. We're trying to make this +process easier for everyone to work on, so asking us questions is actively +helpful to us and we will be very grateful to you for doing so. From e77b5da09e3275c05f8b96b2f12d3b1dc43399cc Mon Sep 17 00:00:00 2001 From: "David R. MacIver" Date: Wed, 31 Jan 2018 12:10:47 +0000 Subject: [PATCH 02/17] Fix unbalanced quotes --- guides/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/guides/internals.rst b/guides/internals.rst index d9dd5e8314..83571eef44 100644 --- a/guides/internals.rst +++ b/guides/internals.rst @@ -105,7 +105,7 @@ so that edits you make are automatically pipped up. Useful arguments you can add to pytest are `` -n 0``, which will disable build parallelism (I find that on my local laptop the startup time is too high to be -worth it when running single files, so I usually do this), and `` -kfoo` where +worth it when running single files, so I usually do this), and `` -kfoo`` where foo is some substring common to the set of tests you want to run (you can also use composite expressions here. e.g. `` -k'foo and not bar'`` will run anything containing foo that doesn't also contain bar). From 9ae7f27ccaa06a09794a33b535315856e4273b8f Mon Sep 17 00:00:00 2001 From: "David R.
MacIver" Date: Wed, 31 Jan 2018 12:30:17 +0000 Subject: [PATCH 03/17] Those spaces aren't needed --- guides/internals.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/guides/internals.rst b/guides/internals.rst index 83571eef44..39de42b0a5 100644 --- a/guides/internals.rst +++ b/guides/internals.rst @@ -103,11 +103,11 @@ virtualenv where you have run ``pip install -e .``, which installs all the dependencies and puts your ``src`` directory in the path of installed packages so that edits you make are automatically pipped up. -Useful arguments you can add to pytest are `` -n 0``, which will disable build +Useful arguments you can add to pytest are ``-n 0``, which will disable build parallelism (I find that on my local laptop the startup time is too high to be -worth it when running single files, so I usually do this), and `` -kfoo`` where +worth it when running single files, so I usually do this), and ``-kfoo`` where foo is some substring common to the set of tests you want to run (you can also -use composite expressions here. e.g. `` -k'foo and not bar'`` will run anything +use composite expressions here. e.g. ``-k'foo and not bar'`` will run anything containing foo that doesn't also contain bar). ----------------------- @@ -291,7 +291,7 @@ I often find that it is informative to watch the shrink process in action using Hypothesis's verbosity settings. This can give you an idea of what the format of your data is, and how the shrink process transforms it. -In particular, it is often useful to run a test with the flag `` -s`` to tell it +In particular, it is often useful to run a test with the flag ``-s`` to tell it not to hide output and the environment variable ``HYPOTHESIS_VERBOSITY_LEVEL=debug``. This will give you a very detailed log of what the testing process is running, along with information about what passes in the shrinker are running and how From 7e1fc716864edb18dedc7adbea3ab7b2e289ef9e Mon Sep 17 00:00:00 2001 From: "David R."
MacIver" Date: Wed, 31 Jan 2018 12:37:02 +0000 Subject: [PATCH 04/17] Expand on what we track --- guides/internals.rst | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/guides/internals.rst b/guides/internals.rst index 39de42b0a5..b9af81a83d 100644 --- a/guides/internals.rst +++ b/guides/internals.rst @@ -137,6 +137,22 @@ calls to it. This allows them to implement a certain amount of decision caching, e.g. avoiding trying the same shrink twice, but also gives us a place where we can update metadata about the search process. +For objects whose goal is some form of optimisation (Shrinker, Minimizer) one +of the pieces of metadata they will typically track is a "current target". This +is typically the best example they have seen so far. By wrapping every call to +the predicate, we ensure that we never miss an example even when we're passing +through other things. + +For objects whose goal is some broader form of search (currently only +``ConjectureRunner``) this also allows them to keep track of *other* examples +of interest. For example, as part of our multiple bug discovery, +``ConjectureRunner`` keeps track of the smallest example of each distinct +failure that it has seen, and updates this automatically each time the test +function is called. This means that if during shrinking we "slip" and find a +different bug than the one we started with, we will *not* shrink to that, but +it will get remembered by the runner if it was either novel or better than our +current example. + ~~~~~~~~~~~ Weird Loops ~~~~~~~~~~~ From 6fd018d06760250c02ff9994ae9b255a4dfded40 Mon Sep 17 00:00:00 2001 From: "David R. 
MacIver" Date: Wed, 31 Jan 2018 12:50:06 +0000 Subject: [PATCH 05/17] Remove just and of course --- guides/internals.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/guides/internals.rst b/guides/internals.rst index b9af81a83d..8b88e0c5b7 100644 --- a/guides/internals.rst +++ b/guides/internals.rst @@ -49,8 +49,9 @@ so if we can't make something work without such a hack it's not a big deal. One such example of a hack is the handling of floating point numbers. There are a couple of lexicographic shrinks that are always valid but only really make -sense for our particular encoding of floats. We simply detect when we're working -on something that is of the right size to be a float and apply those transformations. +sense for our particular encoding of floats. We check if we're working +on something that is of the right size to be a float and apply those +transformations regardless of whether it is actually meant to be a float. Worst case scenario it's not a float and they don't work, and we've run a few extra test cases. @@ -171,7 +172,7 @@ example: i += 1 -The more natural way to write this in Python would of course be: +The more natural way to write this in Python would be: .. code-block:: python From e5b0896854c9d32e433a3c1fb94cf6a06c5aedd0 Mon Sep 17 00:00:00 2001 From: "David R. MacIver" Date: Wed, 31 Jan 2018 12:50:17 +0000 Subject: [PATCH 06/17] Wrong backticks --- guides/internals.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/guides/internals.rst b/guides/internals.rst index 8b88e0c5b7..82cc87b14b 100644 --- a/guides/internals.rst +++ b/guides/internals.rst @@ -18,7 +18,7 @@ The core engine of Hypothesis is called Conjecture. The "fundamental idea" of Conjecture is that you can represent an arbitrary randomized test case as a string of bytes, which are basically intended as the underlying entropy of some pseudo-random number generator (PRNG). 
-Whenever you want to do something "random" you just read the next bytes and +Whenever you want to do something "random" you read the next bytes and do what they tell you to do. By manipulating these bytes, we can achieve more interesting effects than pure randomness would allow us to do, while retaining the power and ease of use of random testing. @@ -183,7 +183,7 @@ The more natural way to write this in Python would be: This way of writing the loop would be *entirely wrong*. -Every time `incorporate_new_buffer` succeeds, it changes the shape of the +Every time ``incorporate_new_buffer`` succeeds, it changes the shape of the current shrink target. This consequently changes the shape of intervals, both its particular values and its current length - on each loop iteration the loop might stop either because ``i`` increases or because ``len(self.intervals)`` @@ -227,7 +227,7 @@ passes fail to make any improvements. Search Passes ~~~~~~~~~~~~~ -Search passes are just methods on the ``Shrinker`` class in engine.py. They are +Search passes are methods on the ``Shrinker`` class in engine.py. They are designed to take the current shrink target and try a number of things that might be sensible shrinks of it. @@ -327,7 +327,7 @@ which collates a number of other issues about shrink quality that are good start points for people. The best place to get started thus is to take a look at those linked issues and -just jump in and try things! Find one that you think sounds fun. Note that some +jump in and try things! Find one that you think sounds fun. Note that some of them suggest not doing these as your first foray into the shrinker, as some are harder than others. From f1105b30cd53df3c7521f5e329e7baf65ed300b8 Mon Sep 17 00:00:00 2001 From: "David R. 
MacIver" Date: Wed, 31 Jan 2018 12:53:02 +0000 Subject: [PATCH 07/17] Clarify specificity there --- guides/internals.rst | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/guides/internals.rst b/guides/internals.rst index 82cc87b14b..8c800f7ab5 100644 --- a/guides/internals.rst +++ b/guides/internals.rst @@ -194,6 +194,10 @@ this is that if we successfully deleted the current interval then the interval in position ``i`` has been replaced with something else, which is probably the next thing we would have tried deleting if we hadn't succeeded (or something like it), so we don't want to advance past it. +This is specific to deletion: If we are just replacing the contents of +something then we expect it to still be in the same place, so there we increment +unconditionally. +Examples of this include ``zero_draws`` and ``minimize_individual_blocks``. ------------ The Shrinker From 0d1297fe5fed3e807e97e6c58c2fecb113ff9684 Mon Sep 17 00:00:00 2001 From: "David R. MacIver" Date: Fri, 2 Feb 2018 10:20:16 +0000 Subject: [PATCH 08/17] Fix typo --- guides/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/guides/internals.rst b/guides/internals.rst index 8c800f7ab5..db84655420 100644 --- a/guides/internals.rst +++ b/guides/internals.rst @@ -30,7 +30,7 @@ length and among those strings of minimal length is lexicographically (i.e. the normal order on strings - find the first byte at which they differ and use that to decide) smallest. -Ideally we could think of the shrinker is a generic function that takes a +Ideally we could think of the shrinker as a generic function that takes a string satisfying some predicate and returns the shortlex minimal string that also satisfies it. This is wrong on several levels: The first is that we only succeed in approximating From ea42c93abcc7011e8f0441b67b02a36e7b9d6174 Mon Sep 17 00:00:00 2001 From: "David R. 
MacIver" Date: Sun, 11 Feb 2018 12:18:01 +0000 Subject: [PATCH 09/17] Response to review --- guides/internals.rst | 46 +++++++++++++++++++++++++++++--------------- 1 file changed, 30 insertions(+), 16 deletions(-) diff --git a/guides/internals.rst b/guides/internals.rst index db84655420..68d17d3b6d 100644 --- a/guides/internals.rst +++ b/guides/internals.rst @@ -16,13 +16,22 @@ Bird's Eye View Concepts The core engine of Hypothesis is called Conjecture. The "fundamental idea" of Conjecture is that you can represent an arbitrary -randomized test case as a string of bytes, which are basically intended as the -underlying entropy of some pseudo-random number generator (PRNG). +randomized test case as the sequence of bytes read from some pseudo-random +number generator (PRNG). Whenever you want to do something "random" you read the next bytes and -do what they tell you to do. By manipulating these bytes, we can achieve +do what they tell you to do. +But these bytes didn't *have* to come from a PRNG, and we can run the test +given any byte sequence we like. By manipulating the choice of bytes, we can achieve more interesting effects than pure randomness would allow us to do, while retaining the power and ease of use of random testing. +The greatest strength of this idea is that we have a single source of truth +for what an example should look like: Every byte sequence is one that *could* +have come from a PRNG, and thus is a valid thing to try for our test. +The only ways it can fail to be a valid test input are for it to be too short +or for it to not satisfy one of the test's preconditions, and both are easily +detectable. + The idea of shrinking in particular is that once we have this representation, we can shrink arbitrary test cases based on it. We try to produce a string that is *shortlex minimal*. What this means is that it has the shortest possible @@ -33,19 +42,18 @@ to decide) smallest. 
Ideally we could think of the shrinker as a generic function that takes a string satisfying some predicate and returns the shortlex minimal string that also satisfies it. -This is wrong on several levels: The first is that we only succeed in approximating -such a minimal string. The second is that we are only interested in minimizing -things where the predicate goes through the Hypothesis API, which lets us track -a lot of info about how the data is used and use that to guide the process. + +We depart from this ideal in two ways: + +* we can only *approximate* such a minimal string. Finding the actual minimum is + intractable in general. +* we are only interested in minimizing things where the predicate goes through + the Hypothesis API, which lets us track how the data is used and use that to + guide the process. We then use a number of different transformations of the string to try and reduce our input. These vary from principled general transformations to shameless -hacks that special case something we need to work well. We try to aim for mostly -the former, but the nice thing about this model is that the underlying representation -is fully general and we are free to try whatever we want and it will never result -in us doing the wrong thing, so hacks are only a problem to the degree that they -result in messy code and fragile heuristics, they're never a correctness issue, -so if we can't make something work without such a hack it's not a big deal. +hacks that special case something we need to work well. One such example of a hack is the handling of floating point numbers. There are a couple of lexicographic shrinks that are always valid but only really make @@ -181,7 +189,7 @@ The more natural way to write this in Python would be: self.shrink_target.buffer[:u] + self.shrink_target.buffer[v:] ) -This way of writing the loop would be *entirely wrong*. +This is not equivalent in this case, and would exhibit the wrong behaviour. 
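To make the difference between the two loop shapes concrete, here is a self-contained toy in Python. It is an assumption-laden model, not the engine's code: the buffer is plain ``bytes``, the hypothetical helper ``intervals_of`` stands in for ``self.intervals`` by treating every single byte as its own interval, and the predicate only asks whether the byte ``7`` is still present. Both functions try to delete each interval in turn; only the index-based ``while`` loop re-reads the intervals after every successful deletion.

```python
def interesting(buffer):
    """Toy predicate: the test still fails as long as a 7 byte survives."""
    return 7 in buffer


def intervals_of(buffer):
    """Hypothetical stand-in for self.intervals: one interval per byte."""
    return [(i, i + 1) for i in range(len(buffer))]


def delete_with_while(buffer):
    """The engine's loop shape: re-read intervals, advance i only on failure."""
    i = 0
    while i < len(intervals_of(buffer)):
        u, v = intervals_of(buffer)[i]
        candidate = buffer[:u] + buffer[v:]
        if interesting(candidate):
            buffer = candidate  # interval i is now what used to follow it
        else:
            i += 1
    return buffer


def delete_with_for(buffer):
    """The 'natural' loop: the interval list is computed once and goes stale."""
    for u, v in intervals_of(buffer):
        candidate = buffer[:u] + buffer[v:]
        if interesting(candidate):
            buffer = candidate
    return buffer
```

``delete_with_while(bytes([1, 2, 7, 3]))`` returns ``bytes([7])``, while ``delete_with_for`` returns ``bytes([2, 7])``: after the first deletion the stale interval ``(1, 2)`` points at the ``7`` instead of the ``2``, so the ``2`` is never tried, and the stale ``(3, 4)`` lies past the end of the shorter buffer, so that call re-tests an unchanged buffer.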
Every time ``incorporate_new_buffer`` succeeds, it changes the shape of the current shrink target. This consequently changes the shape of intervals, both @@ -189,6 +197,12 @@ its particular values and its current length - on each loop iteration the loop might stop either because ``i`` increases or because ``len(self.intervals)`` decreases. +We do not reset ``i`` to zero on success, as this would cause us to retry deleting +things that we have already tried. This *might* work, but is less likely to. +In the event that none of the earlier deletions succeed, this causes us to +retry the entire prefix uselessly, which can result in a pass taking O(n^2) time +to do O(n) deletions. + An additional quirk is that we only increment ``i`` on failure. The reason for this is that if we successfully deleted the current interval then the interval in position ``i`` has been replaced with something else, which is probably the @@ -204,7 +218,7 @@ The Shrinker ------------ The shrinking part of Hypothesis is organised into a single class called ``Shrinker`` -that lives in engine.py. +that lives in ``engine.py``. Its job is to take an initial ``ConjectureData`` object and some predicate that it satisfies, and to try to produce a simpler ``ConjectureData`` object that @@ -231,7 +245,7 @@ passes fail to make any improvements. Search Passes ~~~~~~~~~~~~~ -Search passes are methods on the ``Shrinker`` class in engine.py. They are +Search passes are methods on the ``Shrinker`` class in ``engine.py``. They are designed to take the current shrink target and try a number of things that might be sensible shrinks of it. From 6d493f52e7ff454010ec53d18a82560b8442d411 Mon Sep 17 00:00:00 2001 From: "David R.
MacIver" Date: Sun, 11 Feb 2018 12:35:50 +0000 Subject: [PATCH 10/17] Move instructions on running tests to testing guide --- guides/internals.rst | 25 +++---------- guides/testing-hypothesis.rst | 66 +++++++++++++++++++++++++++++++++-- 2 files changed, 67 insertions(+), 24 deletions(-) diff --git a/guides/internals.rst b/guides/internals.rst index 68d17d3b6d..3872ff4922 100644 --- a/guides/internals.rst +++ b/guides/internals.rst @@ -87,8 +87,10 @@ that means we can do things like replacing an integer with a smaller one. Testing ------- -The Hypothesis test suite is rather large, but there are a couple of areas in -particular that are useful to know about when making engine changes. +For general information about how to test Hypothesis, take a look at +the `testing guide `_, but there are a couple +of areas that it's worth specifically highlighting for making changes +to the engine: The first is `tests/cover/test_conjecture_engine.py `_, which is a set of unit tests designed to put the engine into particular scenarios to exercise specific behaviours, @@ -100,25 +102,6 @@ in `tests/quality `_, (for inspiration, *not* implementation). Dan Luu writes about `fuzz testing `_ and `broken processes `_, among other things. + +--------------------------------------- +Setting up a virtualenv to run tests in +--------------------------------------- + +If you want to run individual tests rather than relying on the make tasks +(which you probably will), it's easiest to do this in a virtualenv. + +The following will give you a working virtualenv for running tests in: + +.. code-block:: bash + + pip install virtualenv + python -m virtualenv testing-venv + + # On Windows: testing-venv\Scripts\activate + source testing-venv/bin/activate + + # Can also use pip install -e .[all] to get + # all optional dependencies + pip install -e . + + # Test specific dependencies. 
+ pip install pytest-xdist flaky + +Now whenever you want to run tests you can just activate the virtualenv +using ``source testing-venv/bin/activate`` or ``testing-venv\Scripts\activate`` +and all of the dependencies will be available to you and your local copy +of Hypothesis will be on the path (so any edits will be picked up automatically +and you don't need to reinstall it in the local virtualenv). + +------------- +Running Tests +------------- + +To run a specific test file manually, you can use pytest. I usually use the +following invocation: + +.. code-block:: + + python -m pytest tests/cover/test_conjecture_engine.py + +You will need to have Hypothesis installed locally to run these. I recommend a +virtualenv where you have run ``pip install -e .``, which installs all the +dependencies and puts your ``src`` directory in the path of installed packages +so that edits you make are automatically pipped up. + +Useful arguments you can add to pytest are ``-n 0``, which will disable build +parallelism (I find that on my local laptop the startup time is too high to be +worth it when running single files, so I usually do this), and ``-kfoo`` where +foo is some substring common to the set of tests you want to run (you can also +use composite expressions here. e.g. ``-k'foo and not bar'`` will run anything +containing foo that doesn't also contain bar). From f11c4ff4ece687d28a20c4d905c81fe24e24f7e8 Mon Sep 17 00:00:00 2001 From: "David R. MacIver" Date: Sun, 11 Feb 2018 12:36:27 +0000 Subject: [PATCH 11/17] ideally -> usually --- guides/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/guides/internals.rst b/guides/internals.rst index 3872ff4922..bb7e0aff4d 100644 --- a/guides/internals.rst +++ b/guides/internals.rst @@ -269,7 +269,7 @@ have failed to make the input any smaller, we then turn them on. 
This allows the shrinker to switch from a good but slightly timid mode while its input is large into a more aggressive DELETE ALL THE THINGS mode once that stops -working. By that point ideally we've made our input small enough that quadratic +working. By that point we've usually made our input small enough that quadratic complexity is acceptable. We turn these on once and then they stay on. The reason for this is to avoid a From b859858471c5ac24a5e8f25a25a0b34011a8c358 Mon Sep 17 00:00:00 2001 From: "David R. MacIver" Date: Mon, 12 Feb 2018 12:10:51 +0000 Subject: [PATCH 12/17] Improve section on testing --- guides/testing-hypothesis.rst | 69 ++++++++++++++++++++++++++++------- 1 file changed, 55 insertions(+), 14 deletions(-) diff --git a/guides/testing-hypothesis.rst b/guides/testing-hypothesis.rst index 9537f4552c..f0ca2be18d 100644 --- a/guides/testing-hypothesis.rst +++ b/guides/testing-hypothesis.rst @@ -57,7 +57,7 @@ The following will give you a working virtualenv for running tests in: .. code-block:: bash pip install virtualenv - python -m virtualenv testing-venv + python -m virtualenv testing-venv # On Windows: testing-venv\Scripts\activate source testing-venv/bin/activate @@ -79,21 +79,62 @@ and you don't need to reinstall it in the local virtualenv). Running Tests ------------- -To run a specific test file manually, you can use pytest. I usually use the -following invocation: +In order to run tests outside of the make/tox/etc set up, you'll need an +environment where Hypothesis is on the path and all of the testing dependencies +are installed. +We recommend doing this inside a virtualenv as described in the previous section. + +All testing is done using `pytest `_, +with a couple of plugins installed. For advanced usage we recommend reading the +pytest documentation, but this section will give you a primer in enough of the +common commands and arguments to get started. 
+ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Selecting Which Files to Run +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The following invocation runs all of the tests in the file +``tests/cover/test_conjecture_engine.py``: .. code-block:: python -m pytest tests/cover/test_conjecture_engine.py -You will need to have Hypothesis installed locally to run these. I recommend a -virtualenv where you have run ``pip install -e .``, which installs all the -dependencies and puts your ``src`` directory in the path of installed packages -so that edits you make are automatically pipped up. - -Useful arguments you can add to pytest are ``-n 0``, which will disable build -parallelism (I find that on my local laptop the startup time is too high to be -worth it when running single files, so I usually do this), and ``-kfoo`` where -foo is some substring common to the set of tests you want to run (you can also -use composite expressions here. e.g. ``-k'foo and not bar'`` will run anything -containing foo that doesn't also contain bar). +If you want to run multiple files, you can pass them all as arguments, and if +you pass a directory then it will run all the test files in that directory. +For example, the following runs all the tests in ``test_conjecture_engine.py`` +and ``test_slippage.py``: + +.. code-block:: + + python -m pytest tests/cover/test_conjecture_engine.py tests/cover/test_slippage.py + +If you are running this in bash (if you're not sure: if you're not on Windows, +you probably are), you could also use brace expansion: + +.. code-block:: + + python -m pytest tests/cover/test_{conjecture_engine,slippage}.py + +And the following would run all tests under ``tests/cover``: + +.. code-block:: + + python -m pytest tests/cover + +~~~~~~~~~~~~~~~~ +Useful Arguments +~~~~~~~~~~~~~~~~ + +Some useful arguments to pytest include: + +* You can pass ``-n 0`` to turn off ``pytest-xdist``'s parallel test execution.
+ When running just a small number of tests, its startup time is sometimes longer + than the time it saves (this will vary from system to system), so this can + be helpful if you find yourself waiting on test runners to start a lot. +* You can use ``-k`` to select a subset of tests to run. This matches on substrings + of the test names. For example, ``-kfoo`` will only run tests that have "foo" as + a substring of their name. You can also use composite expressions here: + for example, ``-k'foo and not bar'`` will run anything containing foo that doesn't + also contain bar. `More information on how to select tests to run can be found + in the pytest documentation `_. From 0c6f045aaa941df9bb619884e3e492745a0d103c Mon Sep 17 00:00:00 2001 From: "David R. MacIver" Date: Mon, 12 Feb 2018 12:24:38 +0000 Subject: [PATCH 13/17] Add a section about deferring errors --- guides/api-style.rst | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/guides/api-style.rst b/guides/api-style.rst index dcc8c95dab..ac97c74d86 100644 --- a/guides/api-style.rst +++ b/guides/api-style.rst @@ -105,6 +105,49 @@ strategies wherever possible. In particular: ``max_size`` to ``None`` (even if internally it is bounded). +~~~~~~~~~~~~~~~ +Deferred Errors +~~~~~~~~~~~~~~~ + +As far as is reasonable, functions should raise errors when the test is run +(typically by deferring them until you try to draw from the strategy), +not when they are called. +This mostly applies to strategy functions and some error conditions in +``@given`` itself. + +Generally speaking this should be taken care of automatically by use of the +``@defines_strategy`` decorator. + +We do not currently do this for the ``TypeError`` that you will get from +calling the function incorrectly (e.g. with invalid keyword arguments or +missing required arguments).
+In principle we could, but it would result in much harder-to-read function +signatures, so we would be trading off one form of comprehensibility for +another, and so far that hasn't seemed to be worth it. + +The main reasons for preferring this style are: + +* Errors at test import time tend to throw people and be correspondingly hard + for them to debug. + There's an expectation that errors in your test code result in failures in + your tests, and the fact that that test code happens to be defined in a + decorator doesn't seem to change that expectation for people. +* Things like deprecation warnings etc. localize better when they happen + inside the test - test runners will often swallow them or put them in silly + places if they're at import time, but will attach any output that happens + in the test to the test itself. +* There are a lot of cases where raising an error, deprecation warning, etc. + is *only* possible in a test - e.g. if you're using the inline style with + `data `_, + or if you're using + `flatmap `_ + or + `@composite `_ + then the strategy won't actually get evaluated until we run the test, + so that's the only place they can happen. + It's nice to be consistent, and it's weird if sometimes strategy errors result in + definition-time errors and sometimes they result in test errors. + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A catalogue of current violations ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From d878c41f4783279ad4dc152fbae07292f902cd6a Mon Sep 17 00:00:00 2001 From: "David R.
MacIver" Date: Mon, 12 Feb 2018 12:28:59 +0000 Subject: [PATCH 14/17] Monospace engine.py --- guides/internals.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/guides/internals.rst b/guides/internals.rst index bb7e0aff4d..d7879efb78 100644 --- a/guides/internals.rst +++ b/guides/internals.rst @@ -117,7 +117,7 @@ There are a number of cases where we find ourself with a user-provided function (where the "user" might still be something that is entirely our code) and we want to pass a whole bunch of different examples to it in order to achieve some result. Currently this includes each of the main engine, the Shrinker (in -engine.py) and the minimizer, but there are likely to be more in future. +``engine.py``) and the minimizer, but there are likely to be more in future. We typically organise such things in terms of an object that you create with the function and possibly an initial argument that stores these on self and From bb04f5217112bc922e022b673ed2ce097f101095 Mon Sep 17 00:00:00 2001 From: "David R. MacIver" Date: Mon, 12 Feb 2018 12:30:01 +0000 Subject: [PATCH 15/17] Slight rewording --- guides/internals.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/guides/internals.rst b/guides/internals.rst index d7879efb78..40db8df7b4 100644 --- a/guides/internals.rst +++ b/guides/internals.rst @@ -16,11 +16,11 @@ Bird's Eye View Concepts The core engine of Hypothesis is called Conjecture. The "fundamental idea" of Conjecture is that you can represent an arbitrary -randomized test case as the sequence of bytes read from some pseudo-random -number generator (PRNG). -Whenever you want to do something "random" you read the next bytes and -do what they tell you to do. -But these bytes didn't *have* to come from a PRNG, and we can run the test +randomized test case as the sequence of bytes read from the pseudo-random +number generator (PRNG) that produced it. 
+Whenever the test did something "random" it actually read the next bytes and +did what they told it to do. +But those bytes didn't *have* to come from a PRNG, and we can run the test given any byte sequence we like. By manipulating the choice of bytes, we can achieve more interesting effects than pure randomness would allow us to do, while retaining the power and ease of use of random testing. From ef16ee4bf30cbf2dd3da9949263aa36cf045bae7 Mon Sep 17 00:00:00 2001 From: "David R. MacIver" Date: Mon, 12 Feb 2018 12:34:15 +0000 Subject: [PATCH 16/17] Switch hyphens to em-dashes --- guides/testing-hypothesis.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/guides/testing-hypothesis.rst b/guides/testing-hypothesis.rst index f0ca2be18d..32159b8b80 100644 --- a/guides/testing-hypothesis.rst +++ b/guides/testing-hypothesis.rst @@ -9,20 +9,20 @@ run its tests and how to to write new ones. General Testing Philosophy -------------------------- -The test suite for Hypothesis is unusually powerful - as you might hope! - +The test suite for Hypothesis is unusually powerful --- as you might hope! --- but the secret is actually more about attitude than technology. The key is that we treat any bug in Hypothesis as a bug in our test suite -too - and think about the kinds of bugs that might not be caught, then write +too --- and think about the kinds of bugs that might not be caught, then write tests that would catch them. We also use a variety of tools to check our code automatically. This includes formatting, import order, linting, and doctests (so examples in docs don't -break). All of this is checked in CI - which means that once the build is +break). All of this is checked in CI --- which means that once the build is green, humans can all focus on meaningful review rather than nitpicking operator spacing. 
-Similarly, we require all code to have tests with 100% branch coverage - as +Similarly, we require all code to have tests with 100% branch coverage --- as a starting point, not the final goal. - Requiring full coverage can't guarantee that we've written all the tests @@ -30,7 +30,7 @@ a starting point, not the final goal. result), but less than full coverage guarantees that there's some code we're not testing at all. - Tests beyond full coverage generally aim to demonstrate that a particular - feature works, or that some subtle failure case is not present - often + feature works, or that some subtle failure case is not present --- often because when it was found and fixed, someone wrote a test to make sure it couldn't come back! From ba83e8f0355dd582b6b9bcb1b23d5af872901a00 Mon Sep 17 00:00:00 2001 From: "David R. MacIver" Date: Mon, 12 Feb 2018 12:36:25 +0000 Subject: [PATCH 17/17] Revert "Switch hyphens to em-dashes" This reverts commit ef16ee4bf30cbf2dd3da9949263aa36cf045bae7. --- guides/testing-hypothesis.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/guides/testing-hypothesis.rst b/guides/testing-hypothesis.rst index 32159b8b80..f0ca2be18d 100644 --- a/guides/testing-hypothesis.rst +++ b/guides/testing-hypothesis.rst @@ -9,20 +9,20 @@ run its tests and how to to write new ones. General Testing Philosophy -------------------------- -The test suite for Hypothesis is unusually powerful --- as you might hope! --- +The test suite for Hypothesis is unusually powerful - as you might hope! - but the secret is actually more about attitude than technology. The key is that we treat any bug in Hypothesis as a bug in our test suite -too --- and think about the kinds of bugs that might not be caught, then write +too - and think about the kinds of bugs that might not be caught, then write tests that would catch them. We also use a variety of tools to check our code automatically. 
This includes formatting, import order, linting, and doctests (so examples in docs don't -break). All of this is checked in CI --- which means that once the build is +break). All of this is checked in CI - which means that once the build is green, humans can all focus on meaningful review rather than nitpicking operator spacing. -Similarly, we require all code to have tests with 100% branch coverage --- as +Similarly, we require all code to have tests with 100% branch coverage - as a starting point, not the final goal. - Requiring full coverage can't guarantee that we've written all the tests @@ -30,7 +30,7 @@ a starting point, not the final goal. result), but less than full coverage guarantees that there's some code we're not testing at all. - Tests beyond full coverage generally aim to demonstrate that a particular - feature works, or that some subtle failure case is not present --- often + feature works, or that some subtle failure case is not present - often because when it was found and fixed, someone wrote a test to make sure it couldn't come back!