Save failing cases between test runs #20
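Although I think storing failing test cases would be a great fit for StreamData (making sure you don’t “lose” failing cases), I don’t think this project is the place to add such a feature. Since ExUnit’s seeds are used to generate the test data, we wouldn’t have to store whole test cases. Instead, as @tmbb already mentioned, we should be fine just storing ExUnit's seed. So, this feature request might be better suited for ExUnit itself. However, since this is a fairly specific (and possibly confusing) feature, this might be better as a separate library for now. PS: Since I love a good challenge, I tinkered with this idea for a bit and came up with bad_seed. It's the simplest thing that works; it stores the last failing seed in test/.bad_seed and keeps using it until it's green. |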
I don't think storing the seeds would be enough, because over time the
test generators could change and the seed would no longer produce the same data.
On 27 Aug 2017 12:53 pm, "Jeff Kreeftmeijer" wrote:
Although I think storing failing test cases would be a great fit for
StreamData (making sure you don’t “lose” failing cases), I don’t think this
project is the place to add such a feature.
Since ExUnit’s seeds are used to generate the test data, we wouldn’t have
to store whole test cases. Instead, as @tmbb <https://github.com/tmbb>
already mentioned, we should be fine just storing ExUnit's seed. So, this
feature request might be better suited for ExUnit itself. However, since
this is a fairly specific (and possibly confusing) feature, this might be
better as a separate library for now.
PS: Since I love a good challenge, I tinkered with this idea for a bit and
came up with bad_seed <https://github.com/jeffkreeftmeijer/bad_seed>.
It's the simplest thing that works; it stores the last failing seed in
test/.bad_seed and keeps using it until it's green.
|
This is NOT what I said! Saving the random seed is useful, but it limits you to running the whole test suite. When I talked about saving the seed, I had in mind taking the state of the random number generator at the point where the failing example was tested and saving that. Later, you'd rerun only that test case and then try some new data. Your library is interesting, but it's not the same thing. |
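A minimal sketch of the idea in the comment above: snapshot the random number generator's state just before each example is generated, so that a failing example can be regenerated on its own later. It is written in Python purely for illustration and is not StreamData or ExUnit code; `run_property`, `replay`, and the integer generator are hypothetical stand-ins.

```python
import random

def run_property(check, rng, runs=100):
    """Run `check` on random integers.

    Returns (rng_state, value) for the first failing example, or None.
    """
    for _ in range(runs):
        state = rng.getstate()        # snapshot taken just before this example
        value = rng.randint(0, 1000)  # stand-in for a real data generator
        if not check(value):
            return state, value
    return None

def replay(state):
    """Regenerate only the example that was produced from the saved state."""
    rng = random.Random()
    rng.setstate(state)
    return rng.randint(0, 1000)

failure = run_property(lambda n: n < 900, random.Random(42), runs=1000)
if failure is not None:
    state, value = failure
    # The saved state reproduces exactly the failing value, without
    # rerunning the examples that came before it.
    assert replay(state) == value
```

This only holds while the generator code is unchanged; the saved state says nothing about what a modified generator would produce.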
@fishcakez That's true, but if the generators change, then you actually have very few guarantees... You'd have to serialize the values themselves and not the seed, and I don't know if that's possible for all values. Saving the seed (as I've described above) seems like a good compromise. |
Bad seed is definitely interesting and, in case we can't figure out exactly how to do the saving and loading, it is the minimum we can do to get started. @jeffkreeftmeijer you could save the bad seed somewhere under "_build/test". |
@tmbb I agree that storing the seeds per example is definitely a nicer solution, and I wasn’t suggesting my library does the same thing. It's just a quick stab at the problem. @josevalim better idea indeed. Will look into that. |
@tmbb my main interest is in state machine testing, which I think only appears in the Erlang QuickCheck libraries. In those situations it is likely that generators would change over time, but the input would be tested for validity as part of a precondition check when running the test. In this situation saving the seed is actually unhelpful because it does not provide what it is intended to, and the desired regression test won't be run. I think this comes into play generally: while interesting, it has edge cases that mean the same test isn't going to be run unless you can guarantee that all the test code, or at least the generation, is the same. |
@jeffkreeftmeijer Ok, I just wanted to make sure I was communicating my idea correctly :) @josevalim If I'm reading the code correctly, this function is probably a good place to add some code that saves the seed, test config, size, etc (whatever you need to make the test reproducible; I'm not familiar enough with the code to know exactly what information is needed). EDIT: You still need some extra code to run that specific example. |
@fishcakez I'm not familiar with state machine testing, but wouldn't you be able to generate the state transitions from the random seed? |
Unfortunately not if the generators change. The tests work by generating a list of commands to run against the state machine. Before running the test, the list of commands is checked against a model to confirm it is a valid set of commands. If the commands are valid, the list is run against the system. It's likely that the generators will change over time, but you will still want to run the same regression tests of commands that used to fail. For example, when a feature is added you would likely need to extend your model, and so the generator for the list of commands would change. Therefore the seed would generate a different list of commands from the one you intended to test. When replaying a previous list of commands the precondition check is still carried out, so it's known whether the regression test is still valid in the model. If the generator doesn't change then it will still produce the same list of commands with the same seed. I was trying to give a real-life example where storing the seed would not work. However, I think in simpler cases the same occurs as soon as the generator changes. Therefore I think we would only want to keep the seeds around if the generator does not change. However, if the generator does change you still want to be able to run the regression test. If we keep the seed we end up testing the wrong thing, and if we delete the seed then we lose the regression test. I think this means that we would need to always store the generated term and not the seed. |
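The argument above can be illustrated with a small sketch. It is Python rather than Elixir, and heavily simplified: `gen_commands`, `is_valid`, and the two command vocabularies are hypothetical, and the "precondition check" here only tests that each command still exists in the model, which is far weaker than a real state machine model.

```python
import random

def gen_commands(seed, vocabulary, length=8):
    """Generate a command list from a seed.

    The vocabulary plays the role of the generator definition.
    """
    rng = random.Random(seed)
    return [rng.choice(vocabulary) for _ in range(length)]

def is_valid(commands, vocabulary):
    """Simplified precondition check: every command must exist in the model."""
    return all(command in vocabulary for command in commands)

V1 = ["push", "pop"]           # generator at the time of the failure
V2 = ["push", "pop", "peek"]   # generator after the model was extended

seed = 1234
failing_commands = gen_commands(seed, V1)  # the list that failed in the past

# Replaying the stored term: the same commands, still valid in the new model.
stored = list(failing_commands)
assert is_valid(stored, V2)

# Replaying only the seed against the changed generator gives no guarantee
# that the same commands come out.
regenerated = gen_commands(seed, V2)
print("seed reproduces the old failure:", regenerated == failing_commands)
```

With an unchanged generator the seed is enough; once the vocabulary changes, only the stored list is guaranteed to exercise the original failure.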
@fishcakez Yes, I think you're right. In that case you'll be rewriting the generator. I wonder if the Hypothesis bytestring approach (from Python) could help you here... Probably not, unless you're careful when writing your generators (I can think of some possibilities). But their advice is that if the regression is important or hard to find, you should save the term manually. Translated to StreamData, that means gathering the interesting examples in a normal ExUnit test case. |
@tmbb we are able to serialize any pure data structure to disk, and the impure ones too, but it won't reproduce the old side effects, if that makes a difference. I am not just concerned about the long-term important ones but about short-term testing too. If we store the seed, the user can easily end up testing the wrong input unless we can know the generators didn't change. If we can't provide this guarantee then I don't think we should provide the feature (storing the seed), as users will find it doesn't work as intended - or even worse, they won't discover it when it occurs! |
Just to be clear, I was referring to StreamData providing this feature, not another library that isn't specifically targeting StreamData. |
After speaking to @josevalim, we could get the best of both by reusing test seeds when running mix test --stale. This would mean people can run tests with the same seed until they pass, and then may get a new seed the next time those tests are run. If the generator changes, that's fine because we still generate new values. This works nicely because the seed will not last through different builds but will be there when you really want it. Given the speed of property tests, most users would want to use stale anyway, if they aren't already. |
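The "reuse the seed until the run is green" behaviour discussed in this thread can be sketched as follows. This is a Python illustration under stated assumptions, not the implementation of bad_seed or of mix test --stale; `pick_seed`, `record_result`, and the file location are hypothetical.

```python
import os
import random
import tempfile

def pick_seed(path):
    """Reuse the stored seed if the previous run failed, else draw a fresh one."""
    if os.path.exists(path):
        with open(path) as f:
            return int(f.read().strip())
    return random.randrange(2**32)

def record_result(path, seed, passed):
    """Keep the seed while the suite is red; forget it once the suite is green."""
    if passed:
        if os.path.exists(path):
            os.remove(path)
    else:
        with open(path, "w") as f:
            f.write(str(seed))

# Hypothetical location; a real tool would pick a build or cache directory.
seed_file = os.path.join(tempfile.mkdtemp(), "bad_seed")

first = pick_seed(seed_file)
record_result(seed_file, first, passed=False)
assert pick_seed(seed_file) == first      # red run: the same seed is reused

record_result(seed_file, first, passed=True)
assert not os.path.exists(seed_file)      # green run: the next seed is fresh
```

Keeping the file in a build directory rather than in version control means the seed disappears with the build, which limits how long a stale seed can outlive a generator change.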
I just saw this issue today, so my comments are perhaps a little bit late. I just implemented the same feature for … To execute the counterexample within StreamData, the generators of the property must not be executed; this could be communicated towards … And just for the record: @fishcakez PropEr and prop_check support state machine testing as well. |
Just to go back to this issue of storing bad examples. In practice, Erlang terms are often serializable (even functions). I now think that if this is ever going to be implemented, both the seed and the value should be saved when possible. |
Hypothesis (the inspiration for all my feature requests) saves the failing test cases in a database, so that they can be tried in other test runs. That way you're confident that the generated example that failed before will be tested in the next builds. Link: http://hypothesis.readthedocs.io/en/latest/database.html
I think we should do this too. The database is not very complex, it's just some files in hidden directories (I'm not familiar with the implementation details). Just to be clear, I'm not talking about repeating the whole test suite for the failing tests, just repeating the failing examples. This probably requires storing the random seed just before running a new test. Can StreamData do it?
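A minimal sketch of such an example database, assuming one JSON file per test in a hidden directory. This is not Hypothesis's actual storage format and not a StreamData API; `ExampleStore`, `run_property`, and the directory name are hypothetical, and JSON restricts the sketch to JSON-serializable values.

```python
import hashlib
import json
import os
import tempfile

class ExampleStore:
    """Stores failing examples per test as JSON files in one directory."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, test_name):
        digest = hashlib.sha256(test_name.encode("utf-8")).hexdigest()
        return os.path.join(self.root, digest + ".json")

    def load(self, test_name):
        path = self._path(test_name)
        if not os.path.exists(path):
            return []
        with open(path) as f:
            return json.load(f)

    def save(self, test_name, example):
        examples = self.load(test_name)
        if example not in examples:
            examples.append(example)
        with open(self._path(test_name), "w") as f:
            json.dump(examples, f)

def run_property(store, test_name, check, fresh_examples):
    """Try stored failures first, then new data.

    Returns the first failing example, or None if everything passes.
    """
    for example in store.load(test_name) + list(fresh_examples):
        if not check(example):
            store.save(test_name, example)
            return example
    return None

store = ExampleStore(os.path.join(tempfile.mkdtemp(), ".examples"))
non_negative = lambda x: x >= 0

# First run: the failure comes from freshly generated data and is recorded.
assert run_property(store, "non_negative", non_negative, [1, 2, -3, 4]) == -3
# Later run: the recorded failure is retried before any new data.
assert run_property(store, "non_negative", non_negative, [5, 6]) == -3
```

Because the stored value is replayed directly, the failing example keeps being tested even if the generator that first produced it has changed.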