include pytest to environment.yml #53

Merged 1 commit into main on Mar 1, 2021

Conversation

tommylees112
Contributor

You might not want to include pytest for the end user, but even so it can be a nice way for people to check that everything is working? Also, for CI you will need to have pytest in your environment files for the remote machine to run your tests. Maybe premature?
@jejjohnson
Member

Thank you @tommylees112 ! Definitely not too soon! We haven't done too much testing (regrettably) outside of small data prep things. Probably during the next refactor, we'll start adding more.

Any suggestions on some tests? Shapes, toy data, types, and that sort of thing?

@jejjohnson jejjohnson merged commit 0c82aae into main Mar 1, 2021
@tommylees112
Contributor Author

First thing to say is that you probably don't need to write much more code. The notebooks contain the test principles, it's just an automated way for you to run them before you push changes to master etc.

My main motivation for writing tests is that they allow me to catch silent errors, quickly identify what has gone wrong, and jump into arbitrary points in the pipeline by putting in an assert False statement and using pdb (might not be best practice, but I find it's fast), e.g.

pytest --pdb .

Key things to test:

  • Shapes of input and output datasets match expectations
  • Missing values (they need to be found and dealt with)
  • Value Ranges are within expected bounds (e.g. probability 0-1)
  • Very small toy datasets (which allow the tests to run quickly) are really important to pass through the pipeline.
  • In the past I have written assertions that the model is "learning", i.e. that losses fall after 1 or 2 epochs (but I'm not sure this is best practice, because with SGD there is a random chance this won't happen. You expect it, but it's not guaranteed)
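A minimal sketch of the shape, missing-value, and value-range checks above, written against a hypothetical toy dataset (`make_toy_data` and its column layout are made up for illustration, not this repo's API):

```python
import numpy as np


def make_toy_data(n_samples: int = 8, n_features: int = 3) -> np.ndarray:
    """Hypothetical toy dataset: tiny, so the test suite stays fast."""
    rng = np.random.default_rng(42)  # deterministic seed
    return rng.uniform(0.0, 1.0, size=(n_samples, n_features))


def test_shapes_match_expectations():
    X = make_toy_data(n_samples=8, n_features=3)
    assert X.shape == (8, 3)


def test_no_missing_values():
    X = make_toy_data()
    assert not np.isnan(X).any()


def test_value_ranges():
    # e.g. probabilities must lie in [0, 1]
    X = make_toy_data()
    assert ((X >= 0.0) & (X <= 1.0)).all()
```

Dropping these into a `tests/` folder lets `pytest` pick them up automatically, and the toy data keeps the whole run well under a second.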

Things to remember:

  • Deterministically seed the random number generator
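One way to sketch the seeding point: a single helper called at the top of every test (the name `seed_everything` is an assumption; extend it with `torch.manual_seed(seed)` or similar if the pipeline uses other RNGs):

```python
import random

import numpy as np


def seed_everything(seed: int = 0) -> None:
    """Deterministically seed the RNGs the pipeline draws from."""
    random.seed(seed)
    np.random.seed(seed)


# Identical seeds must produce identical draws, so tests are reproducible.
seed_everything(0)
a = np.random.rand(3)
seed_everything(0)
b = np.random.rand(3)
assert (a == b).all()
```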

Nice to haves:

  • I think typing is great as a documentation tool; it allows a user to know whether a function is going to return a Tensor or a List or a Dict etc. Sometimes getting mypy to play nicely and not give any errors is a bit of a faff, so I would say it is more of a general principle than necessarily having all mypy checks passing. Ultimately, however, that is what you want.
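As an illustration of typing-as-documentation (the function itself is hypothetical), the annotations alone tell a reader that this takes a list of floats and hands back a dict of named floats, with no need to read the body:

```python
from typing import Dict, List


def summarise_losses(losses: List[float]) -> Dict[str, float]:
    """Summary statistics over a training-loss curve."""
    return {
        "mean": sum(losses) / len(losses),
        "min": min(losses),
        "max": max(losses),
    }


stats = summarise_losses([0.9, 0.5, 0.2])
assert stats["min"] == 0.2
```

A file like this passes `mypy` as written, but even if strict checks are deferred, the signature still documents the interface.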

Note:

  • I haven't used the hypothesis library myself but looking at it now it seems to offer the potential to test with arbitrarily generated data defined by a specific type (e.g. if a function takes a tuple of a boolean and a float)
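A sketch of that idea, assuming the hypothesis library is installed (`clamp_score` is a made-up function for illustration): declare the input types as strategies and hypothesis generates many arbitrary cases for you.

```python
from hypothesis import given, strategies as st


def clamp_score(active: bool, score: float) -> float:
    """Hypothetical function under test: clamp a score into [0, 1]."""
    if not active:
        return 0.0
    return min(max(score, 0.0), 1.0)


# hypothesis generates many (bool, float) inputs matching the declared types
@given(st.booleans(), st.floats(allow_nan=False, allow_infinity=False))
def test_clamp_score_in_range(active: bool, score: float) -> None:
    assert 0.0 <= clamp_score(active, score) <= 1.0


test_clamp_score_in_range()  # calling the decorated test runs the property
```

Under pytest the `@given` test is collected like any other, so this slots into the same `tests/` folder as the shape and range checks.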

@gonzmg88 gonzmg88 deleted the tommylees112-patch-1 branch September 22, 2022 10:49