-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stateful tests with Dataset #8658
Conversation
Oh this is a really cool idea! Also yeah I've read that blog post before and my mind was also blown haha |
Great idea! Really cool. I even wonder to what extent this could replace vast swathes of our tests — anything where the output is calculated from the input. We'll always want some unit tests to code to when developing, but plausibly the unit tests are either a) examples or b) some saved cases that have failed these model-based tests before. |
That looks very cool indeed! (although I don't think that |
Can we write down rules for when these checks are needed? |
Yes although I think it is easier to write rules for when these checks aren't needed. One example: skip the default indexes invariant test when the name of an existing dimension coordinate is passed as input kwarg or dict key to |
* main: (31 commits) correctly encode/decode _FillValues/missing_values/dtypes for packed data (pydata#8713) Expand use of `.oindex` and `.vindex` (pydata#8790) Return a dataclass from Grouper.factorize (pydata#8777) [skip-ci] Fix upstream-dev env (pydata#8839) Add dask-expr for windows envs (pydata#8837) [skip-ci] Add dask-expr dependency to doc.yml (pydata#8835) Add `dask-expr` to environment-3.12.yml (pydata#8827) Make list_chunkmanagers more resilient to broken entrypoints (pydata#8736) Do not attempt to broadcast when global option ``arithmetic_broadcast=False`` (pydata#8784) try to get the `upstream-dev` CI to complete again (pydata#8823) Bump the actions group with 1 update (pydata#8818) Update documentation for clarity (pydata#8817) DOC: link to zarr.convenience.consolidate_metadata (pydata#8816) Refactor Grouper objects (pydata#8776) Grouper object design doc (pydata#8510) Bump the actions group with 2 updates (pydata#8804) tokenize() should ignore difference between None and {} attrs (pydata#8797) fix: remove Coordinate from __all__ in xarray/__init__.py (pydata#8791) Fix non-nanosecond casting behavior for `expand_dims` (pydata#8782) Migrate treenode module. (pydata#8757) ...
7d796ee
to
0f76396
Compare
63d3c47
to
b9433b6
Compare
Hah now our strategy tests are failing |
* main: (26 commits) [pre-commit.ci] pre-commit autoupdate (pydata#8900) Bump the actions group with 1 update (pydata#8896) New empty whatsnew entry (pydata#8899) Update reference to 'Weighted quantile estimators' (pydata#8898) 2024.03.0: Add whats-new (pydata#8891) Add typing to test_groupby.py (pydata#8890) Avoid in-place multiplication of a large value to an array with small integer dtype (pydata#8867) Check for aligned chunks when writing to existing variables (pydata#8459) Add dt.date to plottable types (pydata#8873) Optimize writes to existing Zarr stores. (pydata#8875) Allow multidimensional variable with same name as dim when constructing dataset via coords (pydata#8886) Don't allow overwriting indexes with region writes (pydata#8877) Migrate datatree.py module into xarray.core. (pydata#8789) warn and return bytes undecoded in case of UnicodeDecodeError in h5netcdf-backend (pydata#8874) groupby: Dispatch quantile to flox. (pydata#8720) Opt out of auto creating index variables (pydata#8711) Update docs on view / copies (pydata#8744) Handle .oindex and .vindex for the PandasMultiIndexingAdapter and PandasIndexingAdapter (pydata#8869) numpy 2.0 copy-keyword and trapz vs trapezoid (pydata#8865) upstream-dev CI: Fix interp and cumtrapz (pydata#8861) ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously there are some remaining todo comments, but this looks great! I'd consider merging it more or less as-is, and then continuing work in a follow-up PR 🙂
# Can't use bundles because we'd need pre-conditions on consumes(bundle) | ||
# indexed_dims = Bundle("indexed_dims") | ||
# multi_indexed_dims = Bundle("multi_indexed_dims") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think HypothesisWorks/hypothesis#3944 will enable this, though I wouldn't delay this PR to wait.
import xarray as xr | ||
|
||
ds = xr.Dataset() | ||
ds["0"] = np.array(["", "\x000"], dtype=object) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem that surprising that you hit bugs related to empty-string or null-prefixed string? To most C code, those are identical!
Thanks for the review @Zac-HD |
I was curious to see if the hypothesis stateful testing would catch an inconsistent sequence of index manipulation operations like #8646. Turns out
rename_vars
is basically broken? (filed #8659) :PPS: this blog post is amazing.