Stateful tests with Dataset #8658

dcherian · 2024-01-24T16:34:59Z

I was curious to see whether Hypothesis stateful testing would catch an inconsistent sequence of index manipulation operations like #8646. Turns out rename_vars is basically broken? (filed #8659) :P

PS: this blog post is amazing.

E           state = DatasetStateMachine()
E           state.assert_invariants()
E           > ===
E
E            <xarray.Dataset>
E           Dimensions:  ()
E           Data variables:
E               *empty*
E           ===
E
E
E           > vars: ('1', '1_')
E           state.add_dim_coord(var=<xarray.Variable (1: 1)>
E           array([0], dtype=uint32))
E           state.assert_invariants()
E           > ===
E
E            <xarray.Dataset>
E           Dimensions:  (1: 1)
E           Coordinates:
E             * 1        (1) uint32 0
E           Data variables:
E               1_       (1) uint32 0
E           ===
E
E
E           > renaming 1 to 0
E           state.rename_vars(newname='0')
E           state.assert_invariants()
E           > ===
E
E            <xarray.Dataset>
E           Dimensions:  (1: 1)
E           Coordinates:
E             * 0        (1) uint32 0
E           Dimensions without coordinates: 1
E           Data variables:
E               1_       (1) uint32 0
E           ===
E
E
E           state.teardown()
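
For readers unfamiliar with the technique, here is a minimal sketch of a Hypothesis rule-based state machine over a Dataset. It is not the code in this PR: the class name `ToyDatasetMachine`, the name pool `NAMES`, the two rules, and the helper `run_toy_machine` are all assumptions made for illustration. The machine keeps a small model (the set of names that should carry an index) and checks after every step that the real Dataset agrees with it.

```python
import numpy as np
import xarray as xr
from hypothesis import settings, strategies as st
from hypothesis.stateful import (
    RuleBasedStateMachine,
    invariant,
    precondition,
    rule,
    run_state_machine_as_test,
)

# Hypothetical small pool of names, so that rules interact often.
NAMES = ["a", "b", "c"]


class ToyDatasetMachine(RuleBasedStateMachine):
    """Illustrative only: a much smaller machine than the one in the PR."""

    def __init__(self):
        super().__init__()
        self.dataset = xr.Dataset()
        # The model: names we expect to be indexed dimension coordinates.
        self.indexed_dims = set()

    @precondition(lambda self: len(self.indexed_dims) < len(NAMES))
    @rule(data=st.data())
    def add_dim_coord(self, data):
        unused = sorted(set(NAMES) - self.indexed_dims)
        name = data.draw(st.sampled_from(unused))
        # A 1D variable named after its own dimension becomes an indexed coordinate.
        self.dataset[name] = (name, np.arange(2, dtype="uint32"))
        self.indexed_dims.add(name)

    @precondition(lambda self: len(self.indexed_dims) > 0)
    @rule(data=st.data())
    def drop_dim_coord(self, data):
        name = data.draw(st.sampled_from(sorted(self.indexed_dims)))
        self.dataset = self.dataset.drop_vars(name)
        self.indexed_dims.remove(name)

    @invariant()
    def indexes_match_model(self):
        # Checked after every rule: real indexes must equal the modelled ones.
        assert set(self.dataset.xindexes) == self.indexed_dims


def run_toy_machine(max_examples=20):
    """Run the machine as a test; raises if any invariant fails."""
    run_state_machine_as_test(
        ToyDatasetMachine,
        settings=settings(
            max_examples=max_examples,
            stateful_step_count=10,
            deadline=None,
            database=None,
        ),
    )
```

Calling `run_toy_machine()` lets Hypothesis explore sequences of the two rules. On failure it reports a shrunk sequence of steps, in the same `state.rule(...)` form as the output above.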

TomNicholas · 2024-01-24T16:57:11Z

Oh this is a really cool idea!

Also yeah I've read that blog post before and my mind was also blown haha

max-sixty · 2024-01-24T19:16:35Z

Great idea! Really cool.

I even wonder to what extent this could replace vast swathes of our tests — anything where the output is calculated from the input.

We'll always want some unit tests to code to when developing, but plausibly the unit tests are either a) examples or b) some saved cases that have failed these model-based tests before.

benbovy · 2024-01-25T14:43:53Z

That looks very cool indeed! (although I don't think that rename_vars is broken :) but instead some valid cases require skipping the default indexes invariant check).

dcherian · 2024-01-25T16:58:18Z

> instead some valid cases require skipping the default indexes invariant check.

Can we write down rules for when these checks are needed?

benbovy · 2024-01-26T08:04:17Z

Yes, although I think it is easier to write rules for when these checks aren't needed.

One example: skip the default indexes invariant test when the name of an existing dimension coordinate is passed as input kwarg or dict key to .rename_vars().
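
A rough sketch of how that one rule might be encoded, assuming a hypothetical helper `should_check_default_indexes` (not part of xarray or of this PR). It returns False when any key passed to `.rename_vars()` names an existing dimension coordinate, which is the situation in the failing example above: the renamed coordinate keeps its index but no longer shares its name with the dimension.

```python
import numpy as np
import xarray as xr


def should_check_default_indexes(ds, name_dict):
    """Hypothetical helper: decide whether the default indexes invariant applies.

    Returns False if any variable being renamed is currently a dimension
    coordinate, i.e. a 1D coordinate named after its own dimension.
    """
    for old_name in name_dict:
        is_dim_coord = (
            old_name in ds.coords
            and ds.variables[old_name].dims == (old_name,)
        )
        if is_dim_coord:
            return False
    return True


# Same shape as the failing example: one dimension coordinate, one data variable.
ds = xr.Dataset(
    data_vars={"x_": ("x", np.array([0], dtype="uint32"))},
    coords={"x": ("x", np.array([0], dtype="uint32"))},
)
renamed = ds.rename_vars({"x": "y"})

# The index follows the variable to its new name, but the dimension stays "x".
index_names = list(renamed.xindexes)
renamed_dims = renamed["y"].dims
```

With this helper, renaming `x_` would still be checked against the invariant, while renaming `x` would not.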

* main: (31 commits)
  correctly encode/decode _FillValues/missing_values/dtypes for packed data (pydata#8713)
  Expand use of `.oindex` and `.vindex` (pydata#8790)
  Return a dataclass from Grouper.factorize (pydata#8777)
  [skip-ci] Fix upstream-dev env (pydata#8839)
  Add dask-expr for windows envs (pydata#8837)
  [skip-ci] Add dask-expr dependency to doc.yml (pydata#8835)
  Add `dask-expr` to environment-3.12.yml (pydata#8827)
  Make list_chunkmanagers more resilient to broken entrypoints (pydata#8736)
  Do not attempt to broadcast when global option ``arithmetic_broadcast=False`` (pydata#8784)
  try to get the `upstream-dev` CI to complete again (pydata#8823)
  Bump the actions group with 1 update (pydata#8818)
  Update documentation for clarity (pydata#8817)
  DOC: link to zarr.convenience.consolidate_metadata (pydata#8816)
  Refactor Grouper objects (pydata#8776)
  Grouper object design doc (pydata#8510)
  Bump the actions group with 2 updates (pydata#8804)
  tokenize() should ignore difference between None and {} attrs (pydata#8797)
  fix: remove Coordinate from __all__ in xarray/__init__.py (pydata#8791)
  Fix non-nanosecond casting behavior for `expand_dims` (pydata#8782)
  Migrate treenode module. (pydata#8757)
  ...

This reverts commit 6a38e27.

dcherian · 2024-04-02T01:49:38Z

Hah now our strategy tests are failing

* main: (26 commits)
  [pre-commit.ci] pre-commit autoupdate (pydata#8900)
  Bump the actions group with 1 update (pydata#8896)
  New empty whatsnew entry (pydata#8899)
  Update reference to 'Weighted quantile estimators' (pydata#8898)
  2024.03.0: Add whats-new (pydata#8891)
  Add typing to test_groupby.py (pydata#8890)
  Avoid in-place multiplication of a large value to an array with small integer dtype (pydata#8867)
  Check for aligned chunks when writing to existing variables (pydata#8459)
  Add dt.date to plottable types (pydata#8873)
  Optimize writes to existing Zarr stores. (pydata#8875)
  Allow multidimensional variable with same name as dim when constructing dataset via coords (pydata#8886)
  Don't allow overwriting indexes with region writes (pydata#8877)
  Migrate datatree.py module into xarray.core. (pydata#8789)
  warn and return bytes undecoded in case of UnicodeDecodeError in h5netcdf-backend (pydata#8874)
  groupby: Dispatch quantile to flox. (pydata#8720)
  Opt out of auto creating index variables (pydata#8711)
  Update docs on view / copies (pydata#8744)
  Handle .oindex and .vindex for the PandasMultiIndexingAdapter and PandasIndexingAdapter (pydata#8869)
  numpy 2.0 copy-keyword and trapz vs trapezoid (pydata#8865)
  upstream-dev CI: Fix interp and cumtrapz (pydata#8861)
  ...

Zac-HD

Obviously there are some remaining todo comments, but this looks great! I'd consider merging it more or less as-is, and then continuing work in a follow-up PR 🙂

Zac-HD · 2024-04-02T04:36:05Z

properties/test_index_manipulation.py

+    # Can't use bundles because we'd need pre-conditions on consumes(bundle)
+    # indexed_dims = Bundle("indexed_dims")
+    # multi_indexed_dims = Bundle("multi_indexed_dims")


I think HypothesisWorks/hypothesis#3944 will enable this, though I wouldn't delay this PR to wait.

Zac-HD · 2024-04-02T04:38:18Z

properties/test_index_manipulation.py

+    import xarray as xr
+
+    ds = xr.Dataset()
+    ds["0"] = np.array(["", "\x000"], dtype=object)


It doesn't seem that surprising that you hit bugs related to empty strings or null-prefixed strings? To most C code, those are identical!
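
A small standard-library illustration of that point, using `ctypes` to read bytes the way NUL-terminated C string handling would. The helper name `as_c_string` is made up for this sketch, and it says nothing about where in xarray or its dependencies the actual bug lives.

```python
import ctypes


def as_c_string(raw: bytes) -> bytes:
    """Return what NUL-terminated C string code would see for `raw`."""
    # create_string_buffer copies the bytes and appends a terminating NUL.
    buf = ctypes.create_string_buffer(raw)
    # .value stops at the first NUL byte; .raw would return the whole buffer.
    return buf.value


empty = b""
null_prefixed = b"\x000"  # a NUL byte followed by the character "0"

seen_for_empty = as_c_string(empty)
seen_for_null_prefixed = as_c_string(null_prefixed)
```

Python treats the two values as different, but both come back as an empty byte string once they pass through the NUL-terminated view.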


dcherian · 2024-04-03T21:06:14Z

Thanks for the review @Zac-HD

dcherian mentioned this pull request Jan 24, 2024

renaming index variables with rename_vars seems buggy #8659

Closed

TomNicholas added topic-hypothesis Strategies or tests using the hypothesis library topic-testing labels Jan 24, 2024

dcherian added 22 commits March 9, 2024 10:25

Stateful tests with Dataset

aa5653f

Disable check_default_indexes when needed

f1199a1

Add Zarr roundtrip

3fdb188

Randomize dimension choice

0820323

Fix a bug

06bdbf8

Add reset_index

2710a4e

Add stack, unstack

85ab186

[revert] Disable Zarr till we control attrs strategy

443916c

Try making unique names

5c00585

Share names strategy to ensure uniques?

04b6b92

cleanup

a4a4c43

Try sharing strategies better

83fa17b

Fix endianness

491b9b1

Better swap_dims

c648cfd

More improvements

06763c2

WIP

c07688c

Drop duplicates before unstacking

e30a89f

Add reset_index

316eb43

Better duplicate assumption

88e2010

Move

6c23b49

Fix reset_index

2c671e3

dcherian added 6 commits March 31, 2024 20:01

Add notes

d8e00d3

Skip timedelta indexes

06808a1

to_zarr

cf9e86a

small tweaks

e5333fa

Remove NaT assume

b85f5e8

Revert "[revert]"

15e00ff

This reverts commit 6a38e27.

dcherian force-pushed the state-machine branch 2 times, most recently from 7d796ee to 0f76396 Compare April 2, 2024 00:01

Add hypothesis workflow

082d9f6

dcherian force-pushed the state-machine branch from 0f76396 to 082d9f6 Compare April 2, 2024 00:04

dcherian added the run-slow-hypothesis Run slow hypothesis tests label Apr 2, 2024

Swtich out

b9433b6

dcherian force-pushed the state-machine branch 2 times, most recently from 63d3c47 to b9433b6 Compare April 2, 2024 00:10

fix

a216531

dcherian marked this pull request as ready for review April 2, 2024 01:49

dcherian added 4 commits April 1, 2024 20:05

Use st.builds

d2af875

cleanup

5aae971

Add initialize

7d8b6ff

dcherian requested a review from TomNicholas April 2, 2024 03:29

Zac-HD approved these changes Apr 2, 2024


dcherian and others added 2 commits April 3, 2024 15:05

review feedback

926bf54

Merge branch 'main' into state-machine

7cbfc9c

dcherian enabled auto-merge (squash) April 3, 2024 21:09

dcherian disabled auto-merge April 3, 2024 21:29

dcherian merged commit 40afd30 into pydata:main Apr 3, 2024
30 of 31 checks passed

dcherian deleted the state-machine branch April 3, 2024 21:29
