Provide base test suite for implementators #650

TomAugspurger · 2021-06-01T13:38:53Z

I think the fsspec ecosystem would benefit from a set of "base" tests defined in fsspec that can be used by downstream projects like s3fs and adlfs. This would help iron out the inconsistencies between these libraries, especially around behaviors with trailing /, exceptions, etc.

Pandas has used this with extension types, which are expected to implement the extension array interface. See https://pandas.pydata.org/docs/development/extending.html#testing-extension-arrays for more, but the basic idea is to use a form of property-based testing. fsspec defines the expected behavior as a test, in a way that only uses the public fsspec interface. Then implementors would define pytest fixtures to provide the filesystem with some known properties (e.g. certain files exist).

@pytest.fixture
def fs():
    """Fixture defining the filesystem with these files:"""
    ...

class TestFilesystem:
    def test_ls_raises_filenotfound(self, fs):
        with pytest.raises(FileNotFoundError):
            fs.ls("/not/a/key")

https://github.com/pandas-dev/pandas/blob/master/pandas/tests/extension/conftest.py and https://github.com/pandas-dev/pandas/blob/master/pandas/tests/extension/base/__init__.py provide some examples.

The text was updated successfully, but these errors were encountered:

martindurant · 2021-06-01T13:42:43Z

Sounds very audible, but quite a lot of work! For sure, it would probably lead to the eventual solution of things like the path suffix issue; but there are some concrete differences between implementations. The most obvious of these is read-only versus rw.

TomAugspurger · 2021-06-01T13:53:01Z

I don't think it's too much additional work. At least not on the margin once things are set up. For example, the test in #649 could be adapted to a base test pretty easily. And in theory this lowers the burden on downstream libraries, if they can confidently rely on the base tests then they can skip writing some tests in their package.

There is some upfront work in defining the fixtures and base tests. If your open to it, I can do a small version, with just the ls raises test from that PR along with the scaffolding.

but there are some concrete differences between implementations

Yep, in pandas we recommend extension arrays skip those tests that don't apply Either by not inheriting from that base class, or by skipping: https://github.com/pandas-dev/pandas/blob/9231b495aadac6e93a0385c24066d84a1fb9eac6/pandas/tests/extension/test_numpy.py#L211-L214

Your read-only vs. read-write example is perfect. There'd be something like BaseReadOnlyTests, BaseReadTests and BaseWriteTests.

isidentical · 2021-06-01T14:00:40Z

We are in the progress of writing multiple different backends (webdav, oss, ssh) for DVC and this would be really beneficial for us as well!

martindurant · 2021-06-01T14:50:27Z

Mentioned in #600

This starts a suite of base tests to verify that downstream libraries correctly implement the spec. xref fsspec#650

jorisvandenbossche · 2021-06-16T12:13:05Z

I think a set of base tests would be very useful (we regularly run into inconsistencies between implementations, also for basic operations like expected behaviour regarding certain keywords like exists_ok, which should be relatively straightforward to test in a generic test, I think)

Your read-only vs. read-write example is perfect. There'd be something like BaseReadOnlyTests, BaseReadTests and BaseWriteTests.

An alternative could be to use a fixture with some "parametrization" that is returned (and can be checked by the test), like readonly=False, supports_async=True, ....
That could be a bit more verbose in writing the test, but also gives more flexibility (a test doesn't need to fit into one "subset" as a class).

In pyarrow we have shared base tests for our different filesystem implementations, and there those are currently parametrized with eg allow_copy_file=True, allow_move_dir=False, allow_append_to_file=False.
See https://github.com/apache/arrow/blob/4d19225d57bfc3303758a9547995ac70faccc552/python/pyarrow/tests/test_fs.py#L203-L211

jorisvandenbossche · 2021-06-16T12:16:51Z

Converting some existing tests could also be a good starting point to add more tests to such a base test suite. I actually started to parameterize the local tests while trying to implement a new filesystem (a new HDFS filesystem, see https://github.com/intake/filesystem_spec/compare/master...jorisvandenbossche:pyarrow-wrapper?expand=1, xref #663)

TomAugspurger pushed a commit to TomAugspurger/filesystem_spec that referenced this issue Jun 1, 2021

POC for base tests

f818145

This starts a suite of base tests to verify that downstream libraries correctly implement the spec. xref fsspec#650

TomAugspurger mentioned this issue Jun 1, 2021

POC of base tests for fsspec behavior. #651

Open

brl0 mentioned this issue Aug 23, 2021

generic top-level file-system #732

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Provide base test suite for implementators #650

Provide base test suite for implementators #650

TomAugspurger commented Jun 1, 2021

martindurant commented Jun 1, 2021

TomAugspurger commented Jun 1, 2021

isidentical commented Jun 1, 2021

martindurant commented Jun 1, 2021

jorisvandenbossche commented Jun 16, 2021

jorisvandenbossche commented Jun 16, 2021

Provide base test suite for implementators #650

Provide base test suite for implementators #650

Comments

TomAugspurger commented Jun 1, 2021

martindurant commented Jun 1, 2021

TomAugspurger commented Jun 1, 2021

isidentical commented Jun 1, 2021

martindurant commented Jun 1, 2021

jorisvandenbossche commented Jun 16, 2021

jorisvandenbossche commented Jun 16, 2021