Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide base test suite for implementators #650

Open
TomAugspurger opened this issue Jun 1, 2021 · 6 comments
Open

Provide base test suite for implementators #650

TomAugspurger opened this issue Jun 1, 2021 · 6 comments

Comments

@TomAugspurger
Copy link
Collaborator

I think the fsspec ecosystem would benefit from a set of "base" tests defined in fsspec that can be used by downstream projects like s3fs and adlfs. This would help iron out the inconsistencies between these libraries, especially around behaviors with trailing /, exceptions, etc.

Pandas has used this with extension types, which are expected to implement the extension array interface. See https://pandas.pydata.org/docs/development/extending.html#testing-extension-arrays for more, but the basic idea is to use a form of property-based testing. fsspec defines the expected behavior as a test, in a way that only uses the public fsspec interface. Then implementors would define pytest fixtures to provide the filesystem with some known properties (e.g. certain files exist).

@pytest.fixture
def fs():
    """Fixture defining the filesystem with these files:"""
    ...

class TestFilesystem:
    def test_ls_raises_filenotfound(self, fs):
        with pytest.raises(FileNotFoundError):
            fs.ls("/not/a/key")

https://github.com/pandas-dev/pandas/blob/master/pandas/tests/extension/conftest.py and https://github.com/pandas-dev/pandas/blob/master/pandas/tests/extension/base/__init__.py provide some examples.

@martindurant
Copy link
Member

Sounds very audible, but quite a lot of work! For sure, it would probably lead to the eventual solution of things like the path suffix issue; but there are some concrete differences between implementations. The most obvious of these is read-only versus rw.

@TomAugspurger
Copy link
Collaborator Author

I don't think it's too much additional work. At least not on the margin once things are set up. For example, the test in #649 could be adapted to a base test pretty easily. And in theory this lowers the burden on downstream libraries, if they can confidently rely on the base tests then they can skip writing some tests in their package.

There is some upfront work in defining the fixtures and base tests. If your open to it, I can do a small version, with just the ls raises test from that PR along with the scaffolding.

but there are some concrete differences between implementations

Yep, in pandas we recommend extension arrays skip those tests that don't apply Either by not inheriting from that base class, or by skipping: https://github.com/pandas-dev/pandas/blob/9231b495aadac6e93a0385c24066d84a1fb9eac6/pandas/tests/extension/test_numpy.py#L211-L214

Your read-only vs. read-write example is perfect. There'd be something like BaseReadOnlyTests, BaseReadTests and BaseWriteTests.

@isidentical
Copy link
Member

We are in the progress of writing multiple different backends (webdav, oss, ssh) for DVC and this would be really beneficial for us as well!

@martindurant
Copy link
Member

Mentioned in #600

TomAugspurger pushed a commit to TomAugspurger/filesystem_spec that referenced this issue Jun 1, 2021
This starts a suite of base tests to verify that downstream libraries
correctly implement the spec.

xref fsspec#650
@jorisvandenbossche
Copy link
Contributor

I think a set of base tests would be very useful (we regularly run into inconsistencies between implementations, also for basic operations like expected behaviour regarding certain keywords like exists_ok, which should be relatively straightforward to test in a generic test, I think)

Your read-only vs. read-write example is perfect. There'd be something like BaseReadOnlyTests, BaseReadTests and BaseWriteTests.

An alternative could be to use a fixture with some "parametrization" that is returned (and can be checked by the test), like readonly=False, supports_async=True, ....
That could be a bit more verbose in writing the test, but also gives more flexibility (a test doesn't need to fit into one "subset" as a class).

In pyarrow we have shared base tests for our different filesystem implementations, and there those are currently parametrized with eg allow_copy_file=True, allow_move_dir=False, allow_append_to_file=False.
See https://github.com/apache/arrow/blob/4d19225d57bfc3303758a9547995ac70faccc552/python/pyarrow/tests/test_fs.py#L203-L211

@jorisvandenbossche
Copy link
Contributor

Converting some existing tests could also be a good starting point to add more tests to such a base test suite. I actually started to parameterize the local tests while trying to implement a new filesystem (a new HDFS filesystem, see https://github.com/intake/filesystem_spec/compare/master...jorisvandenbossche:pyarrow-wrapper?expand=1, xref #663)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants