-
-
Notifications
You must be signed in to change notification settings - Fork 290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Basic working FsspecStore #1785
Conversation
Hello @martindurant! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2024-04-14 01:40:25 UTC |
thanks for this @martindurant, I will see if I have time to play with this locally.
I don't think we have expectations at this point. I will try to get a distillation from the v2 issues around this topic. What do you think we should do here? |
I see we have no storage tests, so don't know how to push this any further. |
I will get some tests for you shortly |
@martindurant - the test suite is in a better place now. Are you up for picking this up? |
For testing ... I could set up a mock s3 or gcs with some pain and CI overhead. Also, I could wrap the fsspec memoryFS in async stuff, but that would not really be testing the async-ness and depends no me doing that correctly. None of the other stores are actually async, right? |
@martindurant - which fsspec implementations have an async-api available? For now, a mocked s3 backend seems like the way to go. |
I think this is a complete list
The following can pass through async, if they wrap an async FS:
|
Got it. Makes sense. For now, let's target a mocked test against s3. We can add a few others as we approach a full release. |
Well, I would use the moto server as s3fs does, which is a real S3 implementation with most of the functionality of the real one (minus permissions, object lifetimes and other unimportant things). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martindurant -- this is coming together. I took it for a spin today and ended up needing to make a few changes (suggestions below). But it works!
I merged from v3 and the suggestions above. I cannot get mypy to play. By the way, I would suggest that memoryview() and bytes() really ought to work as expected for a Buffer. |
@martindurant -- thanks for pushing this forward...
I'm hoping @dstansby can take a look... we can find a way!
cc @madsbk, @akshaysubr, @normanrz |
Co-authored-by: Martin Durant <[email protected]>
Mdurant v3 fsspec
Co-authored-by: Davis Bennett <[email protected]>
OK, so the problem with the tests, is that they call get/set in blocking sync code, but the store implementation works in async mode. The test instance is created in sync mode, and gets put on a dedicated fsspec IO thread; but then the async store on the main thread calls the same instance in async mode. Note that moto3 is also running on a thread, to complicate things. I am looking at it. |
That's great insight, thank you. Please let me know if there's anything about the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see ✅!
Thanks @martindurant for working out the kinks here! (and @d-v-b and @dstansby for chipping in)
Highlighting this line, which works around tests passing offset-length for data which is b"". Python allows you to slice bytes like that (b""[1:2] == b""), but remote stores do not allow this. |
if byte_range | ||
else fs._cat_file(path) | ||
if byte_range: | ||
# fsspec uses start/end, not start/length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I count this as additional evidence that we should switch to start/end semantics for the rest of the stores
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kerchunk is the exception, storing start/length, mostly because length is generally smaller for chunks in big files.
except self.exceptions: | ||
return None | ||
except OSError as e: | ||
if "not satisfiable" in str(e): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flagging this as s3 abstraction leakage that we might want to address later on by making an s3-specific storage class
# TODO: expectations for exceptions or missing keys? | ||
res = await self._fs._cat_ranges(list(paths), starts, stops, on_error="return") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think returning the exceptions is the right thing here
@@ -399,7 +399,9 @@ def test_hierarchy(self): | |||
assert [] == store.listdir(self.root + "c/x/y") | |||
assert [] == store.listdir(self.root + "c/d/y") | |||
assert [] == store.listdir(self.root + "c/d/y/z") | |||
assert [] == store.listdir(self.root + "c/e/f") | |||
# the following is listdir(filepath), for which fsspec gives [filepath] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the advantage of going with POSIX semantics here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fsspec tries to adhere to posix as much as possible. If we want to exclude the [file] case, we'd have to code that special case into our store.
|
||
|
||
@pytest.fixture(autouse=True, scope="function") | ||
def s3(s3_base): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martindurant could you explain what's happening in this test fixture? e.g., why do we need to manipulate the cache, why do we need to create an instance of S3FileSystem
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pytest-asyncio creates a new event loop for each async test. When an async-mode s3fs instance is made from async, it will be assigned to the loop from which it is made. That means that if you use s3fs again from a subsequent test, you will have the same identical instance, but be running on a different loop - which fails.
For the rest: it's very convenient to clean up the state of the store between tests, make sure we start off blank each time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, thank you @martindurant et al
…r-developers#1679) This is adapted from the fixes that were rolled into zarr-developers#1785 for the v3 branch. Signed-off-by: Adam Williamson <[email protected]>
* v3: (22 commits) [v3] `Buffer` ensure correct subclass based on the `BufferPrototype` argument (zarr-developers#1974) Fix doc build (zarr-developers#1987) Fix doc build warnings (zarr-developers#1985) Automatically generate API reference docs (zarr-developers#1918) Update `RemoteStore.__str__` and add UPath tests (zarr-developers#1964) [v3] Elevate codec pipeline (zarr-developers#1932) 0 dim arrays: indexing (zarr-developers#1980) `parse_shapelike` allows 0 (zarr-developers#1979) Clean up typing and docs for indexing (zarr-developers#1961) add json indentation to config (zarr-developers#1952) chore: update pre-commit hooks (zarr-developers#1973) Bump pypa/gh-action-pypi-publish in the actions group (zarr-developers#1969) chore: update pre-commit hooks (zarr-developers#1957) Update release.rst (zarr-developers#1960) doc: update release notes for 3.0.0.alpha (zarr-developers#1959) Basic working FsspecStore (zarr-developers#1785) Feature: Top level V3 API (zarr-developers#1884) Buffer Prototype Argument (zarr-developers#1910) Create issue-metrics.yml fixes bug in transpose (zarr-developers#1949) ...
This works.
Store
's API what the expectation is.Fixes #1757
TODO: