Response payload is not completed when reading a file #786

Closed
fjetter opened this issue Sep 11, 2023 · 5 comments · Fixed by #787

Comments

@fjetter
Contributor

fjetter commented Sep 11, 2023

This traceback is from a dask workload that is reading HDF5 files. Unfortunately, I don't have a reliable reproducer and don't know how to trigger this.

However, based on the traceback and the exception message, I assume that a data transfer is being aborted prematurely, possibly because of an S3 blip, and I would expect something in this stack to either retry such an exception or raise an actionable error.

File "/opt/coiled/env/lib/python3.10/site-packages/fsspec/spec.py", line 1800, in readinto
    data = self.read(out.nbytes)
  File "/opt/coiled/env/lib/python3.10/site-packages/fsspec/spec.py", line 1790, in read
    out = self.cache._fetch(self.loc, self.loc + length)
  File "/opt/coiled/env/lib/python3.10/site-packages/fsspec/caching.py", line 156, in _fetch
    self.cache = self.fetcher(start, end)  # new block replaces old
  File "/opt/coiled/env/lib/python3.10/site-packages/s3fs/core.py", line 2185, in _fetch_range
    return _fetch_range(
  File "/opt/coiled/env/lib/python3.10/site-packages/s3fs/core.py", line 2348, in _fetch_range
    return sync(fs.loop, _inner_fetch, fs, bucket, key, version_id, start, end, req_kw)
  File "/opt/coiled/env/lib/python3.10/site-packages/fsspec/asyn.py", line 106, in sync
    raise return_result
  File "/opt/coiled/env/lib/python3.10/site-packages/fsspec/asyn.py", line 61, in _runner
    result[0] = await coro
  File "/opt/coiled/env/lib/python3.10/site-packages/s3fs/core.py", line 2360, in _inner_fetch
    return await resp["Body"].read()
  File "/opt/coiled/env/lib/python3.10/site-packages/aiobotocore/response.py", line 57, in read
    chunk = await self.__wrapped__.content.read(
  File "/opt/coiled/env/lib/python3.10/site-packages/aiohttp/streams.py", line 375, in read
    block = await self.readany()
  File "/opt/coiled/env/lib/python3.10/site-packages/aiohttp/streams.py", line 397, in readany
    await self._wait("readany")
  File "/opt/coiled/env/lib/python3.10/site-packages/aiohttp/streams.py", line 304, in _wait
    await waiter

ClientPayloadError('Response payload is not completed')

Does anybody know what is causing this?

aiobotocore==2.5.4
aiohttp==3.8.5
s3fs==2023.6.0
fsspec==2023.6.0
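
There is no reproducer here, but for context, a hypothetical sketch of the kind of access pattern that produces the readinto → cache._fetch → _fetch_range path in the traceback (bucket, key, and dataset names are made up):

import h5py
import s3fs

fs = s3fs.S3FileSystem()
# h5py reads from the file-like object via readinto(), which goes through
# fsspec's block cache and ends up in s3fs._fetch_range, as in the traceback above.
with fs.open("s3://some-bucket/data.h5", "rb") as f:
    with h5py.File(f, "r") as h5:
        chunk = h5["some_dataset"][:1000]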
@martindurant
Member

At first glance it looks like a pretty low-level network error. Are you sure there is no intermittent problem with your connection?

@fjetter
Contributor Author

fjetter commented Sep 11, 2023

This is happening on a Coiled cluster. While I can't guarantee it, I believe the network there is pretty stable. Retrying the task sufficiently often works, but I would expect such an error to be retried further down the stack, definitely not at the dask level.

@martindurant
Member

ClientPayloadError is in the list of retriable errors in s3fs. If you activate the "s3fs" logger, you will see whether it is indeed being retried or not.
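
For reference, enabling that logger needs nothing beyond the standard library; a minimal sketch (only the logger name "s3fs" comes from the comment above, the rest is plain stdlib logging):

import logging

logging.basicConfig()  # ensure a handler is attached so records get printed
logging.getLogger("s3fs").setLevel(logging.DEBUG)  # retry attempts show up at DEBUG level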

@fjetter
Contributor Author

fjetter commented Sep 11, 2023

Well, the traceback is blaming line 2360 (in 2023.6.0) inside _inner_fetch:

async def _inner_fetch(fs, bucket, key, version_id, start, end, req_kw=None):
    resp = await fs._call_s3(
        "get_object",
        Bucket=bucket,
        Key=key,
        Range="bytes=%i-%i" % (start, end - 1),
        **version_id_kw(version_id),
        **req_kw,
    )
    return await resp["Body"].read()

which is the resp["Body"].read() call. However, this read streams the response body directly via aiohttp and is not covered by the retry mechanism of s3fs (implemented in _error_wrapper).

I haven't checked the logs yet, but the code does not look as if it would retry this exception at this point.
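
For illustration only, a rough sketch of the shape a fix could take: retrying the body read together with the request, rather than only the get_object call. The explicit retry loop, the backoff, and the use of fs.retries here are assumptions for the sketch, not the actual change from #787; the real s3fs retry handling lives in _error_wrapper.

import asyncio

from aiohttp import ClientPayloadError


async def _inner_fetch(fs, bucket, key, version_id, start, end, req_kw=None):
    req_kw = req_kw or {}
    retries = getattr(fs, "retries", 5)  # assumption: reuse the filesystem's retry count
    for attempt in range(retries + 1):
        try:
            resp = await fs._call_s3(
                "get_object",
                Bucket=bucket,
                Key=key,
                Range="bytes=%i-%i" % (start, end - 1),
                **version_id_kw(version_id),
                **req_kw,
            )
            # Reading the body inside the try block means a ClientPayloadError raised
            # while streaming the payload is retried, not just errors from the request.
            return await resp["Body"].read()
        except ClientPayloadError:
            if attempt == retries:
                raise
            await asyncio.sleep(min(2**attempt * 0.1, 15))  # simple illustrative backoff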

@fjetter
Contributor Author

fjetter commented Sep 11, 2023

A similar issue was apparently fixed for other APIs already, see #601.
