Impl s3fs cursor #272

laughingman7743 · 2022-01-23T07:01:57Z

Implement a cursor that reads CSV files in S3 without using Pandas.
It would also be good to be able to use it from SQLAlchemy as awsathena+s3fs.

https://github.com/fsspec/s3fs
https://docs.python.org/3/library/csv.html
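
As a rough sketch of the idea (not the planned implementation), rows can be streamed from any binary file-like object with the standard csv module. The helper name `iter_csv_rows` is hypothetical, and an in-memory `io.BytesIO` stands in for the S3 object so the example runs without AWS access.

```python
import csv
import io
from typing import BinaryIO, Iterator, List


def iter_csv_rows(binary_file: BinaryIO, encoding: str = "utf-8") -> Iterator[List[str]]:
    """Yield CSV rows one at a time from a binary file-like object."""
    # newline="" leaves line endings untouched so the csv module can
    # handle newlines embedded in quoted fields itself.
    text = io.TextIOWrapper(binary_file, encoding=encoding, newline="")
    yield from csv.reader(text)


# Stand-in for an S3 object. With s3fs, the file object could instead come
# from s3fs.S3FileSystem().open("s3://bucket/key.csv", "rb") (assumed path,
# not exercised here).
sample = io.BytesIO(b'"id","name"\n"1","a, b"\n"2","line\nbreak"\n')
rows = list(iter_csv_rows(sample))
```

Because the rows come out of a generator, a cursor built this way would not need to hold the whole result file in memory.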

laughingman7743 · 2022-07-31T17:11:23Z

AbstractFileSystem
https://github.com/fsspec/filesystem_spec/blob/2022.7.1/fsspec/spec.py#L92
AbstractBufferedFile
https://github.com/fsspec/filesystem_spec/blob/2022.7.1/fsspec/spec.py#L1299

S3FileSystem
https://github.com/fsspec/s3fs/blob/2022.7.1/s3fs/core.py#L168
S3File
https://github.com/fsspec/s3fs/blob/2022.7.1/s3fs/core.py#L1822

It appears that awswrangler splits each file into smaller chunks and uses ThreadPoolExecutor to retrieve them in parallel.
https://github.com/awslabs/aws-data-wrangler/blob/2.16.1/awswrangler/s3/_fs.py#L262-L300

Since s3fs depends on aiobotocore, and aiobotocore pins botocore to a strict version range, it seems like a good idea to write my own S3 file system that uses ThreadPoolExecutor, similar to awswrangler, instead of asyncio.
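
A minimal sketch of that chunked, thread-based approach might look like the following. The names `split_ranges`, `fetch_parallel`, and `fetch_range` are hypothetical and are not taken from awswrangler or s3fs. `fetch_range` stands in for a ranged S3 read, faked here with an in-memory bytes object so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_ranges(size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split [0, size) into half-open (start, end) byte ranges."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, size)) for start in range(0, size, chunk_size)]


def fetch_parallel(
    fetch_range: Callable[[int, int], bytes],
    size: int,
    chunk_size: int,
    max_workers: int = 4,
) -> bytes:
    """Fetch every range on a thread pool and join the chunks in order."""
    ranges = split_ranges(size, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order, so the chunks can be
        # concatenated directly even if they finish out of order.
        chunks = executor.map(lambda r: fetch_range(r[0], r[1]), ranges)
        return b"".join(chunks)


# Fake object contents. A real fetch_range would issue a ranged GET, for
# example boto3's get_object with Range=f"bytes={start}-{end - 1}"
# (HTTP ranges are inclusive); that call is an assumption, not shown here.
data = bytes(range(256)) * 10


def fake_fetch(start: int, end: int) -> bytes:
    return data[start:end]


result = fetch_parallel(fake_fetch, len(data), chunk_size=1000)
```

This keeps the dependency on plain botocore/boto3 and avoids asyncio entirely, at the cost of one thread per in-flight range request.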

laughingman7743 mentioned this issue Mar 29, 2022

Documenting/improving memory behavior #257

Open

laughingman7743 mentioned this issue Aug 7, 2022

Impl s3filesystem #360

Closed
