
Support compression for file reading and writing #80

Closed · Fixed by #84
nicholasjng (Collaborator) opened this issue Feb 14, 2024 · 0 comments
Labels: enhancement (New feature or request)

While brainstorming how to report data in DuckDB, I read one of their blog posts (https://duckdb.org/2023/03/03/json.html) that gives an example of loading a large (>10 GB) compressed JSON archive into memory.

It would be really nice if we could support this with our file IO. In theory, the following steps would need to happen:

  • Expose a compression: str | None = None argument on (read|write) that gives the option of using compression when reading or writing a record (also prompting a lookup in a dictionary of compression algorithms, like https://github.com/fsspec/filesystem_spec/blob/master/fsspec/utils.py#L138). A rough sketch follows after this list.
  • This argument should also support the special value "infer", indicating that the compression should be inferred from the input file name.
  • Allow passing a directory name to (read|write)_batched, which reads/writes all records to that directory in the given driver mode and then compresses the directory.
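
As a rough illustration of the first two points, here is a minimal sketch of how the compression lookup and the "infer" special value could work. Everything below is hypothetical: the `_COMPRESSIONS` and `_SUFFIXES` registries, the `_open` helper, and the `write_record` signature are for illustration only, not the actual read/write API.

```python
import bz2
import gzip
import lzma
from pathlib import Path
from typing import IO, Callable

# Hypothetical registry mapping compression names to opener callables,
# similar in spirit to fsspec's compression registry.
_COMPRESSIONS: dict[str, Callable[..., IO[bytes]]] = {
    "gzip": gzip.open,
    "bz2": bz2.open,
    "xz": lzma.open,
}

# Hypothetical mapping from file suffixes to compression names, used by "infer".
_SUFFIXES = {".gz": "gzip", ".bz2": "bz2", ".xz": "xz"}


def _open(path: str, mode: str, compression: str | None = None) -> IO[bytes]:
    """Open ``path``, optionally wrapping it in a (de)compressing stream."""
    if compression == "infer":
        # Fall back to no compression if the suffix is unknown.
        compression = _SUFFIXES.get(Path(path).suffix)
    if compression is None:
        return open(path, mode)
    try:
        opener = _COMPRESSIONS[compression]
    except KeyError:
        raise ValueError(f"unsupported compression: {compression!r}") from None
    return opener(path, mode)


# Hypothetical write API shape: a `compression` keyword on the record writer.
def write_record(record: bytes, path: str, compression: str | None = None) -> None:
    with _open(path, "wb", compression=compression) as f:
        f.write(record)
```

For the third point, the directory produced by (read|write)_batched could then be compressed with, for example, shutil.make_archive from the standard library.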
nicholasjng added the enhancement (New feature or request) label on Feb 14, 2024
nicholasjng linked a pull request on Feb 21, 2024 that will close this issue