
Support compression for file reading and writing #80

Closed · Fixed by #84
nicholasjng (Collaborator) opened this issue Feb 14, 2024 · 0 comments
Labels: enhancement (New feature or request)

While brainstorming how to report data in DuckDB, I read one of their blog posts (https://duckdb.org/2023/03/03/json.html) that gives an example of loading a large (>10 GB) compressed JSON archive into memory.

It would be really nice if we could support this with our file IO. In theory, the following steps would need to happen:

  • Expose a compression: str | None = None argument on (read|write) that gives the option of using compression when reading or writing a record (also prompting a lookup in a dictionary of compression algorithms, like https://github.com/fsspec/filesystem_spec/blob/master/fsspec/utils.py#L138). A rough sketch follows after this list.
  • This argument should also support the special value "infer", indicating that the compression should be inferred from the input file name.
  • Allow passing a directory name to (read|write)_batched, which reads/writes all records to that directory in the given driver mode and then compresses the directory.
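
As a rough illustration of the first two points, here is a minimal sketch of how the compression lookup and the "infer" special value could work. Everything below is hypothetical: the `_COMPRESSIONS` and `_SUFFIXES` registries, the `_open` helper, and the `write_record` signature are for illustration only, not the actual read/write API.

```python
import bz2
import gzip
import lzma
from pathlib import Path
from typing import IO, Callable

# Hypothetical registry mapping compression names to opener callables,
# similar in spirit to fsspec's compression registry.
_COMPRESSIONS: dict[str, Callable[..., IO[bytes]]] = {
    "gzip": gzip.open,
    "bz2": bz2.open,
    "xz": lzma.open,
}

# Hypothetical mapping from file suffixes to compression names, used by "infer".
_SUFFIXES = {".gz": "gzip", ".bz2": "bz2", ".xz": "xz"}


def _open(path: str, mode: str, compression: str | None = None) -> IO[bytes]:
    """Open ``path``, optionally wrapping it in a (de)compressing stream."""
    if compression == "infer":
        # Fall back to no compression if the suffix is unknown.
        compression = _SUFFIXES.get(Path(path).suffix)
    if compression is None:
        return open(path, mode)
    try:
        opener = _COMPRESSIONS[compression]
    except KeyError:
        raise ValueError(f"unsupported compression: {compression!r}") from None
    return opener(path, mode)


# Hypothetical write API shape: a `compression` keyword on the record writer.
def write_record(record: bytes, path: str, compression: str | None = None) -> None:
    with _open(path, "wb", compression=compression) as f:
        f.write(record)
```

For the third point, the directory produced by (read|write)_batched could then be compressed with, for example, shutil.make_archive from the standard library.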
nicholasjng added the enhancement (New feature or request) label on Feb 14, 2024
nicholasjng linked a pull request on Feb 21, 2024 that will close this issue