Make the splitter use less memory #85

rom1504 · 2022-01-03T23:38:54Z

With that change, memory usage can be capped at a very low value.

rom1504 · 2022-01-07T00:14:33Z

rom1504 · 2022-02-04T09:53:13Z

Read the files in batches; there is no need to prepare the whole first file of feather batches.
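
A minimal sketch of the batch-wise idea, not the splitter's actual code. It assumes the input arrives as an iterable of small batches (in practice these might come from a feather/Arrow reader; here a hypothetical `fake_batches` generator stands in for it) and regroups them into fixed-size shards while buffering only the shard currently being built. The names `iter_shards`, `fake_batches`, and `shard_size` are illustrative, not taken from the project.

```python
from typing import Iterable, Iterator, List, Sequence


def iter_shards(batches: Iterable[Sequence], shard_size: int) -> Iterator[List]:
    """Regroup incoming batches into shards of `shard_size` rows.

    Only the rows of the shard currently being built are buffered, so
    memory use is bounded by one shard plus one input batch.
    """
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    buffer: List = []
    for batch in batches:
        for row in batch:
            buffer.append(row)
            if len(buffer) == shard_size:
                yield buffer
                buffer = []  # start a fresh shard; the old one is handed off
    if buffer:
        yield buffer  # last shard may be shorter than shard_size


def fake_batches(total_rows: int, batch_size: int) -> Iterator[List[int]]:
    """Hypothetical stand-in for a reader that yields one batch at a time."""
    for start in range(0, total_rows, batch_size):
        yield list(range(start, min(start + batch_size, total_rows)))


# 10 rows read in batches of 4, regrouped into shards of 3 rows.
shards = list(iter_shards(fake_batches(10, 4), shard_size=3))
```

Because `iter_shards` is a generator, a caller that writes each shard as soon as it is yielded never holds more than one shard in memory; `list(...)` is used above only to make the result easy to inspect.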

rom1504 · 2022-02-05T00:06:16Z

At least write all the shards in parallel so that writing to high-latency filesystems (S3, HDFS) is faster, e.g. by using the fsspec async API: https://filesystem-spec.readthedocs.io/en/latest/async.html
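
A rough sketch of parallel shard writing, under stated assumptions. It uses a standard-library thread pool rather than the fsspec async API linked above, and writes to a local temporary directory instead of S3 or HDFS. The helpers `write_shard` and `write_shards_parallel`, the file naming scheme, and the `max_workers` default are all hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def write_shard(path: str, rows: Sequence) -> str:
    """Write one shard as a text file, one row per line, and return its path."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(f"{row}\n")
    return path


def write_shards_parallel(
    shards: Sequence[Sequence], output_dir: str, max_workers: int = 8
) -> List[str]:
    """Write every shard concurrently so write latencies overlap."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(
                write_shard, os.path.join(output_dir, f"shard_{i:05d}.txt"), rows
            )
            for i, rows in enumerate(shards)
        ]
        # result() re-raises any exception that happened in a worker thread.
        return [future.result() for future in futures]


output_dir = tempfile.mkdtemp()
written = write_shards_parallel([[0, 1, 2], [3, 4, 5], [6]], output_dir)
```

Threads are enough here because the work is I/O-bound. Note that this sketch submits every shard up front, so all shards are held in memory at once; a memory-conscious splitter would presumably cap the number of in-flight writes.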

rom1504 · 2022-05-18T22:05:23Z

done now

rom1504 mentioned this issue Feb 5, 2022

finish aws s3 support #120

Closed

rom1504 mentioned this issue Feb 12, 2022

fix s3fs dying after 6h with Unable to locate credentials #137

Closed

rom1504 closed this as completed May 18, 2022
