Add a file compression parameter when writing CSV/Parquet. For Parquet, compression defaults to snappy in parquet.py (def write_table), but the to_csv function has no parameter for specifying file compression,
i.e. in pandas.py:
def to_csv(
    self,
    dataframe,
    path,
    database=None,
    table=None,
    partition_cols=None,
    preserve_index=True,
    mode="append",
    procs_cpu_bound=None,
    procs_io_bound=None,
)
The request is to add a new parameter for specifying the file compression, e.g. gzip. For reference, pandas documents its own option as follows:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
compression: str, default ‘infer’
Compression mode among the following possible values: {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}. If ‘infer’ and path_or_buf is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’ or ‘.xz’. (otherwise no compression).
I've added compression options for Pandas.to_parquet and prepared the structures to do the same for Pandas.to_csv in the future.
For now, though, the latter is blocked by a Pandas limitation.
Since we aim to write compressed files directly to S3, we must wait until Pandas supports writing compressed files in memory (instead of to disk).
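In the meantime, one possible workaround is to compress the CSV text yourself before uploading. The snippet below is only a minimal sketch of that idea using the standard library; gzip_csv_text is a hypothetical helper name, not something this library provides, and the sample CSV is built with the csv module purely to keep the example self-contained. With pandas, calling DataFrame.to_csv() without a path returns a comparable string.

```python
import csv
import gzip
import io


def gzip_csv_text(csv_text: str, encoding: str = "utf-8") -> bytes:
    """Compress CSV text to gzip bytes entirely in memory (hypothetical helper)."""
    buffer = io.BytesIO()
    # GzipFile wraps the in-memory buffer, so nothing touches the local disk.
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        gz.write(csv_text.encode(encoding))
    return buffer.getvalue()


# Build a small CSV string with the standard library; with pandas this
# string would come from DataFrame.to_csv() called without a path.
text_buffer = io.StringIO()
writer = csv.writer(text_buffer)
writer.writerow(["id", "name"])
writer.writerow([1, "foo"])
csv_text = text_buffer.getvalue()

# Compressed payload, ready to be handed to an upload call as raw bytes.
payload = gzip_csv_text(csv_text)

# Round-trip check: decompressing yields the original CSV text.
restored = gzip.decompress(payload).decode("utf-8")
```

The resulting bytes could then be sent to S3 as the object body, so the compressed file never has to exist on disk. This assumes the whole CSV fits in memory.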