CSV to Parquet

This repository is archived; the code has moved to Arrow CLI Tools.

Convert CSV files to Apache Parquet. You may also be interested in json2parquet, csv2arrow, or json2arrow.

Installation

Download prebuilt binaries

You can get the latest releases from https://github.com/domoritz/csv2parquet/releases/.

With Cargo

cargo install csv2parquet

Usage

Usage: csv2parquet [OPTIONS] <CSV> <PARQUET>

Arguments:
  <CSV>      Input CSV file
  <PARQUET>  Output file

Options:
  -s, --schema-file <SCHEMA_FILE>
          File with Arrow schema in JSON format
      --max-read-records <MAX_READ_RECORDS>
          The number of records to infer the schema from. All rows if not present. Setting max-read-records to zero will stop schema inference and all columns will be string typed
      --header <HEADER>
          Set whether the CSV file has headers [possible values: true, false]
  -d, --delimiter <DELIMITER>
          Set the CSV file's column delimiter as a byte character [default: ,]
  -c, --compression <COMPRESSION>
          Set the compression [possible values: uncompressed, snappy, gzip, lzo, brotli, lz4, zstd]
  -e, --encoding <ENCODING>
          Sets encoding for any column [possible values: plain, rle, bit-packed, delta-binary-packed, delta-length-byte-array, delta-byte-array, rle-dictionary]
      --data-pagesize-limit <DATA_PAGESIZE_LIMIT>
          Sets data page size limit
      --dictionary-pagesize-limit <DICTIONARY_PAGESIZE_LIMIT>
          Sets dictionary page size limit
      --write-batch-size <WRITE_BATCH_SIZE>
          Sets write batch size
      --max-row-group-size <MAX_ROW_GROUP_SIZE>
          Sets max size for a row group
      --created-by <CREATED_BY>
          Sets "created by" property
      --dictionary
          Sets flag to enable/disable dictionary encoding for any column
      --statistics <STATISTICS>
          Sets flag to enable/disable statistics for any column [possible values: none, chunk, page]
      --max-statistics-size <MAX_STATISTICS_SIZE>
          Sets max statistics size for any column. Applicable only if statistics are enabled
  -p, --print-schema
          Print the schema to stderr
  -n, --dry
          Only print the schema
  -h, --help
          Print help information
  -V, --version
          Print version information

The --schema-file option expects the same JSON format that --dry and --print-schema print.

Examples

Convert a CSV to Parquet

csv2parquet data.csv data.parquet
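The options listed under Usage can be combined in a single invocation. The following is a minimal sketch, assuming a semicolon-delimited input with a header row. The file names `sales.csv` and `sales.parquet` and the sample rows are hypothetical, and the conversion step is skipped when `csv2parquet` is not on the `PATH`.

```shell
#!/bin/sh
# Create a small semicolon-delimited sample file (hypothetical data).
cat > sales.csv <<'EOF'
id;region;amount
1;north;10.5
2;south;7.25
EOF

# Convert it using flags taken from the usage text:
#   --header true   the first row holds the column names
#   -d ';'          columns are separated by semicolons
#   -c zstd         compress the output with zstd
if command -v csv2parquet >/dev/null 2>&1; then
  csv2parquet --header true -d ';' -c zstd sales.csv sales.parquet
else
  echo "csv2parquet not found; see Installation" >&2
fi
```

Quoting the delimiter keeps the shell from treating `;` as a command separator.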

Convert a CSV with no `header` to Parquet

csv2parquet --header false <CSV> <PARQUET>

Get the `schema` from a CSV with a header

csv2parquet --header true --dry <CSV> <PARQUET>
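Since --schema-file reads the same format that --dry prints, one possible workflow is to dump the inferred schema, adjust it by hand, and feed it back in. The sketch below assumes the schema JSON arrives on stdout. The help text says --print-schema writes to stderr, so if `schema.json` comes out empty, redirect stderr instead (`2> schema.json`). The file names and sample rows are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample input with a header row.
cat > people.csv <<'EOF'
name,age
Ada,36
Grace,45
EOF

if command -v csv2parquet >/dev/null 2>&1; then
  # Step 1: infer the schema only; --dry is documented as "Only print the schema".
  # Assumption: the JSON is written to stdout.
  csv2parquet --header true --dry people.csv people.parquet > schema.json

  # Step 2: edit schema.json by hand here if a column type needs changing.

  # Step 3: convert with the saved schema instead of inferring it again.
  csv2parquet --header true --schema-file schema.json people.csv people.parquet
else
  echo "csv2parquet not found; see Installation" >&2
fi
```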

Convert a CSV using `schema-file` to Parquet

Below is an example of the schema-file content:

{
  "fields": [
    {
      "name": "col1",
      "data_type": "Utf8",
      "nullable": false,
      "dict_id": 0,
      "dict_is_ordered": false,
      "metadata": {}
    },
    {
      "name": "col2",
      "data_type": "Utf8",
      "nullable": false,
      "dict_id": 0,
      "dict_is_ordered": false,
      "metadata": {}
    }
  ],
  "metadata": {}
}

Then pass schema.json to the command with --schema-file:

csv2parquet --header false --schema-file schema.json <CSV> <PARQUET>
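To try the schema-file example end to end, the steps can be scripted. This is a sketch rather than a reference workflow: the input file `noheader.csv`, its rows, and the output name are hypothetical, and the schema uses only the `Utf8` type shown in the example. The conversion is skipped when `csv2parquet` is not installed.

```shell
#!/bin/sh
# Hypothetical two-column CSV without a header row.
cat > noheader.csv <<'EOF'
alpha,one
beta,two
EOF

# Write a schema file that names both columns and types them as strings.
cat > schema.json <<'EOF'
{
  "fields": [
    {
      "name": "col1",
      "data_type": "Utf8",
      "nullable": false,
      "dict_id": 0,
      "dict_is_ordered": false,
      "metadata": {}
    },
    {
      "name": "col2",
      "data_type": "Utf8",
      "nullable": false,
      "dict_id": 0,
      "dict_is_ordered": false,
      "metadata": {}
    }
  ],
  "metadata": {}
}
EOF

# Convert with the explicit schema; --header false because the file has no header row.
if command -v csv2parquet >/dev/null 2>&1; then
  csv2parquet --header false --schema-file schema.json noheader.csv noheader.parquet
else
  echo "csv2parquet not found; see Installation" >&2
fi
```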
