Add print format param and support for csv print format to datafusion cli #289

Merged
merged 4 commits into from
May 9, 2021

Conversation

jimexist
Member

@jimexist jimexist commented May 8, 2021

Which issue does this PR close?

Closes #290.

This should be merged after #285.

Rationale for this change

Printing results in CSV mode makes the output easier to parse, e.g. by automated systems.

What changes are included in this PR?

  • Add arrow dependencies
    - Add tmpfile dependencies
  • Accept an additional --format flag

Are there any user-facing changes?

  • New feature; no breaking changes.

Member Author

jimexist commented May 8, 2021

❯ cargo run --release --bin datafusion-cli -q -- --csv
> select 1 num, sin(1) sin, cos(1) cos, 1+2 sum;
num,sin,cos,sum
1,0.8414709848078965,0.5403023058681398,3

1 rows in set. Query took 0 seconds.
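
To illustrate the parsing use case from the rationale, here is a minimal Rust sketch of a consumer that reads CSV output shaped like the transcript above. The function name `parse_csv_output` is hypothetical and not part of this PR. It assumes that fields never contain quoted commas and that a blank line separates the CSV rows from the trailing status line.

```rust
/// Hypothetical helper (not part of the PR): split CLI CSV output into a
/// header row and data rows.
///
/// Assumptions: no quoted fields containing commas, and the CSV section ends
/// at the first blank line, so the trailing "N rows in set." line is ignored.
fn parse_csv_output(output: &str) -> Option<(Vec<String>, Vec<Vec<String>>)> {
    // Keep only the lines before the first blank line.
    let mut lines = output.lines().take_while(|line| !line.trim().is_empty());

    // The first line is the header; no header means nothing to parse.
    let header: Vec<String> = lines.next()?.split(',').map(str::to_string).collect();

    // Every remaining line is a data row.
    let rows: Vec<Vec<String>> = lines
        .map(|line| line.split(',').map(str::to_string).collect::<Vec<String>>())
        .collect();

    Some((header, rows))
}
```

A production consumer would more likely use a real CSV parser, since naive comma splitting breaks on quoted fields.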

Member Author

jimexist commented May 8, 2021

❯ cargo run --release --bin datafusion-cli -q -- --help
DataFusion 4.0.0-SNAPSHOT
DataFusion is an in-memory query engine that uses Apache Arrow as the memory model. It supports executing SQL queries
against CSV and Parquet files as well as querying directly against in-memory data.

USAGE:
    datafusion-cli [FLAGS] [OPTIONS]

FLAGS:
        --csv        Switches to CSV (Comma-Separated Values) output mode.
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -c, --batch-size <batch-size>    The batch size of each query, or use DataFusion default
    -p, --data-path <data-path>      Path to your data, default to current directory
    -f, --file <file>                Execute commands from file, then exit

Member Author

jimexist commented May 8, 2021

This should pave the way for finally merging #281.


codecov-commenter commented May 8, 2021

Codecov Report

Merging #289 (f113486) into master (d0a4552) will decrease coverage by 0.08%.
The diff coverage is 0.00%.


@@            Coverage Diff             @@
##           master     #289      +/-   ##
==========================================
- Coverage   76.07%   75.99%   -0.09%     
==========================================
  Files         140      141       +1     
  Lines       23632    23657      +25     
==========================================
  Hits        17978    17978              
- Misses       5654     5679      +25     
Impacted Files                               Coverage Δ
datafusion-cli/src/format/print_format.rs    0.00% <0.00%> (ø)
datafusion-cli/src/main.rs                   0.00% <0.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jimexist jimexist force-pushed the add-csv-mode-to-datafusion-cli branch from 66bcfc4 to 3f3de23 Compare May 8, 2021 07:24
Contributor

@alamb alamb left a comment

The feature train 🚋 keeps on rolling! This is great, @jimexist. Thank you!

datafusion-cli/src/format/print_format.rs (outdated)
let mut writer = builder.build(&file);
batches
    .iter()
    .for_each(|batch| writer.write(batch).unwrap());
Contributor

I think it would be good to return the error here rather than calling unwrap()
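
As a rough illustration of this suggestion, the sketch below propagates the first write error to the caller instead of panicking. It uses only the standard library: `write_batches` and `FailingWriter` are hypothetical names, and plain text lines stand in for record batches, so this is not the code from the PR.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for writing record batches: each "batch" here is
/// just a line of text, and any `Write` implementor plays the CSV writer.
fn write_batches<W: Write>(writer: &mut W, batches: &[&str]) -> io::Result<()> {
    // `try_for_each` stops at the first failed write and returns that error,
    // whereas `for_each` combined with `unwrap()` would panic.
    batches
        .iter()
        .try_for_each(|batch| writeln!(writer, "{}", batch))
}

/// Hypothetical writer that always fails, used to exercise the error path.
struct FailingWriter;

impl Write for FailingWriter {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::Other, "simulated write failure"))
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

With a signature like this, a caller can use `?` to bubble the error up. A plain `for` loop with `?` in its body would work equally well.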

file.read_to_string(&mut data)?;
println!("{}", data);
}
PrintFormat::Aligned => pretty::print_batches(batches)?,
Contributor

FWIW I called this format Pretty when I was doing something similar in IOx -- I don't think it is really important, however

@jimexist jimexist force-pushed the add-csv-mode-to-datafusion-cli branch from 3f3de23 to c436afd Compare May 8, 2021 12:16
Member Author

jimexist commented May 8, 2021

❯ echo 'select 1 num, 1 + 2 arith, cos(0.5) cos;' | cargo run --release --bin datafusion-cli -q
+-----+-------+--------------------+
| num | arith | cos                |
+-----+-------+--------------------+
| 1   | 3     | 0.8775825618903728 |
+-----+-------+--------------------+
1 rows in set. Query took 0 seconds.

❯ echo 'select 1 num, 1 + 2 arith, cos(0.5) cos;' | cargo run --release --bin datafusion-cli -q -- --format table
+-----+-------+--------------------+
| num | arith | cos                |
+-----+-------+--------------------+
| 1   | 3     | 0.8775825618903728 |
+-----+-------+--------------------+
1 rows in set. Query took 0 seconds.
❯ echo 'select 1 num, 1 + 2 arith, cos(0.5) cos;' | cargo run --release --bin datafusion-cli -q -- --format csv
num,arith,cos
1,3,0.8775825618903728

1 rows in set. Query took 0 seconds.
❯ cargo run --release --bin datafusion-cli -q -- --help
DataFusion 4.0.0-SNAPSHOT
DataFusion is an in-memory query engine that uses Apache Arrow as the memory model. It supports executing SQL queries
against CSV and Parquet files as well as querying directly against in-memory data.

USAGE:
    datafusion-cli [OPTIONS]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -c, --batch-size <batch-size>    The batch size of each query, or use DataFusion default
    -p, --data-path <data-path>      Path to your data, default to current directory
    -f, --file <file>                Execute commands from file, then exit
        --format <format>            Output format (possible values: table, csv) [default: table]
❯ echo 'select 1 num, 1 + 2 arith, cos(0.5) cos;' | cargo run --release --bin datafusion-cli -q -- --format tsv
error: Invalid value for '--format <format>': Format 'tsv' not supported
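
One way such validation could be wired up is by implementing `FromStr` for the format enum, sketched below. This is an assumption about the approach rather than the PR's actual code: the variant names `Table` and `Csv`, the case-sensitive matching, and the `String` error type are illustrative choices, and only the error text mirrors the transcript above.

```rust
use std::str::FromStr;

/// Illustrative format enum; the variant names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PrintFormat {
    Table,
    Csv,
}

impl FromStr for PrintFormat {
    type Err = String;

    /// Map the `--format` value to a variant, rejecting anything else.
    /// Matching is case-sensitive in this sketch.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "table" => Ok(PrintFormat::Table),
            "csv" => Ok(PrintFormat::Csv),
            other => Err(format!("Format '{}' not supported", other)),
        }
    }
}
```

With this in place, `"csv".parse::<PrintFormat>()` yields the `Csv` variant, while `"tsv"` produces the "not supported" message that an argument parser can report to the user.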

Contributor

@Dandandan Dandandan left a comment

This looks great @jimexist ! 💯💯💯

@jimexist jimexist changed the title Add csv mode to datafusion cli Add print format param and support for csv print format to datafusion cli May 8, 2021
Contributor

@alamb alamb left a comment

😍 Looks great. Thanks @jimexist

@alamb alamb merged commit 204d4f5 into apache:master May 9, 2021
@jimexist jimexist deleted the add-csv-mode-to-datafusion-cli branch May 9, 2021 14:48
@houqp houqp added the datafusion and enhancement labels Jul 29, 2021
Labels
datafusion, enhancement
Development

Successfully merging this pull request may close these issues.

Add support for csv mode similar to psql in datafusion-cli
5 participants