Add documentation for querying S3 data with CLI #3631

Merged
6 commits merged on Sep 28, 2022

Conversation

andygrove
Member

Which issue does this PR close?

Closes #3399

Rationale for this change

We should tell users how to use the new S3 support in the CLI.

What changes are included in this PR?

Docs

Are there any user-facing changes?

No

Contributor

@avantgardnerio avantgardnerio left a comment

Thank you for adding this!

@andygrove andygrove marked this pull request as draft September 27, 2022 23:28
/// - AWS_SECRET_ACCESS_KEY
///
#[tokio::main]
async fn main() -> Result<()> {
Member Author

This example currently fails with:

Error: ObjectStore(Generic { store: "S3", source: MissingLastModified })

Member Author

I filed #3633 for this.


codecov-commenter commented Sep 28, 2022

Codecov Report

Merging #3631 (30bc3f8) into master (451e441) will increase coverage by 0.00%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #3631   +/-   ##
=======================================
  Coverage   86.01%   86.02%           
=======================================
  Files         300      300           
  Lines       56416    56469   +53     
=======================================
+ Hits        48525    48576   +51     
- Misses       7891     7893    +2     
| Impacted Files | Coverage Δ |
|---|---|
| datafusion/core/src/physical_plan/metrics/value.rs | 87.06% <0.00%> (-0.50%) ⬇️ |
| ...sion/core/src/physical_plan/file_format/parquet.rs | 94.33% <0.00%> (-0.35%) ⬇️ |
| datafusion/common/src/scalar.rs | 85.31% <0.00%> (+0.06%) ⬆️ |
| datafusion/expr/src/logical_plan/builder.rs | 90.61% <0.00%> (+0.20%) ⬆️ |
| datafusion/expr/src/logical_plan/plan.rs | 77.42% <0.00%> (+0.32%) ⬆️ |
| ...afusion/core/src/datasource/file_format/parquet.rs | 86.30% <0.00%> (+0.73%) ⬆️ |


The CLI can query data in S3 if the following environment variables are defined:

- `AWS_DEFAULT_REGION`
Contributor

`AWS_REGION` is more standard; `AWS_DEFAULT_REGION` is the fallback.
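
The precedence described in this comment can be sketched in shell. This is a minimal illustration, not the CLI's actual lookup logic: `resolve_region` is a hypothetical helper name, and the only assumption is that `AWS_REGION` wins when both variables are set.

```shell
# Hypothetical helper (not part of datafusion-cli): print the region,
# preferring AWS_REGION and falling back to AWS_DEFAULT_REGION.
resolve_region() {
  if [ -n "${AWS_REGION:-}" ]; then
    printf '%s\n' "$AWS_REGION"
  elif [ -n "${AWS_DEFAULT_REGION:-}" ]; then
    printf '%s\n' "$AWS_DEFAULT_REGION"
  else
    # Neither variable is set or both are empty: report and fail.
    echo "no region set: define AWS_REGION or AWS_DEFAULT_REGION" >&2
    return 1
  fi
}
```

A caller could run `region="$(resolve_region)" && echo "using region: $region"` to see which value would be picked up.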

@andygrove andygrove marked this pull request as ready for review September 28, 2022 13:24
Contributor

@alamb alamb left a comment

Thanks @andygrove

$ export AWS_SECRET_ACCESS_KEY=***************************
$ export AWS_ACCESS_KEY_ID=**************

$ ./target/release/datafusion-cli
Contributor

this is so cool
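
The setup in the excerpt above could be guarded by a small pre-flight check before launching the CLI. The sketch below is an assumption-laden illustration: `check_s3_env` is a hypothetical function, and the variable list is taken from the names mentioned in this PR (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_DEFAULT_REGION`), not from the CLI's source.

```shell
# Hypothetical pre-flight check (not part of datafusion-cli): succeed only
# if every variable named in this PR is non-empty in the current shell.
check_s3_env() {
  missing=0
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION; do
    # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It could be used as `check_s3_env && ./target/release/datafusion-cli`. The check only confirms the values are non-empty in the shell; they still need to be exported, as in the excerpt, for the CLI process to see them.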

@alamb alamb merged commit 06a4f79 into apache:master Sep 28, 2022

ursabot commented Sep 28, 2022

Benchmark runs are scheduled for baseline = b4c0601 and contender = 06a4f79. 06a4f79 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@andygrove andygrove deleted the cli-s3-docs branch January 27, 2023 18:56
Development

Successfully merging this pull request may close these issues.

Add documentation on querying against files in object store such as S3
6 participants