Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Ballista support to DataFusion CLI #889

Merged
merged 15 commits into from
Aug 16, 2021
Merged

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented Aug 15, 2021

Which issue does this PR close?

Closes #886 .

$ ./target/release/datafusion-cli --host localhost --port 50050
DataFusion CLI v4.0.0-SNAPSHOT

> create external table lineitem stored as parquet location '/mnt/tpch/parquet-sf100-partitioned/lineitem';
0 rows in set. Query took 0.015 seconds.

> select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
        l_shipdate <= date '1998-09-02'
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;
+--------------+--------------+------------+--------------------+--------------------+-------------------+--------------------+--------------------+----------------------+-------------+
| l_returnflag | l_linestatus | sum_qty    | sum_base_price     | sum_disc_price     | sum_charge        | avg_qty            | avg_price          | avg_disc             | count_order |
+--------------+--------------+------------+--------------------+--------------------+-------------------+--------------------+--------------------+----------------------+-------------+
| A            | F            | 3775127758 | 5660776097194.456  | 5377736398183.934  | 5592847429515.928 | 25.499370423275426 | 38236.116984304936 | 0.05000224353092895  | 148047881   |
| N            | F            | 98553062   | 147771098385.9801  | 140384965965.03497 | 145999793032.7757 | 25.501556956882876 | 38237.19938880453  | 0.049985284338054006 | 3864590     |
| N            | O            | 7436302959 | 11150725648169.863 | 10593195276359.283 | 11016932215670.58 | 25.500009433521782 | 38237.227663621445 | 0.0499979183499098   | 291619616   |
| R            | F            | 3775724970 | 5661603032745.348  | 5378513563915.405  | 5593662252666.917 | 25.50006628406532  | 38236.697258453016 | 0.050001304339654135 | 148067261   |
+--------------+--------------+------------+--------------------+--------------------+-------------------+--------------------+--------------------+----------------------+-------------+
4 rows in set. Query took 8.677 seconds.

Rationale for this change

Enables a SQL repl for Ballista with minimal effort.

What changes are included in this PR?

datafusion-cli now has new host and port arguments

Are there any user-facing changes?

No

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Aug 15, 2021
@andygrove andygrove changed the title DRAFT: Add Ballista support to DataFusion CLI Add Ballista support to DataFusion CLI Aug 15, 2021
@andygrove andygrove marked this pull request as ready for review August 15, 2021 19:33
@houqp
Copy link
Member

houqp commented Aug 15, 2021

looks like code blocks in readme needs to be marked with no_run

@houqp
Copy link
Member

houqp commented Aug 16, 2021

my bad, no_run still compiles the code, we would need to use ignore here :(

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the code and it looks good to me. I think we have reasonable CI test coverage for the datafusion-cli (as it is used in the python integration test suite that @jimexist started).

Thanks @andygrove

.map_err(|e| DataFusionError::Execution(format!("{:?}", e)))?;
Ok(Context::Remote(BallistaContext::remote(h, p, &config)))
}
_ => Ok(Context::Local(ExecutionContext::with_config(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 perfect

@andygrove andygrove merged commit d4e32d1 into apache:master Aug 16, 2021
@andygrove andygrove deleted the ballista-cli branch August 16, 2021 13:24
@jimexist
Copy link
Member

Sorry that I've been quite busy these days. Will review by end of the week and also see what else i can contribute to. This is exciting!

@houqp houqp added the enhancement New feature or request label Nov 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update datafusion-cli to support Ballista, or implement new ballista-cli
4 participants