Schema inference does not work in Ballista-cli with a remote context #287

Closed
Tracked by #273
avantgardnerio opened this issue Sep 27, 2022 · 3 comments · Fixed by #313
Labels
bug Something isn't working

Comments

@avantgardnerio
Contributor

Describe the bug

In datafusion-cli I can run:

/snap/bin/cargo run --color=always --bin datafusion-cli --manifest-path /home/bgardner/workspace/ballista/arrow-datafusion/datafusion-cli/Cargo.toml
    Finished dev [unoptimized + debuginfo] target(s) in 0.11s
     Running `target/debug/datafusion-cli`
DataFusion CLI v12.0.0
select 1;
+----------+
| Int64(1) |
+----------+
| 1        |
+----------+
1 row in set. Query took 0.004 seconds.
create external table customer stored as CSV WITH HEADER ROW
    LOCATION '/home/bgardner/workspace/ballista/arrow-datafusion/datafusion/core/tests/tpch-csv/customer.csv';
0 rows in set. Query took 0.021 seconds.

select * from customer limit 1;
+-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+-------------------------------------------------------------------------------------------------------------------+
| c_custkey | c_name             | c_address                             | c_nationkey | c_phone         | c_acctbal | c_mktsegment | c_comment                                                                                                         |
+-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+-------------------------------------------------------------------------------------------------------------------+
| 2         | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak        | 13          | 23-768-687-3665 | 121.65    | AUTOMOBILE   | l accounts. blithely ironic theodolites integrate boldly: caref                                                   |
+-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+-------------------------------------------------------------------------------------------------------------------+
9 rows in set. Query took 0.011 seconds.

In ballista-cli I get:

/snap/bin/cargo run --color=always --bin ballista-cli --manifest-path /home/bgardner/workspace/ballista/arrow-ballista/ballista-cli/Cargo.toml -- --host 127.0.0.1 --port 50050
    Finished dev [unoptimized + debuginfo] target(s) in 0.09s
     Running `target/debug/ballista-cli --host 127.0.0.1 --port 50050`
Ballista CLI v0.8.0
create external table customer stored as CSV with header row
    location '/home/bgardner/workspace/ballista/arrow-datafusion/datafusion/core/tests/tpch-csv/customer.csv';
0 rows in set. Query took 0.002 seconds.
select * from customer limit 5;
[2022-09-27T20:18:42Z ERROR ballista_core::execution_plans::distributed_query] Job rkjseLs failed: Task rkjseLs/1/0 failed: Task failed due to Tokio error: DataFusion error: Execution("ArrowError(InvalidArgumentError(\"must either specify a row count or at least one column\"))")
    
ArrowError(ExternalError(Execution("Job rkjseLs failed: Task rkjseLs/1/0 failed: Task failed due to Tokio error: DataFusion error: Execution(\"ArrowError(InvalidArgumentError(\\\"must either specify a row count or at least one column\\\"))\")\n")))
select c_name from customer limit 5;
SchemaError(FieldNotFound { qualifier: None, name: "c_name", valid_fields: Some([]) })

To Reproduce

Described above

Expected behavior

ballista-cli with a remote context should infer the schema and return the same results as datafusion-cli.

avantgardnerio added the bug label Sep 27, 2022
@avantgardnerio
Contributor Author

PS: ballista-cli works if it uses a "local" (datafusion) context:

/snap/bin/cargo run --color=always --bin ballista-cli --manifest-path /home/bgardner/workspace/ballista/arrow-ballista/ballista-cli/Cargo.toml
    Finished dev [unoptimized + debuginfo] target(s) in 0.09s
     Running `target/debug/ballista-cli`
Ballista CLI v0.8.0
create external table customer stored as CSV WITH HEADER ROW
    LOCATION '/home/bgardner/workspace/ballista/arrow-datafusion/datafusion/core/tests/tpch-csv/customer.csv';
0 rows in set. Query took 0.026 seconds.

select * from customer;
+-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+-------------------------------------------------------------------------------------------------------------------+
| c_custkey | c_name             | c_address                             | c_nationkey | c_phone         | c_acctbal | c_mktsegment | c_comment                                                                                                         |
+-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+-------------------------------------------------------------------------------------------------------------------+
| 2         | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak        | 13          | 23-768-687-3665 | 121.65    | AUTOMOBILE   | l accounts. blithely ironic theodolites integrate boldly: caref                                                   |
| 3         | Customer#000000003 | MG9kdTD2WBHm                          | 1           | 11-719-748-3364 | 7498.12   | AUTOMOBILE   |  deposits eat slyly ironic, even instructions. express foxes detect slyly. blithely even accounts abov            |
| 4         | Customer#000000004 | XxVSJsLAGtn                           | 4           | 14-128-190-5944 | 2866.83   | MACHINERY    |  requests. final, regular ideas sleep final accou                                                                 |
| 5         | Customer#000000005 | KvpyuHCplrB84WgAiGV6sYpZq7Tj          | 3           | 13-750-942-6364 | 794.47    | HOUSEHOLD    | n accounts will have to unwind. foxes cajole accor                                                                |
| 6         | Customer#000000006 | sKZz0CsnMD7mp4Xd0YrBvx,LREYKUWAh yVn  | 20          | 30-114-968-4951 | 7638.57   | AUTOMOBILE   | tions. even deposits boost according to the slyly bold packages. final accounts cajole requests. furious          |
| 7         | Customer#000000007 | TcGe5gaZNgVePxU5kRrvXBfkasDTea        | 18          | 28-190-982-9759 | 9561.95   | AUTOMOBILE   | ainst the ironic, express theodolites. express, even pinto beans among the exp                                    |
| 8         | Customer#000000008 | I0B10bB0AymmC, 0PrRYBCP1yGJ8xcBPmWhl5 | 17          | 27-147-574-9335 | 6819.74   | BUILDING     | among the slyly regular theodolites kindle blithely courts. carefully even theodolites haggle slyly along the ide |
| 9         | Customer#000000009 | xKiAFTjUsCuxfeleNqefumTrjS            | 8           | 18-338-906-3675 | 8324.07   | FURNITURE    | r theodolites according to the requests wake thinly excuses: pending requests haggle furiousl                     |
| 10        | Customer#000000010 | 6LrEaV6KR6PLVcgl2ArL Q3rqzLzcT1 v2    | 5           | 15-741-346-9870 | 2753.54   | HOUSEHOLD    | es regular deposits haggle. fur                                                                                   |
+-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+-------------------------------------------------------------------------------------------------------------------+
9 rows in set. Query took 0.027 seconds.

@r4ntix
Contributor

r4ntix commented Oct 2, 2022

It seems that the problem is with the schema in this code:
https://github.com/apache/arrow-ballista/blob/f5bfef00bcb695c68f377bdd23fa3efdefa1f43c/ballista/rust/client/src/context.rs#L383-L394

I will verify it again tomorrow and fix it.
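
For readers following along, here is a minimal, dependency-free sketch of the suspected failure mode. It is not the Ballista or DataFusion code: every type and function name below is hypothetical, and it rests on the assumption (suggested by `valid_fields: Some([])` in the report) that the remote context registers the table with an empty schema instead of inferring one from the CSV header. An empty schema would also be consistent with the `select *` error about specifying "a row count or at least one column".

```rust
// Hypothetical model of the failure; none of these names come from
// Ballista or DataFusion.

/// Stand-in for a registered table: just its column names.
#[derive(Debug, Clone, PartialEq)]
struct TableSchema {
    fields: Vec<String>,
}

/// Mirrors the shape of the `FieldNotFound` error shown in the report.
#[derive(Debug, PartialEq)]
struct FieldNotFound {
    name: String,
    valid_fields: Vec<String>,
}

/// Infer column names from the header row of CSV text.
/// Naive comma split with no quoting support; enough for illustration.
fn infer_schema_from_header(csv: &str) -> TableSchema {
    let header = csv.lines().next().unwrap_or("");
    let fields = header
        .split(',')
        .map(|f| f.trim().to_string())
        .filter(|f| !f.is_empty())
        .collect();
    TableSchema { fields }
}

/// Register a table. When the DDL declares no columns and `infer` is false,
/// the schema stays empty, which is the assumed remote-context behaviour.
fn register_table(declared: &[&str], csv: &str, infer: bool) -> TableSchema {
    if declared.is_empty() && infer {
        infer_schema_from_header(csv)
    } else {
        TableSchema {
            fields: declared.iter().map(|s| s.to_string()).collect(),
        }
    }
}

/// Resolve a column by name, returning its index in the schema.
fn resolve_column(schema: &TableSchema, name: &str) -> Result<usize, FieldNotFound> {
    schema
        .fields
        .iter()
        .position(|f| f == name)
        .ok_or_else(|| FieldNotFound {
            name: name.to_string(),
            valid_fields: schema.fields.clone(),
        })
}
```

Calling `register_table(&[], csv, false)` and then `resolve_column(&schema, "c_name")` yields a `FieldNotFound` with an empty `valid_fields` list, the same shape as the error in the report. Passing `infer = true` resolves the column from the header row, which is what the local context appears to do.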

@r4ntix
Contributor

r4ntix commented Oct 3, 2022

I submitted a PR for this: #313
