Execute LogicalPlans after building for TPCH Benchmarks #3273

Closed
DaltonModlin opened this issue Aug 26, 2022 · 4 comments · Fixed by #3290
Labels
bug Something isn't working

Comments

@DaltonModlin
Contributor

DaltonModlin commented Aug 26, 2022

Describe the bug
Currently, running Q15 fails when attempting to build the second (SELECT) portion of the overall query.

The error is similar to the following:

called `Result::unwrap()` on an `Err` value: Plan("'datafusion.public.revenue0' not found")
thread 'tests::run_q15' panicked at 'called `Result::unwrap()` on an `Err` value: Plan("'datafusion.public.revenue0' not found")', /home/dalto/projects/arrow-datafusion/datafusion/sql/src/planner.rs:161:31

To Reproduce
Run test run_q15 in benchmarks/src/bin/tpch.rs

Expected behavior
I expect the test to fail due to either #3266 or #3267

Additional context
The table names just before failure are:

schema table names: [
    "customer",
    "supplier",
    "region",
    "part",
    "orders",
    "nation",
    "lineitem",
    "partsupp",
]

I'm unsure where views populate this list. If anyone else has knowledge here, that would be great.

DaltonModlin added the bug label Aug 26, 2022
@avantgardnerio
Contributor

@kmitchener @andygrove @alamb

@kmitchener
Contributor

This is happening because of the way tpch is running the test queries. q15 is made up of 3 SQL statements, and run_query() attempts to create a plan for each of them before executing any of them. The second statement in q15 references the revenue0 view created by the first statement. Because the first statement has not been executed yet, the view doesn't exist at planning time, so the planning of the second statement fails.

Probably run_query() should be modified to execute the statements in turn, rather than attempting to plan all of them before beginning execution.
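To illustrate the difference between the two orderings, here is a minimal toy model in plain Rust. It does not use DataFusion's API: `Catalog`, `Statement`, `Plan`, and both driver functions are hypothetical names made up for this sketch. It only assumes that planning resolves table names against the current catalog and that executing a CREATE VIEW is what registers the view.

```rust
use std::collections::HashSet;

/// Toy stand-ins for the three statements in q15 (hypothetical, not DataFusion types).
#[derive(Debug, Clone)]
enum Statement {
    CreateView { name: String },
    Select { from: String },
    DropView { name: String },
}

/// A "plan" here is just a statement whose table reference resolved.
#[derive(Debug)]
struct Plan(Statement);

#[derive(Debug, Default)]
struct Catalog {
    tables: HashSet<String>,
}

impl Catalog {
    /// Planning checks that any referenced table or view exists right now.
    fn plan(&self, stmt: &Statement) -> Result<Plan, String> {
        let referenced = match stmt {
            Statement::CreateView { .. } => None,
            Statement::Select { from } => Some(from),
            Statement::DropView { name } => Some(name),
        };
        match referenced {
            Some(t) if !self.tables.contains(t) => Err(format!("'{}' not found", t)),
            _ => Ok(Plan(stmt.clone())),
        }
    }

    /// Execution is the only step that changes the catalog.
    fn execute(&mut self, plan: Plan) {
        match plan.0 {
            Statement::CreateView { name } => {
                self.tables.insert(name);
            }
            Statement::Select { .. } => {}
            Statement::DropView { name } => {
                self.tables.remove(&name);
            }
        }
    }
}

/// Plans every statement first, then executes (the failing order described above).
fn plan_all_then_execute(catalog: &mut Catalog, stmts: &[Statement]) -> Result<(), String> {
    let plans = stmts
        .iter()
        .map(|s| catalog.plan(s))
        .collect::<Result<Vec<_>, _>>()?;
    for p in plans {
        catalog.execute(p);
    }
    Ok(())
}

/// Plans and executes each statement before moving on to the next one.
fn plan_and_execute_in_turn(catalog: &mut Catalog, stmts: &[Statement]) -> Result<(), String> {
    for s in stmts {
        let plan = catalog.plan(s)?;
        catalog.execute(plan);
    }
    Ok(())
}

/// The shape of q15: create the view, query it, drop it.
fn q15() -> Vec<Statement> {
    vec![
        Statement::CreateView { name: "revenue0".to_string() },
        Statement::Select { from: "revenue0".to_string() },
        Statement::DropView { name: "revenue0".to_string() },
    ]
}

/// A catalog holding only the eight base TPC-H tables.
fn base_catalog() -> Catalog {
    let names = [
        "customer", "supplier", "region", "part", "orders", "nation", "lineitem", "partsupp",
    ];
    Catalog { tables: names.iter().map(|n| n.to_string()).collect() }
}

fn main() {
    let mut c = base_catalog();
    println!("{:?}", plan_all_then_execute(&mut c, &q15()));
    let mut c = base_catalog();
    println!("{:?}", plan_and_execute_in_turn(&mut c, &q15()));
}
```

In this toy model the first driver stops with a "'revenue0' not found" error while planning the second statement, and the second driver completes and leaves the catalog with only the base tables. The real change to run_query() may differ in its details.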

@DaltonModlin
Contributor Author

DaltonModlin commented Aug 26, 2022

That's exactly what I was thinking looking at it last night, and I was just writing a test to confirm by splitting the query. @kmitchener good to know I'm moving in the right direction. Assuming I get a different error by splitting the query, I'll try and rewrite the testing process in tpch.rs and see if it fixes this problem.

@avantgardnerio
Contributor

> Probably run_query() should be modified to execute the statements in turn, rather than attempting to plan all of them before beginning execution.

This ☝️. AFAIK, the TPC-H queries are what they are, and we have to run them as such. This one is ensuring we support multiple queries in one command, so to be TPC-H compliant I think it has to "just work".

DaltonModlin changed the title from "View doesn't populate session context table_names" to "Execute LogicalPlans after building for TPCH Benchmarks" on Aug 29, 2022
DaltonModlin added a commit to spaceandtimelabs/arrow-datafusion that referenced this issue Aug 29, 2022
- tpch.rs::benchmark_datafusion now executes LogicalPlan immediately after building
- tpch.rs::run_query now executes LogicalPlan immediately after building

Resolves apache#3273

3 participants