Scheduler silently replaces ParquetExec with EmptyExec if data path is not correctly mounted in container #353
Comments
My local Docker images do not seem to have been rebuilt:
I deleted all images and rebuilt, and I still see the same issue. When running in docker-compose, the scheduler shows:
When running the scheduler outside of docker-compose, I see:
I added debug logging. The scheduler has this optimized logical plan:
Then the following code fails:
The physical plan has this:
I now suspect this is somehow related to the eliminate_filter optimization rule inserting an EmptyRelation.
I think I now realize what the root issue is: I was running the benchmark against a data set that was not mounted into the containers running under docker compose. I would expect this to cause the query to fail, but somehow the optimizer determines that no rows can match the filter and just removes the table scan!
This is where the EmptyExec comes from:
impl TableProvider for ListingTable {
    async fn scan(
        &self,
        ctx: &SessionState,
        projection: &Option<Vec<usize>>,
        filters: &[Expr],
        limit: Option<usize>,
    ) -> Result<Arc<dyn ExecutionPlan>> {
        let (partitioned_file_lists, statistics) =
            self.list_files_for_scan(ctx, filters, limit).await?;
        // if no files need to be read, return an `EmptyExec`
        if partitioned_file_lists.is_empty() {
            let schema = self.schema();
            let projected_schema = project_schema(&schema, projection.as_ref())?;
            return Ok(Arc::new(EmptyExec::new(false, projected_schema)));
        }
        // ...
Is it a bug? Currently we do not have a Catalog service. If the data path does not exist, I think it is valid to return an empty relation.
IMO, absolutely. I think few users would expect this behavior, and would spend quite a bit of time tracking it down. I can't think of another piece of software that treats a missing file the same way as an empty one.
I believe Spark / Presto / etc. commonly return an empty result when given a path/table without any files (on object storage). This makes sense for an empty table. The example, though, shows an actual file that has been listed, so in that case I agree we should return an error.
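To make the distinction discussed above concrete, here is a minimal sketch that uses only the Rust standard library on a local filesystem. The names `ScanOutcome` and `plan_scan` are hypothetical and are not part of DataFusion or Ballista; the sketch only illustrates one possible policy, where a missing path is an error and an existing but empty location yields an empty relation.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical outcome of planning a scan (not a DataFusion type).
#[derive(Debug, PartialEq)]
enum ScanOutcome {
    /// The table location exists but contains no files,
    /// so returning an empty relation is reasonable.
    Empty,
    /// Files were found and should be scanned.
    Files(Vec<PathBuf>),
}

/// Hypothetical helper: list the files under `table_path`,
/// failing loudly when the path itself does not exist.
fn plan_scan(table_path: &Path) -> io::Result<ScanOutcome> {
    // A missing path (for example an unmounted volume) is an error,
    // not an empty table.
    if !table_path.exists() {
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("table path does not exist: {}", table_path.display()),
        ));
    }

    // A path that points at a single file is scanned as-is.
    if table_path.is_file() {
        return Ok(ScanOutcome::Files(vec![table_path.to_path_buf()]));
    }

    // Otherwise collect the regular files directly inside the directory.
    let mut files = Vec::new();
    for entry in fs::read_dir(table_path)? {
        let path = entry?.path();
        if path.is_file() {
            files.push(path);
        }
    }
    files.sort();

    if files.is_empty() {
        Ok(ScanOutcome::Empty)
    } else {
        Ok(ScanOutcome::Files(files))
    }
}
```

With a policy like this, the unmounted data directory from this report would surface as a "path does not exist" error at planning time, while a genuinely empty table would still produce an empty result. Whether object stores can cheaply tell these two cases apart is a separate question that the sketch does not address.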
Describe the bug
When I run Ballista in docker-compose and then run the benchmarks, all benchmark queries run very fast and return result sets with zero rows and zero columns, and I see that the executed plans contain EmptyExec instead of ParquetExec.
To Reproduce
Then run the benchmarks using the instructions in the repo.
Expected behavior
Should not be replacing ParquetExec with EmptyExec.
Additional context
None