-
Notifications
You must be signed in to change notification settings - Fork 198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scheduler now verifies that file://
ListingTable URLs are accessible
#414
scheduler now verifies that file://
ListingTable URLs are accessible
#414
Conversation
@avantgardnerio PTAL |
ballista/scheduler/src/state/mod.rs
Outdated
@@ -256,11 +260,46 @@ impl<T: 'static + AsLogicalPlan, U: 'static + AsExecutionPlan> SchedulerState<T, | |||
plan: &LogicalPlan, | |||
) -> Result<()> { | |||
let start = Instant::now(); | |||
|
|||
// optimizing the plan here is redundant because the physical planner will do this again | |||
// but it is helpful to see what the optimized plan will be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. How about changing this to the following:
if log::max_level() >= log::Level::Debug { let optimized_plan = session_ctx.optimize(plan)?; debug!("Optimized plan: {}", optimized_plan.display_indent()); }
so that we can avoid unnecessary optimization when the log level in set to be info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. Maybe we just need to check the first path.
ballista/scheduler/src/state/mod.rs
Outdated
for url in table.table_paths() { | ||
// remove file:/// prefix and verify that the file is accessible | ||
let url = url.as_str(); | ||
let url = url.strip_prefix("file:///").unwrap_or(url); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it check files on S3 or HDFS? Sometimes, the table_paths may be of tens of thousands. Here, the check logic may be very time consuming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a change so that this check is only performed for files on the local file system (starting with file:///
) and it now only checks the first file
Thanks for the review @yahoNanJing. I have pushed changes to address your feedback. |
file://
ListingTable URLs are accessible
@@ -229,6 +229,7 @@ pub fn remove_unresolved_shuffles( | |||
.iter() | |||
.map(|c| c | |||
.iter() | |||
.filter(|l| !l.path.is_empty()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated change but this avoids printing out lots of newlines in the log
@@ -71,7 +71,7 @@ impl<T: 'static + AsLogicalPlan, U: 'static + AsExecutionPlan> SchedulerGrpc | |||
task_status, | |||
} = request.into_inner() | |||
{ | |||
debug!("Received poll_work request for {:?}", metadata); | |||
trace!("Received poll_work request for {:?}", metadata); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was too verbose to be debug
I am going to go ahead and merge this so that I can cut the RC today. |
Which issue does this PR close?
Closes #353
Rationale for this change
My query now fails rather than silently producing the wrong results (empty result set).
What changes are included in this PR?
Are there any user-facing changes?