-
Notifications
You must be signed in to change notification settings - Fork 198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SchedulerConfig for the scheduler configurations, like event_loop_buffer_size, finished_job_data_clean_up_interval_seconds, finished_job_state_clean_up_interval_seconds #472
Conversation
Hi @andygrove, @Dandandan, @avantgardnerio, could you help review this PR which refactors the scheduler configurations so that we can add more scheduler related configurations easily in the future? |
…_buffer_size, finished_job_data_clean_up_interval_seconds, finished_job_state_clean_up_interval_seconds
d35d3c2
to
0ebca57
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I reviewed the config changes, they seem to solve an immediate problem without making the existing pattern any worse
- I didn't review the functional change - cleaning up job stuff, I assume that works
- Let's discuss the runtime
HashMap
based config somewhere and if we all agree on a resolution, update this issue
ballista/core/src/config/query.rs
Outdated
|
||
impl BallistaConfig { | ||
/// Create a configuration builder | ||
pub fn builder() -> BallistaConfigBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more of a philosophical nit: I don't love the builder patterns that are propagating through the codebase. AFAICT, builders came from Java and were built on nullability and mutable state, due to the fact that it didn't have a using statement or a spread operator.
In Rust we have the equivalent of spread, so I think we could just do:
let my_config = {
param_a,
param_b,
.. ValidConfig
};
To accomplish the same thing in a much more concise manner.
ballista/core/src/config.rs
Outdated
default_value, | ||
} | ||
} | ||
} | ||
|
||
/// Ballista configuration builder | ||
pub struct BallistaConfigBuilder { | ||
settings: HashMap<String, String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, it looks like this was already a HashMap based thing. I guess this PR is refactoring that pattern?
ballista/core/src/config.rs
Outdated
|
||
/// All available configuration options | ||
pub fn valid_entries() -> HashMap<String, ConfigEntry> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like it would make more sense to just serde this once from untyped things like env vars into a typed struct and use that everywhere. A quick googling reveals: https://github.com/softprops/envy
@@ -74,7 +79,7 @@ async fn start_server( | |||
addr: SocketAddr, | |||
scheduling_policy: TaskSchedulingPolicy, | |||
slots_policy: SlotsPolicy, | |||
event_loop_buffer_size: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like the actual goal of the PR?
- To aid in PR review, I think it can really help if the author comments on a few of their own lines of code to highlight things like this
- @daltonmadolin I think you're currently working on the same thing, PTAL
- If merging this PR makes things no worse, and fixes the 8 arguments clippy issue, I'm fine with it.
Thanks @avantgardnerio for your comment. I'll learn about https://github.com/softprops/envy later. For the concern of using |
ballista/core/src/config/query.rs
Outdated
|
||
/// Ballista configuration, mainly for the query | ||
#[derive(Debug, Clone, PartialEq, Eq)] | ||
pub struct BallistaConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the intent of moving the configs into a query
namespace? Not all configs will be related to the execution of a single query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing configurations seem to be all related to the query execution, which may be used by both the scheduler and executor. The reason to add the query namespace is to distinguish it from the new added SchedulerConfig, since the name BallistaConfig
may be too general and may give users wrong impression that it includes the SchedulerConfig.
Hi @andygrove, do you have any more concerns? |
@yahoNanJing I don't see any updates in the user guide page that covers configuration. I think it would make the review easier if I can see the new docs explaining how to use these configs. |
…sult_route_endpoint and add it to the SchedulerConfig
Hi @avantgardnerio and @DaltonModlin, with the current configuration refactoring, just add the advertise_endpoint to the scheduler config. Could you help review the related changes? |
d951e05
to
26c8611
Compare
26c8611
to
8335ea9
Compare
Hi @andygrove, the scheduler config just be added to the user guide. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this turn the command line argument into --advertise-flight-result-route-endpoint
? That seems overly verbose for a command line argument name. --help
will show the full description. I think we could safely get away with --advertise-flight-endpoint
at most.
docs/source/user-guide/configs.md
Outdated
|
||
| key | type | default | description | | ||
|------------------------------------------------|-----------|--------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | ||
| scheduler_policy | Utf8 | pull-staged | Sets the scheduing policy for the scheduler, possible values: pull-staged, push-staged. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The table is using underscore (scheduler_policy
) but the example shell command is using hyphen (scheduler-policy
).
ballista/scheduler/src/main.rs
Outdated
let mut config_builder = SchedulerConfigBuilder::default() | ||
.set( | ||
BALLISTA_SCHEDULER_EVENT_LOOP_BUFFER_SIZE, | ||
&opt.event_loop_buffer_size.to_string(), | ||
) | ||
.set( | ||
BALLISTA_FINISHED_JOB_DATA_CLEANUP_DELAY_SECS, | ||
&opt.finished_job_data_clean_up_interval_seconds.to_string(), | ||
) | ||
.set( | ||
BALLISTA_FINISHED_JOB_STATE_CLEANUP_DELAY_SECS, | ||
&opt.finished_job_state_clean_up_interval_seconds.to_string(), | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the intention here of creating key-value pairs for these configs. It looks like these configs are just command-line arguments to the scheduler and there is no need to serialize them or send them over the network?
Shouldn't the scheduler config just be a simple struct like this?
struct SchedulerConfig {
event_loop_buffer_size: ...,
finished_job_data_clean_up_interval_seconds: ...,
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @andygrove for your comments. Agree with you. It's a bit over designed. I'll change it back and use a normal struct to indicate the scheduler config.
Hi @avantgardnerio, is it only for the result or both the result and the intermediate shuffle data? If it's only for the result, it's better to include 'result' in the configuration name. How about |
ce09a10
to
6b2cb02
Compare
0ac60b5
to
b4f371c
Compare
How about A tangent not for this PR: we probably just want separate |
Seems good to me. |
24f7f71
to
f4c4d64
Compare
f4c4d64
to
fa52343
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @yahoNanJing! That is looking much cleaner now and easier to review.
Which issue does this PR close?
Closes #469.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?