[Discuss] Don't panic if server fails to create scheduler #1443
Comments
@Connor1996 PTAL.
It seems that even if pd-ctl sends an invalid config, PD won't persist it.
But the disabled schedulers should probably be checked too. /cc @nolouch
In fact, I have a running cluster whose schedule config is a total mess. What I'm trying to point out is that we need a way to recover from or discard an invalid config, instead of panicking on the assumption that the underlying etcd data is always valid.
The panic is there to notify users that the config may contain a typo. If we just discard the invalid config, PD's behavior may not match what users expect.
@lerencao We really need a way to recover. @Connor1996 How about adding an option that puts PD into a recovery mode and ignores panics in situations like this one?
Spelling errors in the config file could be checked ahead of time. And in practice, I believe most people just use the default schedulers and change schedulers dynamically with pd-ctl.
@lerencao Checking ahead of time seems reasonable and neat.
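A minimal sketch of the check-ahead-of-time idea, in Go: validate a scheduler type before anything is persisted, so a typo is rejected at submission time instead of surfacing as a panic on the next start. Everything here is an assumption for illustration and not PD's actual code: `knownSchedulerTypes`, `validateSchedulerType`, and `memStore` (an in-memory stand-in for the etcd-backed storage) are hypothetical names, and the two type names are only examples.

```go
package main

import (
	"fmt"
	"sort"
)

// knownSchedulerTypes is a hypothetical registry. A real server would build
// it from whatever scheduler implementations it actually registers.
var knownSchedulerTypes = map[string]bool{
	"balance-leader": true,
	"balance-region": true,
}

// validateSchedulerType returns an error for any type that is not registered.
func validateSchedulerType(name string) error {
	if knownSchedulerTypes[name] {
		return nil
	}
	known := make([]string, 0, len(knownSchedulerTypes))
	for k := range knownSchedulerTypes {
		known = append(known, k)
	}
	sort.Strings(known) // stable order for the error message
	return fmt.Errorf("unknown scheduler type %q (known types: %v)", name, known)
}

// memStore is a hypothetical in-memory stand-in for persistent storage.
type memStore struct {
	schedulers []string
}

// addScheduler validates first and persists only when validation succeeds.
func (s *memStore) addScheduler(name string) error {
	if err := validateSchedulerType(name); err != nil {
		return fmt.Errorf("config not persisted: %w", err)
	}
	s.schedulers = append(s.schedulers, name)
	return nil
}

func main() {
	store := &memStore{}
	for _, name := range []string{"balance-leader", "balance-leadr"} {
		if err := store.addScheduler(name); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("persisted:", name)
	}
}
```

With this shape, the misspelled entry never reaches storage, so there is nothing invalid to trip over at startup. It does not help with data that is already bad, which is the case the original report is about.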
Do you want to give it a try, @lerencao?
Please answer these questions before submitting your issue. Thanks!
If possible, provide a recipe for reproducing the error.
As of this PR, if the server cannot create a scheduler, it just panics. However, a wrong configuration is not the only reason scheduler creation can fail.
In practice, a scheduler itself may raise an error, or pd-ctl may send an invalid config because of a bug. In these cases the online server cannot return to normal operation, because there is no way to delete the wrong config from the etcd KV store before PD crashes.
Also, should we check the disabled schedulers as well, and delete any that are configured incorrectly?
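To make the proposal concrete, here is a rough Go sketch of a loader that logs and skips a scheduler it cannot create instead of panicking, and that also validates disabled entries. It is an illustration under assumptions, not PD's implementation: `SchedulerConfig`, `Scheduler`, `createScheduler`, and `loadSchedulers` are hypothetical names, and the scheduler types are only examples.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// SchedulerConfig is a hypothetical stand-in for one persisted scheduler entry.
type SchedulerConfig struct {
	Type    string
	Disable bool
}

// Scheduler is a hypothetical minimal scheduler interface.
type Scheduler interface {
	Name() string
}

type namedScheduler struct{ name string }

func (s namedScheduler) Name() string { return s.name }

// createScheduler is a hypothetical factory that knows two example types.
func createScheduler(cfg SchedulerConfig) (Scheduler, error) {
	switch cfg.Type {
	case "balance-leader", "balance-region":
		return namedScheduler{name: cfg.Type + "-scheduler"}, nil
	case "":
		return nil, errors.New("empty scheduler type")
	default:
		return nil, fmt.Errorf("unknown scheduler type %q", cfg.Type)
	}
}

// loadSchedulers tries to create every configured scheduler, including the
// disabled ones, so that invalid disabled entries are detected as well.
// It returns the enabled schedulers that were created and the configs that
// failed, which the caller can report or remove from storage.
func loadSchedulers(cfgs []SchedulerConfig) (good []Scheduler, bad []SchedulerConfig) {
	for _, cfg := range cfgs {
		s, err := createScheduler(cfg)
		if err != nil {
			log.Printf("skipping scheduler config %+v: %v", cfg, err)
			bad = append(bad, cfg)
			continue
		}
		if cfg.Disable {
			continue // valid but disabled: keep the config, do not run it
		}
		good = append(good, s)
	}
	return good, bad
}

func main() {
	cfgs := []SchedulerConfig{
		{Type: "balance-leader"},
		{Type: "balance-regoin"}, // typo in an enabled entry
		{Type: "bogus", Disable: true},
	}
	good, bad := loadSchedulers(cfgs)
	fmt.Printf("started %d scheduler(s), %d invalid config(s)\n", len(good), len(bad))
}
```

Returning the failed configs rather than silently dropping them leaves the decision to the caller: it could surface them to the operator, delete them from storage, or still refuse to start outside a recovery mode.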
What did you expect to see?
What did you see instead?
What version of PD are you using (pd-server -V)?