mlos_bench: error handling improvements #523

Open
bpkroth opened this issue Oct 3, 2023 · 2 comments

bpkroth commented Oct 3, 2023

Sometimes user scripts don't return a score value, even though they exit 0 (indicating SUCCESS).

In that case we can do a couple of things:

  • abort immediately in order to notify the experimenter and let them figure out what to do
  • assume it's a bad config and that's why the benchmark aborted early
    • in which case we should fabricate a "fake" score that looks "bad" (i.e., much worse than any we've actually recorded with a good config) so that the optimizer learns that this is an infeasible region (there are already TODO markers in the code to implement this)
  • some combination of the two
    • for instance, tolerate no more than N "bad" configs in a row before we assume it's a script error and abort entirely to notify the user that they should manually inspect and deal with things
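
A minimal sketch of the combined option, to make the idea concrete. Everything in it is hypothetical and not part of the `mlos_bench` API: the `MissingScorePolicy` class, the `TooManyBadTrials` exception, and the `max_consecutive_bad`, `penalty_factor`, and `default_penalty` parameters are made up for illustration. It also assumes the optimizer is minimizing, so "bad" means a larger score.

```python
from typing import List, Optional


class TooManyBadTrials(RuntimeError):
    """Raised when too many trials in a row produced no score."""


class MissingScorePolicy:
    """Hypothetical helper (not an mlos_bench class); assumes minimization."""

    def __init__(
        self,
        max_consecutive_bad: int = 3,
        penalty_factor: float = 2.0,
        default_penalty: float = 1e9,
    ) -> None:
        self.max_consecutive_bad = max_consecutive_bad
        self.penalty_factor = penalty_factor
        self.default_penalty = default_penalty
        self._observed: List[float] = []
        self._bad_streak = 0

    def resolve(self, score: Optional[float]) -> float:
        """Return the score to report to the optimizer for one trial."""
        if score is not None:
            # A real score: record it and reset the streak of bad trials.
            self._observed.append(score)
            self._bad_streak = 0
            return score

        # The script exited 0 but produced no score.
        self._bad_streak += 1
        if self._bad_streak > self.max_consecutive_bad:
            # Likely a script error rather than a bad config: stop and notify.
            raise TooManyBadTrials(
                f"{self._bad_streak} consecutive trials returned no score"
            )
        if not self._observed:
            # Nothing to compare against yet, so fall back to a fixed penalty.
            return self.default_penalty
        # Fabricate a score clearly worse than the worst one recorded so far.
        worst = max(self._observed)
        return worst + self.penalty_factor * max(abs(worst), 1.0)
```

With `max_consecutive_bad=2`, two missing scores in a row are replaced by a fabricated penalty and the third raises `TooManyBadTrials`. Whether the penalty should scale with the worst observed score or be a fixed constant is a design choice this sketch does not settle.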
@bpkroth bpkroth assigned bpkroth and unassigned bpkroth Oct 3, 2023

bpkroth commented Oct 3, 2023

@eujing


bpkroth commented Nov 30, 2023

One thing that might make this easier to implement is if we clearly separated the phases of "setup" (e.g., basic system preparation) vs. "configure" (e.g., configure the target system with the tunables).

That way, if "setup" failed, we could alert that the script was the problem, whereas if "configure" failed, we could inform the optimizer that it was a bad region.
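
A rough sketch of how that separation could drive the decision, assuming each phase is exposed as a plain callable that raises on failure. The `Outcome` enum and `run_phases` function are hypothetical names for illustration, not existing `mlos_bench` code.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    """Hypothetical classification of a trial's preparation result."""

    OK = "ok"
    SCRIPT_ERROR = "script_error"  # "setup" failed: alert the experimenter
    BAD_CONFIG = "bad_config"      # "configure" failed: inform the optimizer


def run_phases(
    setup: Callable[[], None],
    configure: Callable[[], None],
) -> Outcome:
    """Run both phases in order and report which one failed, if any."""
    try:
        setup()  # basic system preparation, independent of the tunables
    except Exception:
        return Outcome.SCRIPT_ERROR
    try:
        configure()  # apply the tunables to the target system
    except Exception:
        return Outcome.BAD_CONFIG
    return Outcome.OK
```

A caller could then abort the experiment on `SCRIPT_ERROR` and register a penalized score on `BAD_CONFIG`. In practice a failure in "configure" might still be a script bug, so this split is a heuristic rather than a guarantee.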

See Also: #498

@bpkroth bpkroth mentioned this issue May 10, 2024
@bpkroth bpkroth mentioned this issue Jul 24, 2024
bpkroth added a commit that referenced this issue Aug 20, 2024
The `mlos_bench` CLI wrapper exits non-zero currently even on success.

This PR adds some basic sanity checks and makes sure we exit 0 when the
process looks roughly OK.

Further work to be expanded in #523.

---------

Co-authored-by: Sergiy Matusevych <[email protected]>
Co-authored-by: Sergiy Matusevych <[email protected]>