-
-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache failed builds #255
Cache failed builds #255
Conversation
0cd230b
to
61b8bf3
Compare
Update, it works 🎉 it needs a lot of cleanup, typing and general refactoring. And I probably broke combined builds, but hey, it works. |
Todays progress is good, it type checks now, omits errored evals and skips the already cached ones. Further refactoring will allow to show failed build steps for failed evals. And then a failed build cache will achieve failed build caching. |
@MagicRB Do we still need the combined build if the success status for the |
Thats a separate PR after this one imo, but yea that could be done |
I found a bug in the original lix code, seems that job failures weren't properly propagated to dependents,
need to be initialized to the job that failed or the function will never "recurse" (it's an iterative graph walk which is expressed way more naturally with recursion, but python is pythonic). It appears that I missed this commit when doing the initial cherry pick |
Nice. |
6201c0f
to
a76ec3c
Compare
|
||
|
||
DB_NOT_INIT_MSG = "Database not initialized" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can avoid the error above above by adding a database class:
class Database:
def __init__(db_path: Path): -> None:
self.handle = dbm.open(str(db_path), "c")
def add_build(self, derivation: str, time: datetime) -> None:
self.handle[derivation] = FailedBuild(derivation=derivation, time=time).model_dump_json()
# ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks overall like it would be less code because we don't need to check if there is a database initialized. We just need to pass down the database into the the builds steps where we need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to avoid threading it through for now as we plan to put the data into postgres anyway. And then we'd be accessing it through buildbots machinery hopefully. Treating this as a temporary solution so tried to make it as self contained as possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see what you mean by threading?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant adding it everywhere only to then remove it later when once we integrate with buildbots database we'll access it through self.master.db
so we'd be needlessly adding arguments and then removing them. That was my initial thinking, but tbh we've spent more time talking about it than I would have spent doing it 😂
a76ec3c
to
d2896a6
Compare
I think this is ready for review, it works I've tested it thoroughly and it seems to behave well. |
@puckipedia the code is different from how you had it, very different, but still in the spirit of cooperation if you want the changes feel free to pull them in. My changes don't actually change any behavior IIRC they just make the code imho better. If you do want to pull it in, even partially, and something is unclear feel free to contact me on Matrix and I'll help out. |
Couple of build retries remnants: diff --git a/buildbot_nix/models.py b/buildbot_nix/models.py
index eea11d8..9de66ad 100644
--- a/buildbot_nix/models.py
+++ b/buildbot_nix/models.py
@@ -174,7 +174,6 @@ class PostBuildStep(BaseModel):
class BuildbotNixConfig(BaseModel):
db_url: str
auth_backend: AuthBackendConfig
- build_retries: int
cachix: CachixConfig | None
gitea: GiteaConfig | None
github: GitHubConfig | None
diff --git a/nix/master.nix b/nix/master.nix
index 3bbd0d6..f72bdf7 100644
--- a/nix/master.nix
+++ b/nix/master.nix
@@ -110,12 +110,6 @@ in
'';
};
- buildRetries = lib.mkOption {
- type = lib.types.int;
- default = 1;
- description = "Number of times a build is retried";
- };
-
postBuildSteps = lib.mkOption {
default = [ ];
description = ''
@@ -606,7 +600,6 @@ in
(pkgs.formats.json { }).generate "buildbot-nix-config.json" {
db_url = cfg.dbUrl;
auth_backend = cfg.authBackend;
- build_retries = cfg.buildRetries;
cachix =
if !cfg.cachix.enable then
null |
@zowoq good catch, ill rip those out too. Might be a good idea to enable strict deserialization for pydantic so it wouldnt ignore unknown fields. |
@mergify rebase |
Signed-off-by: magic_rb <[email protected]>
Signed-off-by: magic_rb <[email protected]>
Signed-off-by: magic_rb <[email protected]>
Signed-off-by: magic_rb <[email protected]>
Signed-off-by: magic_rb <[email protected]>
Signed-off-by: magic_rb <[email protected]>
Signed-off-by: magic_rb <[email protected]>
Signed-off-by: magic_rb <[email protected]>
Signed-off-by: magic_rb <[email protected]>
Signed-off-by: magic_rb <[email protected]>
Signed-off-by: magic_rb <[email protected]>
Signed-off-by: magic_rb <[email protected]>
…"database" Signed-off-by: magic_rb <[email protected]>
Signed-off-by: magic_rb <[email protected]>
Signed-off-by: magic_rb <[email protected]>
Signed-off-by: magic_rb <[email protected]>
…l-jobs` Signed-off-by: magic_rb <[email protected]>
5ab8704
to
54961b3
Compare
@Mic92 git is failing me here, what changes are you making? :) |
Ah good catch, probably just the props being wrong, ill fix today |
I think this also have been wrong on gitea as it is also show github. |
if repo_name is not None:
name = f"github:{repo_name}#checks.{name}"
else:
name = f"checks.{name}" ah we're missing this bit of code, which was wrong anyway. I'll devise a fix |
@Mic92 can we open issues for follow ups? Cause it is almost impossible to track with all the flake update PRs. They create way too much noise. |
This PR begins pulling in the changes Lix made in regards to the build scheduler. The hope is to make this work, refactor and then do the failed build caching (if this code doesn't do it already, I honestly cannot tell).
The code from Lix will undergo a lot of changes as it is very untyped and in my humble opinion somewhat hard to read and comprehend. The Lix folk are of course free to pull any changes we make back into the Lix fork. Though long term my hope would be that we can once again merge the two forks...