Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache failed builds #255

Merged
merged 22 commits into from
Sep 12, 2024
Merged

Cache failed builds #255

merged 22 commits into from
Sep 12, 2024

Conversation

MagicRB
Copy link
Contributor

@MagicRB MagicRB commented Aug 12, 2024

This PR begins pulling in the changes Lix made in regards to the build scheduler. The hope is to make this work, refactor and then do the failed build caching (if this code doesn't do it already, I honestly cannot tell).

The code from Lix will undergo a lot of changes as it is very untyped and in my humble opinion somewhat hard to read and comprehend. The Lix folk are of course free to pull any changes we make back into the Lix fork. Though long term my hope would be that we can once again merge the two forks...

@MagicRB MagicRB force-pushed the cache_failed_builds branch from 0cd230b to 61b8bf3 Compare August 12, 2024 19:01
@MagicRB
Copy link
Contributor Author

MagicRB commented Aug 12, 2024

Update, it works 🎉 it needs a lot of cleanup, typing and general refactoring. And I probably broke combined builds, but hey, it works.

@MagicRB
Copy link
Contributor Author

MagicRB commented Aug 13, 2024

Todays progress is good, it type checks now, omits errored evals and skips the already cached ones. Further refactoring will allow to show failed build steps for failed evals. And then a failed build cache will achieve failed build caching.

buildbot_nix/__init__.py Outdated Show resolved Hide resolved
buildbot_nix/__init__.py Outdated Show resolved Hide resolved
@Mic92
Copy link
Member

Mic92 commented Aug 14, 2024

@MagicRB Do we still need the combined build if the success status for the buildbot/nix-eval could be used as well?

@MagicRB
Copy link
Contributor Author

MagicRB commented Aug 14, 2024

Thats a separate PR after this one imo, but yea that could be done

@MagicRB
Copy link
Contributor Author

MagicRB commented Aug 15, 2024

I found a bug in the original lix code, seems that job failures weren't properly propagated to dependents,

    failed_checks: list[NixEvalJob] = []
    failed_paths: list[str] = []

need to be initialized to the job that failed or the function will never "recurse" (it's an iterative graph walk which is expressed way more naturally with recursion, but python is pythonic). It appears that I missed this commit when doing the initial cherry pick

buildbot_nix/__init__.py Outdated Show resolved Hide resolved
buildbot_nix/__init__.py Outdated Show resolved Hide resolved
buildbot_nix/__init__.py Outdated Show resolved Hide resolved
buildbot_nix/__init__.py Outdated Show resolved Hide resolved
buildbot_nix/__init__.py Outdated Show resolved Hide resolved
buildbot_nix/__init__.py Outdated Show resolved Hide resolved
buildbot_nix/__init__.py Outdated Show resolved Hide resolved
buildbot_nix/__init__.py Outdated Show resolved Hide resolved
@MagicRB
Copy link
Contributor Author

MagicRB commented Aug 16, 2024

image

Now it properly reports failed dependents and failed evaluations.

and the log has been moved into buildbot's UI

image

@Mic92
Copy link
Member

Mic92 commented Aug 16, 2024

Nice.

buildbot_nix/__init__.py Outdated Show resolved Hide resolved
buildbot_nix/__init__.py Outdated Show resolved Hide resolved
@MagicRB MagicRB force-pushed the cache_failed_builds branch 2 times, most recently from 6201c0f to a76ec3c Compare August 19, 2024 19:30


DB_NOT_INIT_MSG = "Database not initialized"

Copy link
Member

@Mic92 Mic92 Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can avoid the error above above by adding a database class:

class Database:
  def __init__(db_path: Path): -> None:
      self.handle = dbm.open(str(db_path), "c")

  def add_build(self, derivation: str, time: datetime) -> None:
      self.handle[derivation] = FailedBuild(derivation=derivation, time=time).model_dump_json()
  # ...

Copy link
Member

@Mic92 Mic92 Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks overall like it would be less code because we don't need to check if there is a database initialized. We just need to pass down the database into the the builds steps where we need it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to avoid threading it through for now as we plan to put the data into postgres anyway. And then we'd be accessing it through buildbots machinery hopefully. Treating this as a temporary solution so tried to make it as self contained as possible

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see what you mean by threading?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant adding it everywhere only to then remove it later when once we integrate with buildbots database we'll access it through self.master.db so we'd be needlessly adding arguments and then removing them. That was my initial thinking, but tbh we've spent more time talking about it than I would have spent doing it 😂

@MagicRB MagicRB force-pushed the cache_failed_builds branch from a76ec3c to d2896a6 Compare August 19, 2024 20:19
@MagicRB MagicRB marked this pull request as ready for review August 20, 2024 20:03
@MagicRB
Copy link
Contributor Author

MagicRB commented Aug 20, 2024

I think this is ready for review, it works I've tested it thoroughly and it seems to behave well.

@MagicRB
Copy link
Contributor Author

MagicRB commented Aug 20, 2024

@puckipedia the code is different from how you had it, very different, but still in the spirit of cooperation if you want the changes feel free to pull them in. My changes don't actually change any behavior IIRC they just make the code imho better.

If you do want to pull it in, even partially, and something is unclear feel free to contact me on Matrix and I'll help out.

@zowoq
Copy link
Contributor

zowoq commented Aug 22, 2024

Couple of build retries remnants:

diff --git a/buildbot_nix/models.py b/buildbot_nix/models.py
index eea11d8..9de66ad 100644
--- a/buildbot_nix/models.py
+++ b/buildbot_nix/models.py
@@ -174,7 +174,6 @@ class PostBuildStep(BaseModel):
 class BuildbotNixConfig(BaseModel):
     db_url: str
     auth_backend: AuthBackendConfig
-    build_retries: int
     cachix: CachixConfig | None
     gitea: GiteaConfig | None
     github: GitHubConfig | None
diff --git a/nix/master.nix b/nix/master.nix
index 3bbd0d6..f72bdf7 100644
--- a/nix/master.nix
+++ b/nix/master.nix
@@ -110,12 +110,6 @@ in
         '';
       };
 
-      buildRetries = lib.mkOption {
-        type = lib.types.int;
-        default = 1;
-        description = "Number of times a build is retried";
-      };
-
       postBuildSteps = lib.mkOption {
         default = [ ];
         description = ''
@@ -606,7 +600,6 @@ in
                 (pkgs.formats.json { }).generate "buildbot-nix-config.json" {
                   db_url = cfg.dbUrl;
                   auth_backend = cfg.authBackend;
-                  build_retries = cfg.buildRetries;
                   cachix =
                     if !cfg.cachix.enable then
                       null

@MagicRB
Copy link
Contributor Author

MagicRB commented Aug 22, 2024

@zowoq good catch, ill rip those out too. Might be a good idea to enable strict deserialization for pydantic so it wouldnt ignore unknown fields.

@Mic92
Copy link
Member

Mic92 commented Aug 26, 2024

@mergify rebase

MagicRB and others added 18 commits September 12, 2024 16:04
Signed-off-by: magic_rb <[email protected]>
Signed-off-by: magic_rb <[email protected]>
Signed-off-by: magic_rb <[email protected]>
@Mic92 Mic92 force-pushed the cache_failed_builds branch from 5ab8704 to 54961b3 Compare September 12, 2024 14:04
@MagicRB
Copy link
Contributor Author

MagicRB commented Sep 12, 2024

@Mic92 git is failing me here, what changes are you making? :)

@Mic92
Copy link
Member

Mic92 commented Sep 12, 2024

@Mic92 git is failing me here, what changes are you making? :)

It's a rebase + 54961b3

@Mic92 Mic92 merged commit 4233701 into nix-community:main Sep 12, 2024
3 checks passed
@MagicRB MagicRB mentioned this pull request Sep 14, 2024
@zowoq
Copy link
Contributor

zowoq commented Sep 20, 2024

current

Just noticed that buildbot UI is missing the project details that was there previously:

original

@MagicRB
Copy link
Contributor Author

MagicRB commented Sep 20, 2024

Ah good catch, probably just the props being wrong, ill fix today

@Mic92
Copy link
Member

Mic92 commented Sep 20, 2024

I think this also have been wrong on gitea as it is also show github.

@MagicRB
Copy link
Contributor Author

MagicRB commented Sep 20, 2024

   if repo_name is not None:
        name = f"github:{repo_name}#checks.{name}"
    else:
        name = f"checks.{name}"

ah we're missing this bit of code, which was wrong anyway. I'll devise a fix

@MagicRB
Copy link
Contributor Author

MagicRB commented Sep 21, 2024

@Mic92 can we open issues for follow ups? Cause it is almost impossible to track with all the flake update PRs. They create way too much noise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants