Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitRepositoryReconciler no-op clone improvements #724

Merged
merged 7 commits into from
May 24, 2022
Merged

Conversation

darkowlzz
Copy link
Contributor

@darkowlzz darkowlzz commented May 19, 2022

The no-op git clone implementation resulted in some bugs in the GitRepository reconciler due to the way it's currently integrated in the reconciliation loop. Skipping the reconciliation in the middle based on no update in the remote repository ignores the other factors that contribute to the source artifact like recursive submodules, ignore rules and included repositories, ignore rules, included repositories, etc. This causes inconsistent behavior and sometimes persistent reconciliation failures.

This change restructures the GitRepository reconciler to be able to correctly determine if the reconciliation can be skipped and updates all the other components affected by this change in behavior like the reconciliation result builder and processor, recovery notifications, etc.

Errors

It introduces a new error configurations for the reconciliation error abstractions to add action information in those errors. There are both contextual and actionable errors in reconciliation errors. Certain scenarios require an error to be both contextual and actionable. The error config helps achieve that by embedding action related configurations in the errors. Using this, a user of an error can configure if the error should result in logging or event notification or just ignored. For ease of use, the reconciliation errors now have constructors with default configurations for their expected behavior in most of the cases. Some of the error configurations may not always be effectual for contextual errors due to the contextual meaning. For example, the Stalling contextual error with ignore option should not result in the error being ignored. For ignorable no-op error result, a non-contextual error is more appropriate.

It also introduces a Generic error type which can be used to replace the existing non-contextual errors like the Event error. Generic error can be configured to replace Event error and also add more custom behaviors based on the requirement of the caller.

Error action handler

Since the error configurations now allow define action based configurations in the errors, a new reconciliation result processor ErrorActionHancler is introduced to handle the error configurations accordingly. It implements the result processor and can replace the existing error processors.

Runtime results

The runtime result builder has been updated to identify Generic errors which may have an ignore option for a no-op result.
Generic ignorable errors do result in status observed generation to be updated.

Git packages

The Git implementations have been updated to return a partial commit when it's a no-op clone. The partial commit contains only the hash and reference of a commit, since we don't have full commit information without cloning again. A concrete commit contains all the information related to a commit.

GitRepository Reconciler

The GitRepositoryReconciler's reconcileSource is updated to fetch all the information required to determine if the reconciliation can be skipped. This includes fetching the metadata of all the included repositories and comparing them with the previously seen included repositories.

A new status field contentConfigChecksum is introduced to calculate a checksum of all the configuration values that contribute to a change in the source artifact.

When no-op git clone is enabled, the remote repo is checked for an update and a partial commit is returned if there's no update. If the contentConfigChecksum hasn't changed, the reconciliation is skipped. But if the contentConfigChecksum has changed, another git checkout is performed for a concrete commit to perform a full reconciliation to build a new artifact.

All the Event error in the reconciler are replaced with Generic error. Other reconcilers can adopt Generic error gradually, both the errors work for now.

All the GitRepositoryReconciler tests have been updated to use the feature gates.

NOTE: This change updates the GitRepository CRD with a new field which should be updated before any testing.

@darkowlzz darkowlzz added area/git Git related issues and pull requests enhancement New feature or request labels May 19, 2022
@pjbgf pjbgf added this to the GA milestone May 19, 2022
@darkowlzz darkowlzz force-pushed the gitrepo-rec-fixes-2 branch from 4ee6088 to e0a121c Compare May 19, 2022 09:03
internal/error/error.go Outdated Show resolved Hide resolved
internal/error/error.go Outdated Show resolved Hide resolved
internal/error/error.go Outdated Show resolved Hide resolved
Copy link
Member

@pjbgf pjbgf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@darkowlzz great stuff again! Just found a few small nits here and there.

@darkowlzz darkowlzz force-pushed the gitrepo-rec-fixes-2 branch from e0a121c to 1b929a5 Compare May 19, 2022 14:23
controllers/gitrepository_controller.go Outdated Show resolved Hide resolved
controllers/gitrepository_controller.go Outdated Show resolved Hide resolved
internal/error/error.go Outdated Show resolved Hide resolved
internal/error/error.go Show resolved Hide resolved
@souleb
Copy link
Member

souleb commented May 20, 2022

tested this in cluster while playing with .spec.ignore. I can confirm it is taken into account during reconciliation.

darkowlzz added 7 commits May 20, 2022 19:52
Generic error is an attempt to avoid creating new error type for every
new unique scenario. It can be used to configure and build custom error
handling behavior, logging and event recording at present.
Contextual errors, Stalling and Waiting error, have special meaning for
the reconciliation results. But the Event error type can be replaced
with Generic error with some specific configurations. The Event error
is kept for a gradual migation to Generic error. Similarly, the Generic
error can be used to easily create new error handling behaviors.

The error Config can be used to configure any of the errors, including
contextual errors, without altering their contextual meaning, to modify
how they are handled.

The error constructors configure the errors with common default
configurations. These configurations can be modified to alter the
behavior.

Signed-off-by: Sunny <[email protected]>
ErrorActionHandler processes the reconciliation error results based on
their configurations. It performs actions like logging and event
recording based on the error configuration. More actions can be
accommodated in the future with more error configurations.

It can be a replacement for RecordContextualError() which does the same
operations but can't be configured much.

Signed-off-by: Sunny <[email protected]>
Add Generic error in RuntimeResultBuilder and ComputeReconcileResult
implementation with consideration to the error configurations.

Safeguards are added in the runtime result builder to ensure default
requeue after interval is set when is's set to zero or unset.

Signed-off-by: Sunny <[email protected]>
For gradual migration to Generic error, update only the GitRepo
reconciler to use Generic error.

Replace the Waiting error for git no change scenario with a Generic
error with proper no-op, early return, error configurations. This
ensures that the no-op only results in log and K8s native events at
normal level.

Fixes a reconciliation issue when recovering from a failure state (with
previous success state and artifact in the storage) and optimized git
clone feature is on, which results in failure to persist as the git
optimization prevented full reconciliation due to already existing
artifact and removal of failure negative conditions on the object
status. In order to allow failure recovery, the git clone optimizations
are now only applied when the object is already in a ready state.

Signed-off-by: Sunny <[email protected]>
Introduce a new field in the GitRepositoryReconciler to set the enabled
features. This makes it test friendly compared to using global flags for
setting and checking flags in the tests.

Enable default feature gates in all the GitRepo reconciler tests.

Add test cases for reconcileSource() to test the behavior of optimized
git clone when the Repo is ready and not ready. This ensures that the
full reconciliation is not skipped when GitRepo is not ready.

Signed-off-by: Sunny <[email protected]>
Introduce concrete and partial commits. Concrete commits have all the
information from remote including the hash and commit content. Partial
commits are based on locally available copy of a repo, they may only
contain the commit hash and reference.

IsConcreteCommit() can be used to find out if a given commit is based on
local information or full remote repo information.

Update go-git and libgit2 branch/tag clone optimization to return a
partial commit and no error.

Update and simplify the go-git and libgit2 tests for the same.

Signed-off-by: Sunny <[email protected]>
Introduce contentConfigChecksum in the GitRepository.Status to track the
configurations that affect the content of the artifact. It is used to
detect a change in the configuration that requires rebuilding the whole
artifact. This helps skip the reconciliation early when we find out that
the remote repository has not changed.

Moves fetching the included repositories in reconcileSource() to collect
enough information in reconcileSource() to be able to decide if the full
reconciliation can be skipped. This results in reconcileInclude() to
just copy artifact to the source build directory.

Introduce a gitCheckout() method to perform construction of all the git
checkout options and perform the checkout operation. This helps to
easily perform checkout multiple times when we need it in
reconcileSource(). When we check with the remote repository if there's
an update, and find out that there's no update, we check if any other
configurations that affect the source content has changed, like
includes, ignore rules, etc. If there's a change, we need to perform a
full checkout of the remote repository in order to fetch the complete
source. The git checkout no-op optimization is enabled in this method
based on the presence of an artifact in the storage.

The failure notification handler is modifed to handle the recovery of a
no-op reconcile failure and create a notification message accordingly
with the partial commit.

Signed-off-by: Sunny <[email protected]>
@darkowlzz darkowlzz force-pushed the gitrepo-rec-fixes-2 branch from 1b929a5 to 581695b Compare May 20, 2022 14:22
Copy link
Member

@pjbgf pjbgf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@darkowlzz great work! thanks for making no-op clones feasible! 🙇 🙇‍♂️ 🙇

LGTM

Copy link
Member

@hiddeco hiddeco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks solid to me, thanks @darkowlzz 🙇

@darkowlzz darkowlzz mentioned this pull request Aug 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/git Git related issues and pull requests enhancement New feature or request
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants