Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Georgettica/bump external snapshotter version (fixes #2966) #3202

Conversation

georgettica
Copy link
Contributor

@georgettica georgettica commented Dec 23, 2020

fixes #2966

@georgettica georgettica force-pushed the georgettica/bump-external-snapshotter-version branch from 08a122d to 1eeb100 Compare December 23, 2020 09:27
@georgettica
Copy link
Contributor Author

yay! all checks passed :) now I need to find a way to test this did what is was supposed to..

suggestions?

go.mod Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
Copy link
Contributor

@zubron zubron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking at this! I think the replace directives will need to be updated. It looks like those directives were introduced to work around the fact that the CSI snapshotter library imported the Kubernetes libraries directly (see #2651). It may be the case that we can remove them with this change.

Either way, I think there will be more changes needed to this PR as v0.19.0 version isn't being used yet and once it is, it will require other libraries to be updated and changes in the Velero codebase.

go.mod Outdated Show resolved Hide resolved
@georgettica
Copy link
Contributor Author

After having a slack talk I have some bad news..

One of the dependant libraries we need I. Order to make this change are planning to upgrade in Feb

If someone has better insight on it I would gladly push our upgrade to a nearer target
https://kubernetes.slack.com/archives/C8TSNPY4T/p1609798576313300?thread_ts=1609798576.313300&cid=C8TSNPY4T

@zubron
Copy link
Contributor

zubron commented Jan 5, 2021

@georgettica Thanks so much for reaching out to the Cluster API folks and checking the status of their releases! 😄

Summarising the thread here: It looks like the release with this change will be their next minor release (v0.4.0) and that likely won't be available until end of Q1. Given that we just import it to use the patch helper they are considering making it available in controller-runtime but that will require refactoring to make it less specific to cluster API.

@vmware-tanzu/velero-maintainers: It seems our main options are to use an unreleased version of the library or hold off until the official release is available. We already use pre-release versions of other libraries so I think it should be fine to do the same here?

@jenting
Copy link
Contributor

jenting commented Jan 6, 2021

I'm 👍 to use the pre-release version.

@georgettica
Copy link
Contributor Author

I'll look into changing the PR in accordance with your changes.

I am glad to reach to the community when I can XD

@github-actions github-actions bot requested a review from zubron January 7, 2021 08:50
@georgettica
Copy link
Contributor Author

so with the help of ya'll I moved it forward a bit more..

there is a build issue, caused by this change in controller-runtime
kubernetes-sigs/controller-runtime@8d45507

I am not sure what to change the function to, so I'll let the sages here look at it

@jenting
Copy link
Contributor

jenting commented Jan 13, 2021

so with the help of ya'll I moved it forward a bit more..

there is a build issue, caused by this change in controller-runtime
kubernetes-sigs/controller-runtime@8d45507

I am not sure what to change the function to, so I'll let the sages here look at it

@carlisia might know how to change the function to fulfill controller-runtime API change.
The build fails at https://github.com/vmware-tanzu/velero/blob/4048c02/internal/util/managercontroller/managercontroller.go#L32-L53

@nrb
Copy link
Contributor

nrb commented Jan 22, 2021

Based on https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/manager/manager.go#L259, it looks like the Runnable interface now takes a Context rather than a <-chan struct{} channel.

I made the following change locally to the server.go and managercontroller.go files and was able to compile.

I tried to push this commit to George's fork, but I don't have permissions. I've included it here as a patch file, and it can be applied with git am < <filename>, if you want to use it @georgettica.

Thank you tons for digging in to this, I appreciate it!

From e0056830f1eaef8362d5e382b10afe372bf070ba Mon Sep 17 00:00:00 2001
From: Nolan Brubaker <[email protected]>
Date: Thu, 21 Jan 2021 21:19:15 -0500
Subject: [PATCH] Use Context directly on the Runnable interface

New versions of controller-runtime use a Context directly rather than a
channel.

Signed-off-by: Nolan Brubaker <[email protected]>
---
 .../managercontroller/managercontroller.go    | 21 ++-----------------
 pkg/cmd/server/server.go                      |  2 +-
 2 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/internal/util/managercontroller/managercontroller.go b/internal/util/managercontroller/managercontroller.go
index 5a009d7f..be7391e2 100644
--- a/internal/util/managercontroller/managercontroller.go
+++ b/internal/util/managercontroller/managercontroller.go
@@ -29,25 +29,8 @@ import (
 // Runnable will turn a "regular" runnable component (such as a controller)
 // into a controller-runtime Runnable
 func Runnable(p controller.Interface, numWorkers int) manager.Runnable {
-	f := func(stop <-chan struct{}) error {
-
-		// Create a cancel context for handling the stop signal.
-		ctx, cancel := context.WithCancel(context.Background())
-		defer cancel()
-
-		// If a signal is received on the stop channel, cancel the
-		// context. This will propagate the cancel into the p.Run
-		// function below.
-		go func() {
-			select {
-			case <-stop:
-				cancel()
-			case <-ctx.Done():
-			}
-		}()
-
-		// This is a blocking call that either completes
-		// or is cancellable on receiving a stop signal.
+	// Pass the provided Context down to the run function.
+	f := func(ctx context.Context) error {
 		return p.Run(ctx, numWorkers)
 	}
 	return manager.RunnableFunc(f)
diff --git a/pkg/cmd/server/server.go b/pkg/cmd/server/server.go
index a84c9a67..5cee3900 100644
--- a/pkg/cmd/server/server.go
+++ b/pkg/cmd/server/server.go
@@ -871,7 +871,7 @@ func (s *server) runControllers(defaultVolumeSnapshotLocations map[string]string
 
 	s.logger.Info("Server starting...")
 
-	if err := s.mgr.Start(s.ctx.Done()); err != nil {
+	if err := s.mgr.Start(s.ctx); err != nil {
 		s.logger.Fatal("Problem starting manager", err)
 	}
 
-- 
2.24.0

@nrb
Copy link
Contributor

nrb commented Jan 22, 2021

@georgettica Also, I see the DCO check is failing - https://github.com/vmware-tanzu/velero/pull/3202/checks?check_run_id=1661384030. Only the 3 latest commits need the signoff from what I can tell.

@georgettica
Copy link
Contributor Author

Saw it now, and sad you didn't have permissions to push :(

Will look at it on Sunday and will for sure put these changes in.

Thanks!!

@arianitu
Copy link

Hey @georgettica , just checking on on this PR. It's blocking our builds since we use the velero code directly, any chance we can get this merged?

@georgettica
Copy link
Contributor Author

@arianitu Not yet, as the build is broken.
But there is a fix to solve the next issue I had in line.

I am not working on this full time and will update when I get to a new breakthrough.

@georgettica georgettica force-pushed the georgettica/bump-external-snapshotter-version branch from f26adb7 to f684637 Compare January 26, 2021 09:51
@georgettica
Copy link
Contributor Author

all checks are now passing!
now I need to fix all of this mess I did with my commits and we can go forward

@georgettica
Copy link
Contributor Author

tests are failing.. will see what is the matter there next

@georgettica georgettica force-pushed the georgettica/bump-external-snapshotter-version branch 2 times, most recently from b761370 to 1a9aca1 Compare January 26, 2021 10:29
@georgettica
Copy link
Contributor Author

after some git-fu, all the commits are now on uptream/main and I will fix the tests

@georgettica georgettica force-pushed the georgettica/bump-external-snapshotter-version branch from 64c5d84 to 0c041cf Compare January 26, 2021 10:50
@georgettica
Copy link
Contributor Author

@zubron I think it works now.. right?

@georgettica georgettica force-pushed the georgettica/bump-external-snapshotter-version branch 2 times, most recently from 6a2dc54 to 57db4e0 Compare January 26, 2021 10:57
now versions are working and there are code changes that need to happen

- release candidate versions are aligned and working
- replaces fields are removed and not required anymore

controller runtime has been changed during the 'make' command

Signed-off-by: Ron Green <[email protected]>
Signed-off-by: Nolan Brubaker <[email protected]>
Signed-off-by: Ron Green <[email protected]>
- change to new api resource

not all tests are passing, but most of them do

Signed-off-by: Ron Green <[email protected]>
@georgettica georgettica force-pushed the georgettica/bump-external-snapshotter-version branch from 8c0adea to a27824d Compare January 26, 2021 11:06
@georgettica
Copy link
Contributor Author

georgettica commented Jan 26, 2021

the lint check here is the issue I have now

also the CI check that fails on the test target

Copy link
Contributor

@zubron zubron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for your continued efforts on this, @georgettica!

This is looking great so far. There is only one further change needed which is to fix the controller tests. As the controller runtime manager now takes a context, we should update this test environment fixture to use a context and then pass that context into t.Manager.Start. I think it should be enough then to just mark that context as Done in the stop function.

I also suggested upgrading the Kubernetes libraries to the latest available patch version but if that turns out to be a lot of effort, you can leave them at the version you're currently using and we'll address the upgrade in #3340.

go.mod Outdated Show resolved Hide resolved
@@ -52,7 +52,7 @@ type BackupStorageLocationReconciler struct {

// +kubebuilder:rbac:groups=velero.io,resources=backupstoragelocations,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=velero.io,resources=backupstoragelocations/status,verbs=get;update;patch
func (r *BackupStorageLocationReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
func (r *BackupStorageLocationReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can address this in a follow up PR, but I think that now that Reconcile takes a context, we can upgrade our existing controllers to not have their own context but instead use this passed in context. Looking at the code from controller-runtime, this is the context that is now passed in to s.mgr.Start in server.go. I'll open an issue to address this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #3353

@github-actions github-actions bot requested a review from zubron January 26, 2021 18:21
this fix is from my understanding of the context package, can be fixed later

Signed-off-by: Ron Green <[email protected]>
@georgettica
Copy link
Contributor Author

I think I have a solution, hope it works as I expect

@georgettica
Copy link
Contributor Author

georgettica commented Jan 26, 2021

EEEverything passes ❤️

Copy link
Contributor

@zubron zubron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the fixes! LGTM 🎉

For those interested, I tested this out by creating a small Go project which imports the Velero package. When including require github.com/vmware-tanzu/velero v1.5.3, I get the error about the missing version: k8s.io/[email protected]: reading k8s.io/api/go.mod at revision v0.0.0: unknown revision v0.0.0. If I add a replace directive to use the branch from this PR (replace github.com/vmware-tanzu/velero => github.com/georgettica/velero v1.5.2-0.20210126184033-cba96280fdc3), it builds correctly without error.

@georgettica
Copy link
Contributor Author

Yes!! Thanks :)

[Non-blocker] Can this go.mod be incorporated I to the repo u til everyone has transitioned to working with velero directly?

@zubron
Copy link
Contributor

zubron commented Jan 26, 2021

[Non-blocker] Can this go.mod be incorporated I to the repo u til everyone has transitioned to working with velero directly?

Sorry, I'm not sure what you mean by this? Once this PR is merged, it will be available to use, although it won't be an official release. We're working on v1.6 at the moment which will include this fix, so it will be available as an official release then. If people want to use it before that, then they can use versions from the main branch.

Copy link
Contributor

@jenting jenting left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you.

@jenting jenting merged commit e4b6f94 into vmware-tanzu:main Jan 26, 2021
@georgettica georgettica deleted the georgettica/bump-external-snapshotter-version branch January 26, 2021 23:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot install velero with go get + go install
5 participants