Test route url fix #66
This issue needs a milestone... is it a blocker for 7.0? P1 for 7.1? Also, I'm not sure I can review this. It might be better to let @davidfestal review when he's back from PTO next week.
As for CQs, the rule is basically...
You can search for existing CQs [0] for current/previous versions of your deps here:
Looks like logrus 1.4.2 is already in there [1], so you just need to submit a Piggyback CQ for Che [2], like the Codewind team did:
[1] https://dev.eclipse.org/ipzilla/show_bug.cgi?id=20621
[2] https://www.eclipse.org/projects/handbook/#pmi-commands-cq
If you are adding new versions of deps, or new deps, repeat the above process for each new dep.
time.Sleep(time.Duration(1) * time.Second)
testRoute := r.GetEffectiveRoute(instance, "test")
requestURL = "https://" + testRoute.Spec.Host
return nil, errors.New("Unable to get test route host for fetching certificate")
In order to be sure that the operator won't restart processing the custom resource immediately, you could also return the following:
return reconcile.Result{Requeue: true, RequeueAfter: time.Second * 1}, err
Here I'm not able to return reconcile.Result without modifying the signature. This seems to be a helper class, and returning it from here does not look right; at least, no other method in this class does so.
Do you think it is OK to pass reconcile.Result back through k8s_helper.go#GetEndpointTlsCrt -> create.go#CreateTLSSecret -> che_controller.go#Reconcile?
Or the following fragment in che_controller would be good enough?
if err := r.CreateTLSSecret(instance, "", "self-signed-certificate"); err != nil {
return reconcile.Result{Requeue: true, RequeueAfter: time.Second * 1}, err
}
But it would mean that we wait one second on any error, such as failing to get the namespace, failing to create the route, failing to fetch the host...
Then why don't we wait 1 second in other places, for example when we fail to create the service account? https://github.com/eclipse/che-operator/blob/9682f3448fe240c216aa22d9d3f56cc056b01494/pkg/controller/che/che_controller.go#L315-L317
BTW, it seems that even with return reconcile.Result{}, err there is ~1 second between k8s_helper.go#GetEndpointTlsCrt invocations:
the first one is the first invocation;
the 2nd and 3rd are the second invocation.
Here I'm not able to return reconcile.Result without modifying the signature.
Oh, yes, I didn't pay attention to the fact it was inside a utility function.
You might simply add a retryLater boolean return value, and use it in the che_controller reconcile function.
Then why we do not wait 1 second in other places like: when we failed to create service account
Well the existing logic in the controller (that was already there when I started on it) seems to be the following:
- return only an error when it is an unexpected error,
- return a Result (with retry) + error when it's an error that we expect in some scenarios and on which we explicitly want to restart the reconciliation after a while.
Do you think we should retry if a quite unexpected error occurred, like failed to create route?
It's unexpected, but maybe the k8s/OpenShift API was temporarily unavailable...
WDYT?
I think that if the CreateTLSSecret function returns continueLater set to true, then the calling reconcile method should return reconcile.Result{Requeue: true, RequeueAfter: time.Second * 1}, err; otherwise it should return reconcile.Result{}, err if an unexpected error occurred.
Yeah, I understand that; let me rephrase my question.
CreateTLSSecret may return an error in different situations:
1. Failed to get an existing self-signed secret.
2. Failed to get the existing route (e.g. when the API is not available).
3. Failed to create a new route (e.g. when the API is not available).
4. The fetched route does not have a host (it's clear that we should retry in this case).
5. Failed to request the route host to retrieve the TLS certificate.
6. Failed to create a secret with the fetched self-signed TLS certificate.
Which of these errors do you consider unexpected, and which expected (like 4, about the route)?
To me, you would only use continueLater=true in situation 4. We know that in some cases the host can take some time to be added to the route object. It's expected.
Possibly 5 could also be expected (and return continueLater=true), since the route may sometimes take some time to be set up.
But the main point is 4, IMO.
Does that seem consistent to you?
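One way to express this split in code is sketched below, assuming Go 1.13+ error wrapping. The sentinel error names, the failure helper, and shouldRetryLater are all hypothetical and only illustrate the classification discussed above (situations 4 and possibly 5 are expected, the rest are not).

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors, one per situation listed in the thread.
var (
	errGetSecret    = errors.New("failed to get existing self-signed secret")   // 1
	errGetRoute     = errors.New("failed to get existing route")                // 2
	errCreateRoute  = errors.New("failed to create new route")                  // 3
	errNoRouteHost  = errors.New("fetched route does not have a host")          // 4
	errFetchCert    = errors.New("failed to request route host for TLS cert")   // 5
	errCreateSecret = errors.New("failed to create secret with TLS cert")       // 6
)

// failure wraps a sentinel with context, as a helper deeper in the call
// chain might do before returning it.
func failure(cause error) error {
	return fmt.Errorf("CreateTLSSecret: %w", cause)
}

// shouldRetryLater reports whether err is one of the expected, transient
// situations (4 and, possibly, 5) that justify a delayed requeue.
func shouldRetryLater(err error) bool {
	return errors.Is(err, errNoRouteHost) || errors.Is(err, errFetchCert)
}

func main() {
	for _, e := range []error{
		errGetSecret, errGetRoute, errCreateRoute,
		errNoRouteHost, errFetchCert, errCreateSecret,
	} {
		fmt.Printf("%-50v retry later: %v\n", e, shouldRetryLater(failure(e)))
	}
}
```

With sentinels, the decision lives in one place in the controller and the helper signatures stay unchanged; the trade-off is that every intermediate caller must wrap rather than replace the error.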
I believe that this PR brings a fix and does not make the code base worse.
Your proposed solution totally makes sense, but it is more about refactoring the current architecture than about fixing the issue I initially tried to fix.
Sorry, but I'm not able to invest in the operator architecture right now.
@davidfestal If you agree that this PR is OK to merge, please approve.
If you think that it brings more issues than it solves, feel free to close it.
Can one of the admins verify this patch?
Signed-off-by: Sergii Leshchenko <[email protected]>
Will be done as part of refactoring.
This PR contains the following separate changes:
- Upgrade logrus from 0.11.5 (Mar 14, 2017) to the latest v1.4.2 (May 18, 2019). This makes che-operator easier to debug, if needed, by enabling method-name logging (see https://github.com/sirupsen/logrus#logging-method-name).
- Fix a typo in the test route assignment: := declares a new variable with limited scope, whereas = assigns a new value to an existing variable.
- Improve fetching so that waiting for the route host is non-blocking.
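The := versus = pitfall can be reproduced in isolation. The sketch below uses hypothetical names (route, lookupRoute, urlShadowed, urlAssigned) and is not the operator's code; it only illustrates how := inside a block declares a new variable that shadows the outer one.

```go
package main

import "fmt"

// route is a hypothetical stand-in for a route object that has a host.
type route struct{ host string }

func lookupRoute() route { return route{host: "test.example.com"} }

// urlShadowed shows the bug: := inside the block declares a NEW requestURL
// that shadows the outer one, so the outer variable is never updated.
func urlShadowed(useTestRoute bool) string {
	requestURL := ""
	if useTestRoute {
		requestURL := "https://" + lookupRoute().host
		_ = requestURL // the inner variable is visible only in this block
	}
	return requestURL
}

// urlAssigned shows the fix: = assigns to the existing outer variable.
func urlAssigned(useTestRoute bool) string {
	requestURL := ""
	if useTestRoute {
		requestURL = "https://" + lookupRoute().host
	}
	return requestURL
}

func main() {
	fmt.Printf("shadowed: %q\n", urlShadowed(true))
	fmt.Printf("assigned: %q\n", urlAssigned(true))
}
```

The compiler accepts the shadowed version because the inner variable is used inside its block, which is why this kind of typo can go unnoticed.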
PB logrus v1.4.2 https://dev.eclipse.org/ipzilla/show_bug.cgi?id=20764