Logging cleanup #11080
Conversation
@@ -134,7 +134,7 @@ func (e *executor) RunTasks(taskMap map[string]Task) error {
 	remaining := time.Second * time.Duration(int(time.Until(ts.deadline).Seconds()))
 	if _, ok := err.(*TryAgainLaterError); ok {
-		klog.Infof("Task %q not ready: %v", ts.key, err)
+		klog.V(2).Infof("Task %q not ready: %v", ts.key, err)
This is a special case where things fail because something else is not ready, due to how AWS works.
Usually you expect the tasks to succeed in the order they are run, and any retries imply some issue.
I think if we're dealing with eventual consistency (which is what TryAgainLaterError seems to imply), then it is within the expected behavior of the system, and therefore isn't worth alerting the user at the default verbosity level that it has occurred. If tasks consistently fail, or fail for different reasons, those situations will still be logged normally.
I just don't think this is a valuable enough piece of information to surface to the user.
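For context on the verbosity mechanics being discussed, here is a minimal, self-contained sketch (not the actual kops executor; the TryAgainLaterError definition and the task key are stand-ins) showing how the klog.V(2) gating suppresses the message at the default -v level while unexpected failures stay visible:

```go
package main

import (
	"flag"

	"k8s.io/klog/v2"
)

// TryAgainLaterError is a stand-in for the kops error type referenced in the diff.
type TryAgainLaterError struct{ Message string }

func (e *TryAgainLaterError) Error() string { return e.Message }

func main() {
	klog.InitFlags(nil) // registers the -v flag, among others
	flag.Parse()

	var err error = &TryAgainLaterError{Message: "waiting on an eventually consistent dependency"}

	if _, ok := err.(*TryAgainLaterError); ok {
		// Suppressed at the default verbosity; shown with -v=2 or higher.
		klog.V(2).Infof("Task %q not ready: %v", "example-task", err)
	} else {
		// Unexpected failures remain visible at the default verbosity.
		klog.Warningf("Task %q failed: %v", "example-task", err)
	}
	klog.Flush()
}
```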
I'd say you're probably both right ...
As a user, I don't much care about one or two failures here. I probably do care if it gets "stuck" to know why it's stuck. And I probably also want to know roughly what is going on ("waiting for 2 tasks" ?).
Maybe we should do something different on iteration 1 vs iteration 10?
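One way to sketch that suggestion (purely illustrative; the helper name, attempt counter, and threshold are hypothetical, not kops code): keep early, expected retries at V(2) and only promote the message to the default level once the task looks stuck.

```go
package retrylog

import "k8s.io/klog/v2"

// logNotReady is a hypothetical helper illustrating the idea above:
// stay quiet for early, expected retries and get louder once a task looks stuck.
func logNotReady(attempt int, key string, err error) {
	const stuckThreshold = 10 // illustrative; not a real kops constant

	if attempt < stuckThreshold {
		// Early retries are expected with eventually consistent cloud APIs.
		klog.V(2).Infof("Task %q not ready (attempt %d): %v", key, attempt, err)
		return
	}
	// After many attempts, surface the problem at the default verbosity.
	klog.Infof("Task %q still not ready after %d attempts: %v", key, attempt, err)
}
```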
If a task is stuck, then kops will keep retrying it until 10 minutes have elapsed, and then log a more descriptive error.
kops also logs if no progress has been made on any remaining tasks. I think both of these are sufficient to warrant not needing to log the specific task that isn't ready. Users can always pass -v 2 if they need more detail.
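A rough sketch of the behavior described above (assumed shape only; runWithDeadline, the sleep interval, and the local TryAgainLaterError definition are illustrative, not the kops executor): intermediate "not ready" states stay at V(2), and only the eventual deadline failure surfaces by default.

```go
package retry

import (
	"fmt"
	"time"

	"k8s.io/klog/v2"
)

// TryAgainLaterError stands in for the kops error type from the diff above.
type TryAgainLaterError struct{ Message string }

func (e *TryAgainLaterError) Error() string { return e.Message }

// runWithDeadline retries a task until its deadline passes, logging the
// intermediate "not ready" states only at -v=2 and returning a descriptive
// error once the deadline (e.g. 10 minutes) is exceeded.
func runWithDeadline(key string, deadline time.Time, run func() error) error {
	for {
		err := run()
		if err == nil {
			return nil
		}
		if _, ok := err.(*TryAgainLaterError); !ok {
			return err // unexpected failures propagate immediately
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("task %q did not become ready before its deadline: %w", key, err)
		}
		klog.V(2).Infof("Task %q not ready, will retry: %v", key, err)
		time.Sleep(10 * time.Second) // illustrative retry interval
	}
}
```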
/retest
	klog.V(8).Infof("task %T does not implement HasLifecycle", task)
	return task
}

typeName := TypeNameForTask(task)
klog.V(8).Infof("testing task %q", typeName)
This log line feels very V(8) :-)
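For readers skimming the thread, here is a hedged reconstruction of the context around those V(8) lines (Task, HasLifecycle, and TypeNameForTask are simplified placeholders here, not the kops definitions): per-task trace output that is only useful when debugging with -v=8 or higher.

```go
package lifecycle

import (
	"fmt"

	"k8s.io/klog/v2"
)

// Simplified placeholders; the real Task, HasLifecycle, and TypeNameForTask
// live in the kops codebase and look different.
type Task interface{}

type HasLifecycle interface {
	GetLifecycle() string
}

func TypeNameForTask(task Task) string {
	return fmt.Sprintf("%T", task)
}

// filterByLifecycle sketches where the V(8) trace lines from the diff sit.
func filterByLifecycle(task Task) Task {
	if _, ok := task.(HasLifecycle); !ok {
		klog.V(8).Infof("task %T does not implement HasLifecycle", task)
		return task
	}

	typeName := TypeNameForTask(task)
	klog.V(8).Infof("testing task %q", typeName)

	// Lifecycle-specific filtering would happen here in the real code.
	return task
}
```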
Thought about this a little more and I think the fix is ok as is for now.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
After staring at prow job outputs and nodeup logs for a long time, I'm removing some of the unnecessary logging statements.
The first will remove these from the kops CLI:
The second will remove these from the nodeup logs: