Logging cleanup #11080
Conversation
@@ -134,7 +134,7 @@ func (e *executor) RunTasks(taskMap map[string]Task) error {
 	remaining := time.Second * time.Duration(int(time.Until(ts.deadline).Seconds()))
 	if _, ok := err.(*TryAgainLaterError); ok {
-		klog.Infof("Task %q not ready: %v", ts.key, err)
+		klog.V(2).Infof("Task %q not ready: %v", ts.key, err)
This is a special case where things fail because something else is not ready, due to how AWS works.
Usually you expect the tasks to succeed in the order they are run, and any retries imply some issue.
I think if we're dealing with eventual consistency (which is what TryAgainLaterError seems to imply), then it is within the expected behavior of the system, and therefore isn't worth alerting the user at the default verbosity level that it has occurred. If tasks consistently fail, or fail for different reasons, those situations will still be logged normally.
I just don't think this is a valuable enough piece of information to surface to the user.
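For context on the verbosity mechanics being discussed, here is a minimal, self-contained sketch (not the actual kops executor; the TryAgainLaterError definition and the task key are stand-ins) showing how the klog.V(2) gating suppresses the message at the default -v level while unexpected failures stay visible:

```go
package main

import (
	"flag"

	"k8s.io/klog/v2"
)

// TryAgainLaterError is a stand-in for the kops error type referenced in the diff.
type TryAgainLaterError struct{ Message string }

func (e *TryAgainLaterError) Error() string { return e.Message }

func main() {
	klog.InitFlags(nil) // registers the -v flag, among others
	flag.Parse()

	var err error = &TryAgainLaterError{Message: "waiting on an eventually consistent dependency"}

	if _, ok := err.(*TryAgainLaterError); ok {
		// Suppressed at the default verbosity; shown with -v=2 or higher.
		klog.V(2).Infof("Task %q not ready: %v", "example-task", err)
	} else {
		// Unexpected failures remain visible at the default verbosity.
		klog.Warningf("Task %q failed: %v", "example-task", err)
	}
	klog.Flush()
}
```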
I'd say you're probably both right ...
As a user, I don't much care about one or two failures here. I probably do care if it gets "stuck" to know why it's stuck. And I probably also want to know roughly what is going on ("waiting for 2 tasks" ?).
Maybe we should do something different on iteration 1 vs iteration 10?
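One way to sketch that suggestion (purely illustrative; the helper name, attempt counter, and threshold are hypothetical, not kops code): keep early, expected retries at V(2) and only promote the message to the default level once the task looks stuck.

```go
package retrylog

import "k8s.io/klog/v2"

// logNotReady is a hypothetical helper illustrating the idea above:
// stay quiet for early, expected retries and get louder once a task looks stuck.
func logNotReady(attempt int, key string, err error) {
	const stuckThreshold = 10 // illustrative; not a real kops constant

	if attempt < stuckThreshold {
		// Early retries are expected with eventually consistent cloud APIs.
		klog.V(2).Infof("Task %q not ready (attempt %d): %v", key, attempt, err)
		return
	}
	// After many attempts, surface the problem at the default verbosity.
	klog.Infof("Task %q still not ready after %d attempts: %v", key, attempt, err)
}
```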
If a task is stuck, then kops will keep retrying it until 10 minutes have elapsed, and then log a more descriptive error.
kops also logs if no progress has been made on any remaining tasks. I think both of these are sufficient to warrant not needing to log the specific task that isn't ready. Users can always pass -v 2 if they need more detail.
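A rough sketch of the behavior described above (assumed shape only; runWithDeadline, the sleep interval, and the local TryAgainLaterError definition are illustrative, not the kops executor): intermediate "not ready" states stay at V(2), and only the eventual deadline failure surfaces by default.

```go
package retry

import (
	"fmt"
	"time"

	"k8s.io/klog/v2"
)

// TryAgainLaterError stands in for the kops error type from the diff above.
type TryAgainLaterError struct{ Message string }

func (e *TryAgainLaterError) Error() string { return e.Message }

// runWithDeadline retries a task until its deadline passes, logging the
// intermediate "not ready" states only at -v=2 and returning a descriptive
// error once the deadline (e.g. 10 minutes) is exceeded.
func runWithDeadline(key string, deadline time.Time, run func() error) error {
	for {
		err := run()
		if err == nil {
			return nil
		}
		if _, ok := err.(*TryAgainLaterError); !ok {
			return err // unexpected failures propagate immediately
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("task %q did not become ready before its deadline: %w", key, err)
		}
		klog.V(2).Infof("Task %q not ready, will retry: %v", key, err)
		time.Sleep(10 * time.Second) // illustrative retry interval
	}
}
```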
/retest
	klog.V(8).Infof("task %T does not implement HasLifecycle", task)
	return task
}

typeName := TypeNameForTask(task)
klog.V(8).Infof("testing task %q", typeName)
This log line feels very V(8) :-)
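For readers skimming the thread, here is a hedged reconstruction of the context around those V(8) lines (Task, HasLifecycle, and TypeNameForTask are simplified placeholders here, not the kops definitions): per-task trace output that is only useful when debugging with -v=8 or higher.

```go
package lifecycle

import (
	"fmt"

	"k8s.io/klog/v2"
)

// Simplified placeholders; the real Task, HasLifecycle, and TypeNameForTask
// live in the kops codebase and look different.
type Task interface{}

type HasLifecycle interface {
	GetLifecycle() string
}

func TypeNameForTask(task Task) string {
	return fmt.Sprintf("%T", task)
}

// filterByLifecycle sketches where the V(8) trace lines from the diff sit.
func filterByLifecycle(task Task) Task {
	if _, ok := task.(HasLifecycle); !ok {
		klog.V(8).Infof("task %T does not implement HasLifecycle", task)
		return task
	}

	typeName := TypeNameForTask(task)
	klog.V(8).Infof("testing task %q", typeName)

	// Lifecycle-specific filtering would happen here in the real code.
	return task
}
```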
Thought about this a little more and I think the fix is ok as is for now.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
After staring at prow job outputs and nodeup logs for a long time, I'm removing some of the unnecessary logging statements.
The first will remove these from the kops CLI:
The second will remove these from the nodeup logs: