
Logging cleanup #11080

Merged (2 commits) Apr 15, 2021

Conversation

rifelpet (Member)

After staring at prow job outputs and nodeup logs for a long time, I'm removing some unnecessary logging statements.

The first commit removes these lines from the kops CLI output:

I0318 08:50:47.113867    2772 executor.go:111] Tasks: 80 done / 82 total; 2 can run
I0318 08:50:48.918239    2772 executor.go:137] Task "AutoscalingGroup/master-ap-northeast-1a.masters.e2e-e6110d5e7d-a98da.test-cncf-aws.k8s.io" not ready: waiting for the IAM Instance Profile to be propagated
I0318 08:50:48.918286    2772 executor.go:137] Task "AutoscalingGroup/nodes-ap-northeast-1a.e2e-e6110d5e7d-a98da.test-cncf-aws.k8s.io" not ready: waiting for the IAM Instance Profile to be propagated
I0318 08:50:48.918297    2772 executor.go:155] No progress made, sleeping before retrying 2 task(s)

The second commit removes this line from the nodeup logs:

Mar 19 16:52:10.691883 ip-172-20-37-77.ap-southeast-1.compute.internal cloud-init[2044]: I0319 16:52:10.684448    4573 task.go:103] task *nodetasks.Service does not implement HasLifecycle

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 20, 2021
@@ -134,7 +134,7 @@ func (e *executor) RunTasks(taskMap map[string]Task) error {

 		remaining := time.Second * time.Duration(int(time.Until(ts.deadline).Seconds()))
 		if _, ok := err.(*TryAgainLaterError); ok {
-			klog.Infof("Task %q not ready: %v", ts.key, err)
+			klog.V(2).Infof("Task %q not ready: %v", ts.key, err)
Member

This is a special case where tasks fail because something else is not ready, due to how AWS works.
Usually you expect the tasks to succeed in the order they run, and any retries imply some issue.

Member Author

I think if we're dealing with eventual consistency (which TryAgainLaterError seems to imply), then it is within the expected behavior of the system and therefore isn't worth alerting the user at the default verbosity level. If tasks consistently fail, or fail for different reasons, those situations will still be logged normally.

I just don't think this is a valuable enough piece of information to surface to the user.

Member

I'd say you're probably both right ...

As a user, I don't much care about one or two failures here. I probably do care if it gets "stuck" to know why it's stuck. And I probably also want to know roughly what is going on ("waiting for 2 tasks" ?).

Maybe we should do something different on iteration 1 vs iteration 10?
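The iteration-based escalation suggested above could be sketched like this, assuming a hypothetical `retryLogLevel` helper and an arbitrary threshold of 10 iterations (neither is in this PR):

```go
package main

import "fmt"

// retryLogLevel sketches the "iteration 1 vs iteration 10" idea: keep
// early retries quiet at V(2), but promote the message to the always-on
// level once a task has been retried many times. The threshold of 10 is
// illustrative, not anything kops actually uses.
func retryLogLevel(iteration int) int {
	if iteration >= 10 {
		return 0 // logged unconditionally, like klog.Infof
	}
	return 2 // hidden unless the user runs kops with -v 2
}

func main() {
	for _, it := range []int{1, 5, 10} {
		fmt.Printf("iteration %d logs at V(%d)\n", it, retryLogLevel(it))
	}
}
```

This would keep the default output quiet for routine eventual-consistency retries while still surfacing a task that appears stuck.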

Member Author

If a task is stuck, kops keeps retrying it until 10 minutes have elapsed and then logs a more descriptive error:

https://github.com/kubernetes/kops/pull/11080/files#diff-d73723c1c245d32c779df66e821830949e518a324bd48cfcf8299ef00a2194dfR105

kops also logs if no progress has been made on any remaining tasks. I think both of these are sufficient, so we don't need to log each specific task that isn't ready. Users can always pass -v 2 if they need more detail.
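For context on what the klog.V(2) change does: klog only emits a V(level) message when the level is at or below the -v threshold, which defaults to 0. A simplified stdlib-only stand-in for that check (not klog's real implementation):

```go
package main

import "fmt"

// enabled reports whether a message at the given level passes the
// verbosity threshold, the same comparison klog.V(level) performs
// against the -v flag. Simplified stand-in for illustration only.
func enabled(level, threshold int) bool {
	return level <= threshold
}

// infof prints the message only when the level is enabled.
func infof(level, threshold int, format string, args ...interface{}) {
	if enabled(level, threshold) {
		fmt.Printf(format+"\n", args...)
	}
}

func main() {
	task := "AutoscalingGroup/nodes-ap-northeast-1a"

	// Default kops run: -v defaults to 0, so the retry notice is hidden.
	infof(2, 0, "Task %q not ready: %v", task, "waiting for the IAM Instance Profile to be propagated")

	// With -v 2 the same message appears again.
	infof(2, 2, "Task %q not ready: %v", task, "waiting for the IAM Instance Profile to be propagated")
}
```

So the change doesn't delete the information, it just moves it behind the -v 2 gate.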

@rifelpet (Member Author)

/retest

klog.V(8).Infof("task %T does not implement HasLifecycle", task)
return task
}

typeName := TypeNameForTask(task)
klog.V(8).Infof("testing task %q", typeName)
Member

This log line feels very V(8) :-)
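The guard behind that V(8) line is a Go interface type assertion; a minimal stand-in, where the interface and task types are trimmed-down illustrations rather than the real kops definitions:

```go
package main

import "fmt"

// HasLifecycle is a trimmed-down stand-in for the kops interface the
// snippet above checks for; the task types below are illustrative.
type HasLifecycle interface {
	Lifecycle() string
}

type serviceTask struct{} // does not implement HasLifecycle

type asgTask struct{ lc string }

func (t asgTask) Lifecycle() string { return t.lc } // asgTask implements it

// describe mirrors the guard in task.go: a type assertion against the
// interface decides whether the debug line would fire.
func describe(task interface{}) string {
	if _, ok := task.(HasLifecycle); !ok {
		return fmt.Sprintf("task %T does not implement HasLifecycle", task)
	}
	return fmt.Sprintf("task %T has a lifecycle", task)
}

func main() {
	fmt.Println(describe(serviceTask{}))
	fmt.Println(describe(asgTask{lc: "Sync"}))
}
```

Since every task goes through this check, the message fires constantly during normal operation, which is why V(8) is a better fit than the default level.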


@hakman hakman left a comment (Member)

Thought about this a little more and I think the fix is ok as is for now.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 15, 2021
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 15, 2021
@k8s-ci-robot k8s-ci-robot merged commit 465a02f into kubernetes:master Apr 15, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Apr 15, 2021
@rifelpet rifelpet deleted the logging-cleanup branch May 5, 2021 13:45