
Warning message is confusing when pod logs cannot be retrieved #3711

Closed
jiezhang opened this issue May 8, 2020 · 27 comments · Fixed by #3848
Labels
area/frontend, good first issue, help wanted (the community is welcome to contribute), kind/bug, lifecycle/stale (the issue/PR is stale; any activity removes this label), priority/p1, status/triaged (the issue has been explicitly triaged)

Comments

@jiezhang

jiezhang commented May 8, 2020

What steps did you take:

After the pod finishes successfully and is later reclaimed, the following warning is displayed. The message often confuses first-time users and suggests that they check the troubleshooting guide.

Warning: failed to retrieve pod logs. Possible reasons include cluster autoscaling or pod preemption

[screenshot: warning banner]

What happened:

In fact, the logs can be viewed in Stackdriver Kubernetes Monitoring.

What did you expect to happen:

Remove the warning message.

/kind bug
/area frontend

@Bobgy
Contributor

Bobgy commented May 15, 2020

Thanks for the suggestion!
Sounds reasonable to me.

We can show the troubleshooting guide only when there is an error, and not for a warning.

@Bobgy added the help wanted and status/triaged labels May 15, 2020
@Bobgy Bobgy self-assigned this May 15, 2020
@jonasdebeukelaer
Contributor

happy to fix this
/assign @jonasdebeukelaer

@jonasdebeukelaer
Contributor

Oh wait, is this already done? i.e. just removing the 'troubleshooting guide' link?

@Bobgy
Contributor

Bobgy commented May 25, 2020

@jonasdebeukelaer Thanks for offering help!
This still needs to be done.

Some helpful information for contribution:

  1. frontend contribution guide: https://github.com/kubeflow/pipelines/tree/master/frontend
  2. Banner component (that shows the troubleshooting link): https://github.com/kubeflow/pipelines/blob/master/frontend/src/components/Banner.tsx
  3. Run Details Page's log viewer tab's banner:

My suggested UX would be to hide the troubleshooting link when the banner is a warning (and still show it when the banner is an error), but take a look and decide whether that feels reasonable to you. The component already supports hiding the link for ad-hoc usages: https://github.com/kubeflow/pipelines/blob/master/frontend/src/components/Banner.tsx#L72, so we can also configure it dynamically where it's used.
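The suggestion above could be sketched roughly as follows. This is a hedged, minimal sketch, not the actual Banner.tsx implementation: the `BannerMode` type, `shouldShowTroubleshootingLink` helper, and the `showTroubleshootingGuideLink` prop name are illustrative assumptions.

```typescript
// Sketch only: decide whether a banner should render the troubleshooting
// link based on its severity mode, with an explicit per-usage override.
type BannerMode = 'error' | 'warning' | 'info';

interface BannerProps {
  mode: BannerMode;
  message: string;
  // Explicit override for ad-hoc usages, as the real component already allows.
  showTroubleshootingGuideLink?: boolean;
}

function shouldShowTroubleshootingLink(props: BannerProps): boolean {
  // An explicit override always wins.
  if (props.showTroubleshootingGuideLink !== undefined) {
    return props.showTroubleshootingGuideLink;
  }
  // Default: only errors link to the troubleshooting guide; warnings
  // (e.g. "failed to retrieve pod logs") stay informational.
  return props.mode === 'error';
}
```

With this default, the reclaimed-pod warning would no longer point users at the troubleshooting guide, while genuine errors still would.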

@jiezhang
Author

@Bobgy @jonasdebeukelaer I wonder if it is okay to remove the message completely, or at least lower the level to informational (without the exclamation mark and "Warning" prefix).

@Bobgy
Contributor

Bobgy commented Jul 31, 2020

/reopen
I just tested on 1.0.0 and the problem still exists.

Now it shows "failed to retrieve pod logs" with this error message:

Error response: Could not get main container logs: Error: Unable to retrieve workflow status: [object Object].

We didn't account for the case where the workflow is also missing.
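The "[object Object]" in the message above is what you get when a non-Error value is interpolated into a string. A minimal, hypothetical sketch of the kind of guard that avoids it (`serializeError` is an invented helper name, not KFP's actual code):

```typescript
// Sketch only: turn an unknown thrown value into a readable string so the
// banner never renders "[object Object]".
function serializeError(err: unknown): string {
  if (err instanceof Error) {
    return err.message;
  }
  if (typeof err === 'string') {
    return err;
  }
  try {
    // Plain objects (e.g. a parsed error response) get JSON-serialized.
    return JSON.stringify(err);
  } catch {
    // Fall back for circular structures and other non-serializable values.
    return String(err);
  }
}
```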

@k8s-ci-robot
Contributor

@Bobgy: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Jul 31, 2020
@Bobgy
Contributor

Bobgy commented Jul 31, 2020

@jonasdebeukelaer do you want to revisit this?
Or I can follow up too

@jonasdebeukelaer
Contributor

Hey @Bobgy, this should be a quick one to fix, so I'm happy to do it. In what situations can a workflow be missing?

@Bobgy
Contributor

Bobgy commented Aug 26, 2020

When the user configures a TTL to GC workflows. (We have a default TTL of 1 day.)

In fact, the workflow status should be persisted in the run details DB rows, so the UI shouldn't need to fetch the workflow.
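That fallback could look roughly like this. All names here (`Run`, `getRun`, `getLiveWorkflow`, `getWorkflowManifest`) are invented for illustration and are not the actual KFP frontend API; the persisted-manifest field mirrors the shape of the run details response.

```typescript
// Sketch only: prefer the workflow manifest persisted with the run (which
// survives Argo's TTL-based GC) and only fall back to the live workflow.
interface Run {
  pipeline_runtime?: { workflow_manifest?: string };
}

async function getWorkflowManifest(
  getRun: (id: string) => Promise<Run>,
  getLiveWorkflow: (id: string) => Promise<string>,
  runId: string,
): Promise<string | undefined> {
  const run = await getRun(runId);
  const persisted = run.pipeline_runtime?.workflow_manifest;
  if (persisted) {
    return persisted; // persisted in the run details DB row
  }
  try {
    return await getLiveWorkflow(runId); // may fail after TTL GC
  } catch {
    return undefined; // caller shows an informational banner, not an error
  }
}
```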

@jonasdebeukelaer
Contributor

hmm makes sense 👍

@Ark-kun
Contributor

Ark-kun commented Oct 12, 2020

@Bobgy The logs are already persisted to the storage the same way as other artifacts. AFAIK, @eterna2 added support to show these logs in the UX when the pod is not available, but this option is turned off by default. Maybe you can enable this option?

@Bobgy
Contributor

Bobgy commented Oct 12, 2020

@Ark-kun this bug: #3711 (comment) must be fixed before logs can be reused from archive.

@haydnkeung

@Ark-kun How do you enable the option?

@ConverJens
Contributor

@Ark-kun @Bobgy Is there any update to this? How do you enable log persistence?

@Bobgy
Contributor

Bobgy commented Jan 14, 2021

No update yet; we need someone from the community to fix this problem.

For us, we are on GCP, and Stackdriver automatically persists all Kubernetes pod logs.

@ConverJens
Contributor

@haydnkeung I managed to enable logs persistence.

Check the configmap workflow-controller-configmap and see if archiveLogs: true is set. For me it wasn't, even though I'm on KF 1.1, and I had to set it in the config-map.yaml found in your manifest dir under argo/base.
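For reference, the setting described above lives in Argo's workflow-controller configmap. A minimal sketch, assuming the standard Argo 2.x layout (namespace and surrounding repository settings will differ per install):

```yaml
# Sketch of workflow-controller-configmap: archiveLogs tells Argo to ship
# each pod's main-container logs to the configured artifact repository.
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: kubeflow
data:
  config: |
    artifactRepository:
      archiveLogs: true
```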

@stale

stale bot commented Jun 3, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

The stale bot added the lifecycle/stale label Jun 3, 2021
@ra312

ra312 commented Jul 1, 2021

Dear @ConverJens, did you restart ml-pipeline-ui (the Kubeflow UI) after editing the configmap?

The stale bot removed the lifecycle/stale label Jul 1, 2021
@ConverJens
Contributor

@ra312 I actually redeployed all ml-pipeline components apart from MinIO and MySQL, so I don't know which restart is required. However, I don't think the UI has anything to do with this; rather, it's the API server that needs restarting. The UI simply picks up the logs like any other artifact.

@ra312

ra312 commented Jul 13, 2021

Thanks, @ConverJens! I will try to do the same.

@rohitgujral

@ConverJens @ra312 I'm also trying to persist the pipeline pod logs, so that if a pod gets deleted the logs are still available in the pipeline runs.
I added archiveLogs: true to the argo/base config-map and restarted the pipelines deployment, but after deleting the pod I'm still not seeing logs.

[screenshot: empty logs tab]

Is there any other step that needs to be done?
Kubeflow version: 1.0.2, Argo version: 2.3.0

@ConverJens
Contributor

@rohitgujral The logs tab is only populated while the pod still exists. Once the pod is removed, the complete logs are available as a tar.gz under artifacts instead, called main-logs.tar.gz I think.

Note that while the logs tab always holds the full log, the artifact can lose the final part if your component crashes. I believe this happens because the logs are not fully flushed in some instances; in that case, only the logs up to the point of the error are available.

@stale

stale bot commented Mar 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

The stale bot added the lifecycle/stale label Mar 2, 2022
@ra312

ra312 commented Apr 1, 2022

Closing, since the issue has been reported as fixed in #3848.

@ra312

ra312 commented Apr 1, 2022

/close

@google-oss-prow

@ra312: Closing this issue.

In response to this:

/close

