Fix/log unavailable warning #3848

Merged

jonasdebeukelaer
Contributor

@jonasdebeukelaer jonasdebeukelaer commented May 26, 2020

  • only show the troubleshooting link on error banners
  • slightly edit the message shown when logs are no longer available on the cluster

Fixes #3711

@k8s-ci-robot k8s-ci-robot requested review from Bobgy and neuromage May 26, 2020 11:53
@kubeflow-bot

This change is Reviewable

@k8s-ci-robot
Contributor

Hi @jonasdebeukelaer. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jonasdebeukelaer
Contributor Author

jonasdebeukelaer commented May 26, 2020

feels like this failing build has nothing to do with ui? 🤔

broken build shows

============================== 71 passed in 1.21s ==============================
___________________________________ summary ____________________________________
  py35: commands succeeded
  congratulations :)
The command "./run_test.sh" exited with 0.
0.00s$ cd $TRAVIS_BUILD_DIR/test/sample-test/unittests
The command "cd $TRAVIS_BUILD_DIR/test/sample-test/unittests" exited with 0.
0.21s$ python3 -m unittest utils_tests.py
.
----------------------------------------------------------------------
Ran 1 test in 0.006s
OK
The command "python3 -m unittest utils_tests.py" exited with 0.

@jonasdebeukelaer jonasdebeukelaer force-pushed the fix/log-unavailable-warning branch from 8f75ccc to 88c8729 Compare May 26, 2020 12:15
@Bobgy
Contributor

Bobgy commented May 27, 2020

@jonasdebeukelaer Thanks a lot for your contribution! This is absolutely awesome for UX.
/lgtm
/approve
/ok-to-test

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Bobgy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Bobgy
Contributor

Bobgy commented May 27, 2020

Yes, the travis unit tests are flaky. I'll help you retry them.

@Bobgy
Contributor

Bobgy commented May 27, 2020

/retest

@Bobgy
Contributor

Bobgy commented May 27, 2020

@jonasdebeukelaer The error message for frontend unit test: https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/kubeflow_pipelines/3848/kubeflow-pipeline-frontend-test/1265449238280540160#1:build-log.txt%3A60

@Ark-kun
Contributor

Ark-kun commented May 27, 2020

/test kubeflow-pipeline-frontend-test

@jonasdebeukelaer
Contributor Author

changes:

  • added info type banner
  • use it when the pod is not found: if fetching the logs errors, do a get pod and check whether it returns a 404
  • updated other banners in the side panel to use info type banner where appropriate
  • added some missing unit tests

note: I've made namespace required for Apis.getPodLogs, not sure why it was optional before?
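
A minimal sketch of the banner choice described in the list above, not the code from this PR. The helper name `logsBannerFor`, the `LogsBanner` shape, and the wording of the info message are hypothetical; only the `'Failed to retrieve pod logs.'` text and the "troubleshooting link on error banners only" rule come from this conversation.

```typescript
// Banner modes; 'info' is the type this PR adds.
type BannerMode = 'error' | 'warning' | 'info';

// Hypothetical shape for what the side panel needs to render a banner.
interface LogsBanner {
  mode: BannerMode;
  message: string;
  showTroubleshootingLink: boolean;
}

// Hypothetical helper: called after a log request has failed, with the HTTP
// status returned by a follow-up "get pod" call.
function logsBannerFor(getPodStatus: number): LogsBanner {
  // A 404 is taken to mean the pod has gone from the cluster, so the missing
  // logs are reported as information rather than as an error.
  const mode: BannerMode = getPodStatus === 404 ? 'info' : 'error';
  const message =
    mode === 'info'
      ? 'Logs are unavailable because the pod is no longer on the cluster.' // hypothetical wording
      : 'Failed to retrieve pod logs.';
  // Only error banners offer the troubleshooting link.
  return { mode, message, showTroubleshootingLink: mode === 'error' };
}
```

Keeping the decision in one pure function makes it easy to unit test without rendering the side panel.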

@jonasdebeukelaer
Contributor Author

/retest

Contributor

@Bobgy Bobgy left a comment

Awesome! I've left some minor comments and nitpickings.

logsBannerMessage = 'Failed to retrieve pod logs.';

// if pod can't be found, then assume it has gone from the server
if (await this.podNotFound(selectedNodeDetails.id, namespace)) {
Contributor

nit: can you confirm that the error message returned from Apis.getPodLogs gives no hint that the pod was not found?
If it doesn't, this seems a reasonable workaround.

Contributor Author

It does allude to the workflow not being found, or something along those lines, but the error code is 500, so I would have to check whether the message contains specific text. That feels a bit brittle, but I'm happy to do that instead.

Contributor

I'm okay with a text check in this PR. If you feel you have capacity, we can also detect this case and change the error code in

stream.on('error', err => res.status(500).send('Could not get main container logs: ' + err));

Contributor Author

ohh I thought this was interfacing straight with argo 🤦. This makes sense then, yeah, will do it here

Contributor Author

So I've updated this, but I don't have time to stand up the whole backend to test it. Is there some kind of shortcut for doing that?

Contributor

Do you have a running KFP cluster?
If yes, you can follow https://github.com/kubeflow/pipelines/blob/master/frontend/README.md#proxy-to-a-real-cluster. You will need to develop "Client UI + Node server"; it will send API requests to the cluster for anything else.

Contributor

@Bobgy Bobgy Jun 12, 2020

Unfortunately, we haven't added integration tests for the pod logs API method.

// TODO: Add integration tests for k8s helper related endpoints
// describe('/k8s/pod/logs', () => {});

You can also use an integration test as a way to test it; I think there are plenty of example test cases next to it. Feel free to ask me if you need any help.

sidepanelBannerMode = 'error';
break;
case NodePhase.FAILED:
sidepanelBannerMode = 'warning';
Contributor

Sorry I don't have much context here, can you help me understand the difference between ERROR and FAILED?

Contributor Author

I'll be honest, I'm not 100% sure of this either, but I would understand this to be

  • ERROR if there was an error trying to set up the container or something, or a network error, or an issue in argo, and
  • FAILED if the execution of the container failed

Contributor

Thanks for the explanation, I'm not sure I understand why you want to show warning for failed node.
For users, it seems to me they should also perceive it as an error.

Contributor Author

It's to try to separate pipeline failures from system failures, but I'm happy to change it to the error type too

Contributor

Understood. IMO, can we add some wording in the error message to indicate the difference between fail and error?
e.g. error -> Pipelines System Error: original message; while fail -> Failed execution: ...

What do you think?

This might be a little controversial, so we can also revert this part and keep the current behavior to get this PR merged, then open a new PR for this specific issue.

Contributor Author

Well I'll make em all Error type, and we can add a separate issue to add those prefixes?

Contributor

Sounds good! Thank you!
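
A minimal sketch of where this thread ends up, assuming both phases use the error banner and the prefixes proposed above are added later. The `NodePhase` stand-in (and its string values), the `SidePanelBanner` shape, and the `sidePanelBanner` helper are hypothetical. Only the two prefixes are taken from the suggestion in this thread.

```typescript
// Stand-in for the UI's NodePhase, limited to the two phases discussed here.
// The string values are assumptions made for this sketch.
const NodePhase = { ERROR: 'Error', FAILED: 'Failed' } as const;
type NodePhase = (typeof NodePhase)[keyof typeof NodePhase];

// Hypothetical banner shape: both phases are shown as errors.
interface SidePanelBanner {
  mode: 'error';
  message: string;
}

// Hypothetical helper: the banner mode is the same for both phases, and the
// prefix tells the user whether the system or their own step went wrong.
function sidePanelBanner(phase: NodePhase, originalMessage: string): SidePanelBanner {
  const prefix =
    phase === NodePhase.ERROR ? 'Pipelines System Error: ' : 'Failed execution: ';
  return { mode: 'error', message: prefix + originalMessage };
}
```

This keeps the severity users see consistent while still separating pipeline failures from system failures in the wording.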

@jonasdebeukelaer
Contributor Author

"Awesome! I've left some minor comments and nitpickings."

@Bobgy oops totally missed your update! Should have some time to fix this evening 👍

@jonasdebeukelaer jonasdebeukelaer force-pushed the fix/log-unavailable-warning branch from 5bbe7b5 to 47a372e Compare June 9, 2020 19:15
@jonasdebeukelaer
Contributor Author

/retest

@jonasdebeukelaer
Contributor Author

/retest

Contributor

@Bobgy Bobgy left a comment

@jonasdebeukelaer Great work!
Left some comments

@jonasdebeukelaer jonasdebeukelaer force-pushed the fix/log-unavailable-warning branch from 47a372e to c3270dd Compare June 12, 2020 17:08

}}
/>,
);
expect(tree.findWhere(el => el.text() === 'Refresh')).toEqual({});
Contributor

nit: I don't quite understand, why toEqual({})?
Let me take a look.

Contributor

Verified: toEqual({}) succeeds whether or not any nodes are found, because the wrapper helper object never has any direct properties.

Please verify "not found" this way instead:

expect(
  tree
    .findWhere(
      el =>
        el.text() === `Tensorboard is starting, and you may need to wait for a few minutes.`,
    )
    .exists(),
).toEqual(false);

Contributor Author

oh thanks for that, sorry I'm quite new to ts tests

Contributor

No worries, you are doing great
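
A small sketch of why the original assertion could not fail, assuming (as the comment above suggests) that the wrapper keeps its state in properties that are not directly visible on the object. `makeFakeWrapper` is a hypothetical stand-in written for this illustration, not Enzyme's implementation.

```typescript
// Hypothetical stand-in for a test wrapper: everything it holds is stored in
// non-enumerable properties, so the object looks empty from the outside.
interface FakeWrapper {
  exists(): boolean;
}

function makeFakeWrapper(nodes: string[]): FakeWrapper {
  const wrapper = {} as FakeWrapper;
  Object.defineProperty(wrapper, 'nodes', { value: nodes, enumerable: false });
  Object.defineProperty(wrapper, 'exists', {
    value: () => nodes.length > 0,
    enumerable: false,
  });
  return wrapper;
}

const found = makeFakeWrapper(['Refresh']);
const notFound = makeFakeWrapper([]);

// Own enumerable keys are roughly what a comparison against {} looks at,
// and there are none in either case.
const foundKeys = Object.keys(found);
const notFoundKeys = Object.keys(notFound);
console.log(foundKeys.length, notFoundKeys.length); // 0 0

// exists() is what actually distinguishes the two cases.
console.log(found.exists(), notFound.exists()); // true false
```

Under that assumption, an assertion on `.exists()` is the one that tracks whether a matching node is in the tree.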

@jonasdebeukelaer jonasdebeukelaer force-pushed the fix/log-unavailable-warning branch from c3270dd to f9627d8 Compare June 16, 2020 12:54
Contributor

@Bobgy Bobgy left a comment

Awesome work, thanks a lot for your continued efforts!

I'm okay with no additional test for pod logs for now, because the change is small and easily reviewable.
Would you mind updating the last minor issue?

/lgtm

err?.message &&
err.message?.indexOf('Unable to find pod log archive information') > -1
) {
res.status(404).send('pod not found');
Contributor

nit: also append err to make sure we can debug if there are false-positives of 404s?

Contributor

ohhh, I forgot I approved before.
Feel free to update it in a follow-up PR
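
A minimal sketch of the follow-up suggested in this thread, assuming the status decision is pulled out into a pure function. `logErrorResponse` and `LogErrorResponse` are hypothetical names. The matched text and the two response prefixes are the ones quoted in this review. The handler quoted earlier appends `err` itself, whereas this sketch appends `err.message`.

```typescript
// Hypothetical result type: what the handler would pass to res.status(...).send(...).
interface LogErrorResponse {
  status: number;
  body: string;
}

// Hypothetical helper: map a log-stream error to a response, keeping the
// original error text in the body so false-positive 404s can be debugged.
function logErrorResponse(err?: { message?: string }): LogErrorResponse {
  const message = err?.message ?? '';
  if (message.indexOf('Unable to find pod log archive information') > -1) {
    return { status: 404, body: 'pod not found: ' + message };
  }
  return { status: 500, body: 'Could not get main container logs: ' + message };
}
```

A function like this can be unit tested without a cluster, which may help while the pod logs endpoint has no integration tests.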

@k8s-ci-robot k8s-ci-robot merged commit 040615c into kubeflow:master Jun 17, 2020
RedbackThomson pushed a commit to RedbackThomson/pipelines that referenced this pull request Jun 17, 2020
* [UI] only allow troubleshoot link on error banner

* [UI] improve use of banners in run view sidepanel

* [UI] add info type banner
@Bobgy Bobgy added the cherrypick-approved label (area OWNER approves to cherry pick this PR to current active release branch) Jun 19, 2020
@Bobgy Bobgy added the cherrypicked label (cherry picked to release branch `release-x.y`) Jul 2, 2020
Bobgy pushed a commit that referenced this pull request Jul 2, 2020
Jeffwan pushed a commit to Jeffwan/pipelines that referenced this pull request Dec 9, 2020
Labels
approved, cherrypick-approved, cherrypicked, lgtm, ok-to-test, size/L
Development

Successfully merging this pull request may close these issues.

Warning message is confusing when pod logs cannot be retrieved
6 participants