Tekton Latest Pipeline Run does not always show the data. #735

Closed
christophe-f opened this issue Sep 14, 2023 · 6 comments
@christophe-f
Contributor

Describe the bug
Bug on behalf of Natale.

The data is not always visible in the Latest Pipeline Run (LPR) plugin in the CI tab. Sometimes it displays 'No Pipeline Run to visualize' instead.

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'Catalog' Page
  2. Click on a component that has Tekton enabled.
  3. Click on the 'CI' tab
  4. Most of the time the data is displayed, but sometimes it is not, as seen in the screenshots below.

Expected behavior
The data should be visible every time.

Screenshots
If applicable, add screenshots to help explain your problem.

lpr1

lpr2

Additional context
Add any other context about the problem here.

@christoph-jerolimov
Member

christoph-jerolimov commented Nov 16, 2023

I tried to reproduce this issue multiple times with the Pipeline Run list page.

Let us close this issue because the "Latest Pipeline Run" component isn't shown anymore above the Pipeline Run list, so that redesign might fix this.

@debsmita1
Member

Reopening this issue as I am able to reproduce it

Screen.Recording.2023-11-23.at.12.08.13.AM.mov

@debsmita1 debsmita1 reopened this Nov 22, 2023
@invincibleJai
Member

invincibleJai commented Nov 30, 2023

The error is caused by the failure shown below in the response to the API call /api/kubernetes/services/{service_name}:

{
    "items": [
        {
            "cluster": {
                "name": "ocp"
            },
            "podMetrics": [],
            "resources": [],
            "errors": [
                {
                    "errorType": "FETCH_ERROR",
                    "message": "request to https://XXXXX-XXXXXXXX.devcluster.openshift.com:6443/api/v1/pods?labelSelector=backstage.io%2Fkubernetes-id%3Dbackstage failed, reason: getaddrinfo ENOTFOUND api.akundu-301120231020.devcluster.openshift.com"
                }
            ]
        }
    ]
}
image
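To make the connection between this response and the 'No Pipeline Run to visualize' message concrete, here is a minimal sketch of how a UI could tell "no data" apart from "fetch failed". The type names and the `toViewState` helper are hypothetical and modelled only on the fields visible in the response above; this is not the plugin's actual code.

```typescript
// Hypothetical types, modelled only on the fields visible in the response above.
interface ClusterError {
  errorType: string;
  message: string;
}

interface ClusterItem {
  cluster: { name: string };
  podMetrics: unknown[];
  resources: unknown[];
  errors: ClusterError[];
}

interface ServiceResponse {
  items: ClusterItem[];
}

interface ReportedError {
  cluster: string;
  errorType: string;
  message: string;
}

type ViewState =
  | { kind: "data"; resourceCount: number }
  | { kind: "empty" }
  | { kind: "error"; errors: ReportedError[] };

// Decide what the UI should render for one response.
// An empty "resources" array only means "nothing to show" when no cluster reported an error.
function toViewState(response: ServiceResponse): ViewState {
  const errors: ReportedError[] = [];
  let resourceCount = 0;
  for (const item of response.items) {
    resourceCount += item.resources.length;
    for (const e of item.errors) {
      errors.push({ cluster: item.cluster.name, errorType: e.errorType, message: e.message });
    }
  }
  if (resourceCount > 0) return { kind: "data", resourceCount };
  if (errors.length > 0) return { kind: "error", errors };
  return { kind: "empty" };
}
```

With the response above, `toViewState` would yield the `error` state for cluster `ocp` rather than the `empty` state, so the user would see the fetch problem instead of an empty pipeline list.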

@gashcrumb
Member

I haven't reproduced this locally yet, but if it's due to flaky DNS, we may not have much choice other than doing some number of retries before giving up and reporting the failure.

Is the FETCH_ERROR type used for any kind of problem fetching data from the API server? If so, it may be difficult to tell a temporary failure that is suitable for retries from one that should be reported to the user immediately.
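One possible way to approach the retry question is a bounded retry that only applies when the error message looks like a temporary network problem. The sketch below is a heuristic under stated assumptions: `isLikelyTransient`, `shouldRetry`, and the code list are hypothetical, and they assume the Node.js error code appears verbatim in the `message` text, as `ENOTFOUND` does in the response quoted earlier in this thread.

```typescript
// Hypothetical shape, taken from the "errors" entries in the response quoted earlier.
interface ClusterError {
  errorType: string;
  message: string;
}

// Node.js system error codes that often indicate a temporary DNS or network problem.
// Assumption: the code appears verbatim in the message text.
const TRANSIENT_CODES = ["ENOTFOUND", "EAI_AGAIN", "ETIMEDOUT", "ECONNRESET"];

// Heuristic only: ENOTFOUND can also mean a permanently wrong hostname,
// which is why the retry count below is bounded.
function isLikelyTransient(error: ClusterError): boolean {
  if (error.errorType !== "FETCH_ERROR") return false;
  for (const code of TRANSIENT_CODES) {
    if (error.message.indexOf(code) !== -1) return true;
  }
  return false;
}

// attemptsMade counts the attempts already performed, including the one that just failed.
function shouldRetry(error: ClusterError, attemptsMade: number, maxAttempts: number): boolean {
  return attemptsMade < maxAttempts && isLikelyTransient(error);
}
```

A fetch loop would call `shouldRetry` after each failed attempt and report the failure to the user as soon as it returns `false`. Matching on message text is fragile, so a structured error code in the response would be a sturdier basis if one were available.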

@gashcrumb
Member

gashcrumb commented Dec 4, 2023

So far, the closest I've gotten to reproducing this is a 504 gateway timeout, reported while I was booting up the cluster and happened to have a browser pointed at the Tekton page. However, the error message cleared itself while I was considering grabbing a screenshot, because the component continuously polls for updates. I did this testing with the 1.0 pre-release image, but if there's a more relevant setup to try, such as running a local instance separate from the cluster, I can try that too.

If our components poll regularly, then I think there's not much more we can do to cover transient issues between the Backstage backend and the clusters it's configured to proxy requests to, other than report problems right away as we are now doing. We could think about another layer of error reporting where some cases don't need to be reported to the user immediately and the UI just handles them gracefully.

In any case, I think the error reporting as implemented is much better than what's shown in the original screenshots, where there's no mention of an actual error, just no pipelines which is confusing. A warning notification that there's been a problem querying the cluster is much more helpful.
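The extra layer of error reporting mentioned above could, for example, hold back a warning until several consecutive polls have failed, so that a single transient failure clears itself on the next poll without ever being shown. This is a minimal sketch of that idea only; `PollErrorTracker` and its threshold are hypothetical and not part of the plugin.

```typescript
// Hypothetical helper: tracks consecutive poll failures and decides when to warn the user.
class PollErrorTracker {
  private consecutiveFailures = 0;
  private readonly threshold: number;

  // threshold: how many consecutive failed polls are tolerated before warning.
  constructor(threshold: number) {
    this.threshold = threshold;
  }

  // Record the outcome of one poll. Returns true when the UI should show a warning.
  // Any successful poll resets the counter, so isolated failures are never reported.
  record(pollSucceeded: boolean): boolean {
    this.consecutiveFailures = pollSucceeded ? 0 : this.consecutiveFailures + 1;
    return this.consecutiveFailures >= this.threshold;
  }
}
```

A threshold of 1 reproduces the "report problems right away" behaviour described above, while a higher value trades faster feedback for fewer spurious warnings.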

@christoph-jerolimov
Member

christoph-jerolimov commented Jan 30, 2024

As discussed with the UI team, we can and will improve the plugin in the future. However, we cannot fix DNS or network issues in the plugin. We now show errors, and #1140 improved the rendering a bit.

I'm closing this now; we think a new issue should be opened if this or a similar error occurs again.
