Update README to reflect Kubeflow migration to prow build cluster (ku…
clarketm authored Mar 31, 2020
1 parent 63b8191 commit e9afed1
Showing 1 changed file with 13 additions and 13 deletions.
README.md
@@ -13,7 +13,7 @@
- [Debugging Failed Tests](#debugging-failed-tests)
- [Logs and Cluster Access for Kubeflow CI](#logs-and-cluster-access-for-kubeflow-ci)
- [Access Control](#access-control)
- - [No results show up in Gubernator](#no-results-show-up-in-gubernator)
+ - [No results show up in Spyglass](#no-results-show-up-in-spyglass)
- [No Logs in Argo UI For Step or Pod Id missing in Argo Logs](#no-logs-in-argo-ui-for-step-or-pod-id-missing-in-argo-logs)
- [Debugging Failed Deployments](#debugging-failed-deployments)
- [Testing Changes to the ProwJobs](#testing-changes-to-the-prowjobs)
@@ -69,7 +69,7 @@ Here's how it works
* The Argo workflow will use an NFS volume to attach a shared POSIX compliant filesystem to each step in the
workflow.
* Each step in the pipeline can write outputs and junit.xml files to a test directory in the volume
- * A final step in the Argo pipeline will upload the outputs to GCS so they are available in gubernator
+ * A final step in the Argo pipeline will upload the outputs to GCS so they are available in spyglass
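
To make the hand-off concrete, here is a minimal sketch of what such a final upload step boils down to. The NFS mount point, bucket name, and paths are illustrative placeholders, not the values the test infra actually uses:

```bash
# Sketch only: sync everything the workflow steps wrote to the shared
# NFS-backed test directory (outputs, junit.xml files) up to GCS so that
# spyglass can render it. Bucket and paths are illustrative placeholders.
TEST_DIR=/mnt/test-data-volume/my-workflow-run
GCS_ARTIFACTS="gs://<artifacts-bucket>/pr-logs/pull/kubeflow_kubeflow/<PR>/<job>/<build>/artifacts"

gsutil -m rsync -r "${TEST_DIR}/output/artifacts" "${GCS_ARTIFACTS}"
```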

Quick Links

@@ -116,16 +116,16 @@ Logs from the E2E tests are available in a number of places and can be used to t

These should be publicly accessible.

- The logs from each step are copied to GCS and made available through gubernator. The K8s-ci robot should post
- a link to the gubernator UI in the PR. You can also find them as follows
+ The logs from each step are copied to GCS and made available through spyglass. The K8s-ci robot should post
+ a link to the spyglass UI in the PR. You can also find them as follows

1. Open up the prow jobs dashboard e.g. [for kubeflow/kubeflow](https://prow.k8s.io/?repo=kubeflow%2Fkubeflow)
1. Find your job
1. Click on the link under job; this goes to the Gubernator dashboard
1. Click on artifacts
1. Navigate to artifacts/logs
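
The same artifacts can also be listed straight from GCS with gsutil if you prefer the command line; the bucket and job path below are placeholders to be read off the artifacts link for your run:

```bash
# Placeholder path: substitute the bucket, PR, job name and build number
# from your prow job's artifacts link.
gsutil ls gs://<artifacts-bucket>/pr-logs/pull/kubeflow_kubeflow/<PR>/<job>/<build>/artifacts/logs/
```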

- If these logs aren't available it could indicate a problem running the step that uploads the artifacts to GCS for gubernator. In this
+ If these logs aren't available it could indicate a problem running the step that uploads the artifacts to GCS for spyglass. In this
case you can use one of the alternative methods listed below.

### Argo UI
@@ -188,11 +188,11 @@ Our tests are split across three projects
* **k8s-prow-builds**

* This is owned by the prow team
- * This is where the prow jobs run
- * We are working on changing this see [kubeflow/testing#475](https://github.com/kubeflow/testing/issues/475)
+ * This is where the prow jobs are defined

* **kubeflow-ci**

- * This is where the prow jobs run in the `test-pods` namespace
+ * This is where the Argo E2E workflows kicked off by the prow jobs run
* This is where other Kubeflow test infra (e.g. various cron jobs) runs
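
For those with access (see Access Control below), a hedged sketch of pointing kubectl at the build cluster; the project and cluster names match the build cluster referenced later in this doc, while the zone is an assumption:

```bash
# Zone is an assumption; adjust to wherever the kubeflow-testing cluster lives.
gcloud container clusters get-credentials kubeflow-testing \
  --project=kubeflow-ci --zone=us-east1-d
# Prow job pods and the Argo workflow pods they spawn run in test-pods.
kubectl -n test-pods get pods
```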

@@ -240,22 +240,22 @@ We currently have the following levels of access

* Example maintainers are granted elevated access to these clusters in order to facilitate development of these tests

- ### No results show up in Gubernator
+ ### No results show up in Spyglass

- If no results show up in Gubernator this means the prow job didn't get far enough to upload any results/logs to GCS.
+ If no results show up in Spyglass this means the prow job didn't get far enough to upload any results/logs to GCS.

To debug this you need the pod logs. You can access the pod logs via the build log link for your job in the [prow jobs UI](https://prow.k8s.io/)

* Pod logs are ephemeral, so you need to check shortly after your job runs.

The pod logs are available in StackDriver but only the Google Kubeflow Team has access
- * Prow runs on a cluster owned by the K8s team not Kubeflow
- * This policy is determined by K8s not Kubeflow
- * This could potentially be fixed by using our own prow build cluster [issue#32](https://github.com/kubeflow/testing/issues/32)
+ * Prow controllers run on a cluster (`k8s-prow/prow`) owned by the K8s team
+ * Prow jobs (i.e. pods) run on a build cluster (`kubeflow-ci/kubeflow-testing`) owned by the Kubeflow team
+ * This policy for controller logs is owned by K8s, while the policy for job logs is governed by Kubeflow

To access the stackdriver logs

- * Open stackdriver for project [k8s-prow-builds](https://console.cloud.google.com/logs/viewer?organizationId=433637338589&project=k8s-prow-builds&folder&minLogLevel=0&expandAll=false&timestamp=2018-05-22T17:09:26.625000000Z&customFacets&limitCustomFacetWidth=true&dateRangeStart=2018-05-22T11:09:27.032Z&dateRangeEnd=2018-05-22T17:09:27.032Z&interval=PT6H&resource=gce_firewall_rule&scrollTimestamp=2018-05-22T15:40:23.000000000Z&advancedFilter=resource.type%3D"container"%0Aresource.labels.pod_id%3D"15f5a424-5dd6-11e8-826c-0a580a6c0117"%0A)
+ * Open stackdriver for project [kubeflow-ci](https://console.cloud.google.com/logs/viewer?organizationId=433637338589&project=kubeflow-ci&minLogLevel=0&expandAll=false&customFacets=&limitCustomFacetWidth=true&interval=P7D&resource=k8s_container%2Fcluster_name%2Fkubeflow-testing%2Fnamespace_name%2Ftest-pods&advancedFilter=resource.type%3D%22k8s_container%22%0Aresource.labels.cluster_name%3D%22kubeflow-testing%22%0Aresource.labels.namespace_name%3D%22test-pods%22%0Aresource.labels.pod_name%3D%22bc2f6d5d-7035-11ea-bd6a-f29ce8b0e481%22%0A)
* Get the pod ID by clicking on the build log in the [prow jobs UI](https://prow.k8s.io/)
* Filter the logs using
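
The filter itself is truncated in this view, but judging from the advancedFilter embedded in the kubeflow-ci link above it is roughly the query below; one way to run it from the command line is with `gcloud logging read` (the pod name is a placeholder you take from the build log):

```bash
# Approximate filter from the Stackdriver link above; newline-separated
# clauses are ANDed. Replace <POD_ID> with the pod id from the build log.
gcloud logging read '
  resource.type="k8s_container"
  resource.labels.cluster_name="kubeflow-testing"
  resource.labels.namespace_name="test-pods"
  resource.labels.pod_name="<POD_ID>"' \
  --project=kubeflow-ci --limit=100
```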

