[frontend] logs and artifacts stored in different MinIO locations and can't be accessed by frontend #6428

Closed
jonasdebeukelaer opened this issue Aug 25, 2021 · 15 comments


@jonasdebeukelaer
Contributor

jonasdebeukelaer commented Aug 25, 2021

Environment

  • How did you deploy Kubeflow Pipelines (KFP)? from repo
  • KFP version: 1.7.0
  • KFP SDK version:
kfp                                   1.7.0
kfp-pipeline-spec                     0.1.9
kfp-server-api                        1.7.0

Steps to reproduce

  • run a pipeline and store an artifact
  • delete the pod

Then we see that:

  • The artifact (and its preview) can be seen in the output list of the Input/Output tab
  • The metrics can be accessed via the Visualisations tab
  • A broken link to the metrics appears in the output list of the Input/Output tab
    • the details read Failed to get object in bucket mlpipeline at path v2/artifacts/pipeline/example pipeline/e154ea68-d59c-46ba-aea4-ead5b9eb0ea8/square/metrics: S3Error: The specified key does not exist.
  • The logs cannot be retrieved from MinIO in the Logs tab
    • the details read Error response: Could not get main container logs: Error: Unable to find pod log archive information from workflow status.

Looking in MinIO, it seems that:

  • the Artifact is stored at
    mlpipeline/v2/artifacts/pipeline/example%20pipeline/e154ea68-d59c-46ba-aea4-ead5b9eb0ea8/square/

  • and the logs and metrics are stored at
    mlpipeline/artifacts/example-pipeline-ljzls/2021/08/25/example-pipeline-ljzls-1034021739/

minimal code sample:

import kfp
from kfp.v2.dsl import pipeline, component, Output, Metrics, Artifact


@component(base_image='python:3')
def square(x: int, metrics: Output[Metrics], data: Output[Artifact]):
    # Log a couple of metrics and write a small file artifact so the run
    # produces both kinds of outputs.
    metrics.log_metric('test_metrics', 1)
    metrics.log_metric(f'{x} squared', x**2)
    print(f'{x} squared is {x**2}')

    with open(data.path, 'w') as f:
        f.write("some content")


# A distinct function name avoids shadowing the imported `pipeline` decorator.
@pipeline(name='example pipeline')
def example_pipeline():
    square(4)


client = kfp.Client(host='http://localhost:8081')
client.create_run_from_pipeline_func(
    example_pipeline, arguments={}, run_name='Example run',
    experiment_name='Default', enable_caching=False,
    mode=kfp.dsl.PipelineExecutionMode.V2_COMPATIBLE
)

Expected result

The logs and metrics can be retrieved from MinIO by the frontend, even when the pods have been deleted.

Materials and Reference

Is this a configuration issue? Should I reconfigure things to point only to the v2 path?


Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

@Bobgy
Contributor

Bobgy commented Aug 27, 2021

might be related to #3818
/assign @zijianjoy

@jonasdebeukelaer
Contributor Author

jonasdebeukelaer commented Aug 31, 2021

Not sure if this is related to that issue @Bobgy, but looking in workflow-controller-configmap-patch.yaml,
the keyFormat is completely different from where the UI looks for logs/metrics:

keyFormat: "artifacts/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}"

I tried changing this to

keyFormat: "v2/artifacts/pipeline/{{workflow.annotations.pipelines.kubeflow.org/run_name}}/{{workflow.labels.pipeline/runid}}"

but the object path is still wrong, as the UI looks for e.g. {{component display name}}/metrics whereas this saves it as {{component display name}}-metrics.tgz
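
To make the mismatch concrete, here is a small sketch (plain Python string formatting, using the workflow and run from this issue as hypothetical values) that renders the default keyFormat next to the path the v2-compatible frontend requests:

# Values taken from the MinIO paths observed above; hypothetical for any other run.
workflow_name = 'example-pipeline-ljzls'
pod_name = 'example-pipeline-ljzls-1034021739'
year, month, day = '2021', '08', '25'

# What Argo writes with the default keyFormat
argo_key = f'artifacts/{workflow_name}/{year}/{month}/{day}/{pod_name}'
print(argo_key)  # artifacts/example-pipeline-ljzls/2021/08/25/example-pipeline-ljzls-1034021739

# What the frontend asks for (taken from the error message in the issue description)
ui_key = ('v2/artifacts/pipeline/example pipeline/'
          'e154ea68-d59c-46ba-aea4-ead5b9eb0ea8/square/metrics')
print(ui_key)

Until both the prefix and the per-artifact naming agree, the UI will keep returning "The specified key does not exist".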

@andrijaperovic

andrijaperovic commented Sep 15, 2021

@jonasdebeukelaer any workaround for this particular issue?
This is what I am seeing in the ml-pipeline-ui pod logs after pod deletion:

GET /k8s/pod/logs?podname=podname&runid=runid&podnamespace=kubeflow
Getting logs for pod:podname from mlpipeline/artifacts/workflow/podname/main.log.

which produces:
Could not get main container logs: S3Error: The specified key does not exist.

I tried changing the key format in workflow-controller-configmap-patch.yaml to match the key that ml-pipeline-ui is expecting, by removing the timestamp fields, but it did not seem to fix the issue:

From:
keyFormat: "artifacts/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}"

To:
keyFormat: "artifacts/{{workflow.name}}/{{pod.name}}"

@jonasdebeukelaer
Contributor Author

@andrijaperovic For now we're unfortunately having to access the artefacts through the MinIO interface by port-forwarding:

kubectl port-forward svc/minio-service 8082:9000 -n kubeflow
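
With that port-forward running, a minimal sketch using the minio Python client can list and download whatever Argo archived. The credentials below are the KFP manifest defaults (minio / minio123) and the prefix is the workflow from this issue, so treat both as assumptions and adjust for your deployment:

from minio import Minio

# Connect through the port-forward above; default KFP MinIO credentials assumed.
client = Minio('localhost:8082', access_key='minio', secret_key='minio123', secure=False)

# List everything archived for this workflow, then download each object locally.
prefix = 'artifacts/example-pipeline-ljzls/'  # workflow name from this issue
for obj in client.list_objects('mlpipeline', prefix=prefix, recursive=True):
    print(obj.object_name)
    client.fget_object('mlpipeline', obj.object_name, obj.object_name.replace('/', '_'))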

@jonasdebeukelaer
Contributor Author

This is quite an annoying bug for us at the moment.

I'd be happy to try to fix this issue if it still needs picking up. If someone can point me to where to look, that'd be ace.

@stale

stale bot commented Mar 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Mar 2, 2022
@zijianjoy zijianjoy removed the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Mar 11, 2022
@capoolebugchat

It has been one year; if anyone has found a way to resolve or work around this bug, that'd be great. I'm sure this is one of the major bugs.

@ghaering

It has been one year; if anyone has found a way to resolve or work around this bug, that'd be great. I'm sure this is one of the major bugs.

Yes, it is. It is very annoying.

@pdettori

Having the same issue with Kubeflow Pipelines 0.2.5: if the pods created by a workflow are deleted, the logs cannot be accessed from the MinIO artifact store, even though they are archived there. Has anyone actually found a way to make this work?

@pdettori

I think I found the root cause of the problem. It is likely that the schema of the Argo Workflow status has changed, as the fields retrieved by

export async function getPodLogsMinioRequestConfigfromWorkflow(
are no longer where the TS code expects them. The code expects a workflow status with the S3 info under the node's output artifacts, but Argo Workflows may have changed since that code was written, and those properties are now under:

status:
  artifactRepositoryRef:
    artifactRepository:
      archiveLogs: true
      s3:

@AndersBennedsgaard

According to #8935 (comment), #10568 should have fixed it. Do you still have issues?

@pdettori

I still have the issue with KFP 2.2.0, which includes #10568; it does not look like the UI was updated to retrieve the S3 log artifacts stored by Argo. I have a fix in my fork that shows what is needed for the UI to work with the archived logs: master...pdettori:pipelines:s3-pod-logs-fix

@thesuperzapper
Member

Since this issue is about V2_COMPATIBLE mode, I am going to close it, as that mode is no longer supported.

However, as @pdettori (and the other recent comments) have raised, there is a similar issue in V2 mode, because we no longer capture the main.log as an output artifact.

Let's continue the discussion on the other issue:

/close


@thesuperzapper: Closing this issue.


@thesuperzapper
Member

@pdettori I see in #6428 (comment) that you propose a solution to the V2 issue.

Can you please continue this discussion on the V2 issue: #10036
