VolumeOp was not able to create PVC #5257
Comments
/assign @elikatsis |
Kubeflow cache's steps, so given the same inputs, it skips the step and instead fetches outputs from minio. In this case you would need to change the name of the PVC. You can also refer to #5055 (comment). |
Does the following work?

```python
vop = kfp.dsl.VolumeOp(
    name="volume_creation",
    resource_name=f"{{{{workflow.name}}}}-sharedpvc",
    size="5Gi",
    modes=["RWO"]
)
```

This will hopefully change the input to the ResourceOp and hence prevent caching. Update: This workaround doesn't seem to work with Kubeflow 1.3/the latest kfp version. I think this workaround worked with Kubeflow 1.2. |
Yes, the problem is fixed after I disabled the cache. Thanks for the help. |
@Bobgy sorry I had totally missed this. I think the problem here is that
|
I'll reopen this issue as it needs some fix apart from globally disabling the cache /reopen |
@elikatsis: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Can we rename this issue? The typo makes it hard to find. |
Hi, for some reason, this workaround does not work for me. KF version: 1.6.0, Kubeflow SDK 1.6.3. I used the same pipeline from the issue description. My PVC was created only once. Thank you |
Facing the same issue. Tried to change |
I think KFP v1 caching should not cache volume op / resource op, because the side effect is intended. Welcome contributions to fix this. Besides that, caching for KFP v2 compatible mode should no longer cache volume ops. You can also consider trying it out too when it's released and documented for your env. (currently documented for KFP standalone, but not full Kubeflow.) |
Updated workaround for getting around VolumeOp caching. This seems to work with kfp version 1.7.2. REF: #4857 (comment)

```python
test_vop = kfp.dsl.VolumeOp(
    name="volume",
    resource_name="pvc-name",
    modes=['RWO'],
    storage_class="standard",
    size="10Gi"
).add_pod_annotation(name="pipelines.kubeflow.org/max_cache_staleness", value="P0D")
```
|
Not sure this is the correct place to bring this up, and I am not familiar with the v2 component configuration yet, but it looks like the mutation webhook in the cache-server backend is looking for the key `pipelines.kubeflow.org/enable_caching` while the Python SDK creates the key `pipelines.kubeflow.org/cache_enabled` when using `set_caching_enabled`. See the following files:
Although it was mentioned to use `.add_pod_annotation(name="pipelines.kubeflow.org/max_cache_staleness", value="P0D")`, which looks like it should work from the code in mutation.go. If that's the case, will the base_op function `set_caching_enabled` be deprecated in the future? The change on the cache-server to make `set_caching_enabled` work would be easy. I made the changes and tested them on my forked version of this repo. I can make a PR if `set_caching_enabled` is not going to be deprecated. |
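The key mismatch described above can be illustrated with a minimal sketch (plain Python, not the real mutation.go or SDK code; the function and key names here just mirror what the comment reports): if the webhook only honors its own annotation key, an annotation written under the SDK's key is silently ignored and caching stays on.

```python
# Keys as reported in the comment above; illustrative only.
WEBHOOK_KEY = "pipelines.kubeflow.org/enable_caching"   # what mutation.go reads
SDK_KEY = "pipelines.kubeflow.org/cache_enabled"        # what the SDK writes

def webhook_would_disable_cache(annotations: dict) -> bool:
    """Hypothetical model of a webhook that only honors its own key."""
    return annotations.get(WEBHOOK_KEY) == "false"

# Pod annotated the way set_caching_enabled(False) reportedly does it:
sdk_pod = {SDK_KEY: "false"}
# The webhook never sees its expected key, so caching is NOT disabled.
assert webhook_would_disable_cache(sdk_pod) is False

# If the annotation used the key the webhook actually checks, it would work.
matching_pod = {WEBHOOK_KEY: "false"}
assert webhook_would_disable_cache(matching_pod) is True
```

This is why the `max_cache_staleness` annotation succeeds where `set_caching_enabled` fails: both sides agree on that key.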
Thank you very much, i was experiencing the same issue |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
What is the solution for this issue in version 1.8.0 of Kubeflow while using version 1.8.22 of the kfp SDK? What I want is to be able to run multiple instances of a pipeline (these are model training jobs) and to make sure persistent volumes are not shared between these jobs. I have tried the following so far =>
I have also tried to disable cache globally by following |
@shashisingh the solution is in this thread and confirmed by others /close |
@juliusvonkohout: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What steps did you take:
A simple pipeline with one vol_op and one simple step that mounts the created PVC.
What happened:
The VolumeOp was not able to create the PVC, so the dependent task complains about not finding the PVC.
```shell
kubectl get pvc -n kubeflow | grep sharedpvc
```

didn't return any results.
What did you expect to happen:
The VolumeOp should create a PVC named `sharedpvc`.
Environment:
How did you deploy Kubeflow Pipelines (KFP)?
Deploying Kubeflow Pipelines on a local kind cluster
KFP version: 1.2.0
KFP SDK version: 1.4.0
Anything else you would like to add:
The log of the VolumeOp indicates
I was trying to prevent it from using the cache but didn't succeed.
The manifest from the VolumeOp
The log of the dependent task indicates
Storage class used
/kind bug