Support advanced PV like NFS via VolumeOp #3313

Closed
rmgogogo opened this issue Mar 19, 2020 · 8 comments
Labels
area/sdk kind/bug kind/feature lifecycle/stale The issue / pull request is stale, any activities remove this label.

Comments

@rmgogogo
Contributor

rmgogogo commented Mar 19, 2020

What happened:

VolumeOp only supports the default PV. If a user creates another PV, e.g. based on NFS, it can't be used by KFP via a PVC in VolumeOp.

What did you expect to happen:

Environment:

How did you deploy Kubeflow Pipelines (KFP)?

KFP version:

KFP SDK version:

Anything else you would like to add:

[Miscellaneous information that will assist in solving the issue.]

/kind bug

@elikatsis
Member

My 5 cents:

It's not dynamic if you specify a PV name. The pipeline won't even be able to run twice (unless the user uses a parameter for it), since a second run would try to bind a second PVC to the same PV.
When speaking of storage on Kubernetes, the word dynamic most probably refers to dynamic provisioning. That is, cutting a PV based on the PVC's requests when the PVC gets created.

NFS provisioning enables exactly this dynamic behavior, and that's what VolumeOp was initially implemented for and already serves. But NFS on a Kubernetes cluster should refer to a storage class, not a specific PV (or set of PVs?).
So VolumeOp already supports specifying a storage class (e.g., an NFS-backed one).
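
For illustration, a minimal sketch of that usage (assuming an NFS-backed StorageClass named "nfs-client" is already installed in the cluster; all names here are made up):

import kfp.dsl as dsl

@dsl.pipeline(name="nfs-storageclass-example")
def nfs_storageclass_pipeline():
    # Ask the provisioner behind the "nfs-client" StorageClass to dynamically
    # provision a PV and bind a new PVC to it.
    vop = dsl.VolumeOp(
        name="create-nfs-pvc",
        resource_name="nfs-pvc",
        size="10Gi",
        storage_class="nfs-client",  # assumed NFS-backed StorageClass
        modes=dsl.VOLUME_MODE_RWM,
    )

    # Mount the resulting volume in a step.
    dsl.ContainerOp(
        name="writer",
        image="library/bash:4.4.23",
        command=["sh", "-c"],
        arguments=["echo foo > /mnt/file1"],
        pvolumes={"/mnt": vop.volume},
    )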

The PR you opened enables a different functionality: the user knows about the existence of a PV which was hydrated with data.
But how was it hydrated?
In most cases, some other PVC was used to fill the PV with data, so why not use that PVC? No two PVCs can bind to one PV, and when a PVC is deleted its bound PV gets deleted, unless some advanced storage class configuration is set.
In other, I think rather rare, cases the admin could somehow cut PVs on filesystem paths (do you know any way this can happen? I can't think of one at the moment). But that makes the system static, because it's bound to the admin's (human) actions.

Could you elaborate some more on your thoughts, so we are on the same page? I'm all for getting #3314 merged; it's functionality that isn't bad to have, but I think we're missing something.

@Ark-kun
Contributor

Ark-kun commented Mar 20, 2020

> VolumeOp only supports the default PV. If a user creates another PV, e.g. based on NFS, it can't be used by KFP via a dynamic PVC in VolumeOp.

BTW, do you know that KFP supports mounting any Kubernetes volume (including NFS) using container_op.add_volume and container_op.add_volume_mount?
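
For example, a rough sketch of that approach (the server IP and export path are placeholders matching the NFS sample later in this thread):

from kubernetes import client as k8s_client
import kfp.dsl as dsl

@dsl.pipeline(name="nfs-add-volume-example")
def nfs_mount_pipeline():
    op = dsl.ContainerOp(
        name="reader",
        image="library/bash:4.4.23",
        command=["sh", "-c"],
        arguments=["ls /mnt/nfs"],
    )
    # Attach an NFS volume to the pod spec ...
    op.add_volume(k8s_client.V1Volume(
        name="nfs-volume",
        nfs=k8s_client.V1NFSVolumeSource(server="10.0.0.2", path="/vol1"),
    ))
    # ... and mount it into the step's container.
    op.add_volume_mount(k8s_client.V1VolumeMount(
        name="nfs-volume",
        mount_path="/mnt/nfs",
    ))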

@rmgogogo
Contributor Author

rmgogogo commented Mar 23, 2020

I'm not sure whether we are on the same page. Here is some more background.

  1. The goal is to support NFS.
  2. Per the GCP guide, NFS can be consumed via a PV. (I don't see a storage class described there; let me know if you find some doc on it.)
  3. So it then becomes a question of how to create a PVC based on the PV backed by NFS.
  4. We have three options:
    4.1. Use PipelineVolume. It works by using an existing PVC (prebuilt manually). It's already verified and it works. (I pasted the sample at the bottom.)
    4.2. Use container_op.add_volume and container_op.add_volume_mount. It can mount an existing PV. A sample is in samples/contrib/nvidia-resnet/pipeline/src/pipeline.py.
    4.3. Use VolumeOp to create a PVC based on an existing PV. This issue and the PR are for 4.3 (a sketch follows after the 4.1 sample below).

Sample for 4.1:

  1. NFS:

gcloud filestore instances create nfs-server \
    --project=renming-mlpipeline \
    --zone=us-west1-b \
    --tier=STANDARD \
    --file-share=name="vol1",capacity=1TB \
    --network=name="default",reserved-ip-range="10.0.0.0/29"

  2. PV on NFS (kubectl create -f pv.yaml):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: fileserver
spec:
  capacity:
    storage: 1T
  accessModes:
  - ReadWriteMany
  nfs:
    path: /vol1
    server: 10.0.0.2

  3. PVC on the PV:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fileserver-claim
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: ""
  volumeName: "fileserver"
  resources:
    requests:
      storage: 10G

  4. Use the existing PVC:

import kfp
import kfp.dsl as dsl

@dsl.pipeline(
    name="VolumeOp Basic",
    description="A Basic Example on VolumeOp Usage."
)
def volumeop_basic():
    cop = dsl.ContainerOp(
        name="cop",
        image="library/bash:4.4.23",
        command=["sh", "-c"],
        arguments=["echo foo > /mnt/file1"],
        pvolumes={"/mnt": dsl.PipelineVolume(pvc="fileserver-claim")}
    )
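
As for 4.3, a minimal sketch of what it could look like with VolumeOp, assuming the volume_name argument proposed in PR #3314 (names here are illustrative and not verified against a released SDK):

import kfp.dsl as dsl

@dsl.pipeline(
    name="VolumeOp on existing PV",
    description="Create a PVC bound to the pre-created NFS-backed PV."
)
def volumeop_existing_pv():
    # volume_name would pin the new PVC to the PV named "fileserver" (see pv.yaml above),
    # mirroring volumeName in the manual PVC. The manual PVC also sets storageClassName: "";
    # depending on the SDK version that may need handling as well.
    vop = dsl.VolumeOp(
        name="create-pvc-on-fileserver",
        resource_name="fileserver-claim",
        size="10G",
        modes=dsl.VOLUME_MODE_RWM,
        volume_name="fileserver",  # assumed argument from #3314
    )

    dsl.ContainerOp(
        name="cop",
        image="library/bash:4.4.23",
        command=["sh", "-c"],
        arguments=["echo foo > /mnt/file1"],
        pvolumes={"/mnt": vop.volume},
    )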

@rmgogogo
Contributor Author

@elikatsis I removed the term "dynamic". This issue is not about dynamically provisioning NFS.

To dynamically provision NFS, one needs to install a provisioner (https://cloud.google.com/community/tutorials/gke-filestore-dynamic-provisioning). This issue is not about dynamically provisioning a PV.

@stale

stale bot commented Jun 24, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Jun 24, 2020
@rmgogogo
Contributor Author

Please check whether the reply above is OK. I think we can close this ticket.

@stale stale bot removed the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Jun 24, 2020
@stale

stale bot commented Sep 22, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Sep 22, 2020
@stale

stale bot commented Sep 30, 2020

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

@stale stale bot closed this as completed Sep 30, 2020