
Cannot create volumes or pods under a specific namespace when creating pipeline with create_run_from_pipeline_package #4746

Closed
supertetelman opened this issue Nov 10, 2020 · 11 comments


@supertetelman

supertetelman commented Nov 10, 2020

What steps did you take:

I am trying to create volumes and Pods under a specific namespace using create_run_from_pipeline_package(namespace=...). However, everything is being created under the kubeflow namespace.

What happened:

I am running the below code snippet in a Jupyter Notebook running under the anonymous namespace. When I submit the pipeline, the PVCs and Pods are all started in the kubeflow namespace. As far as I can tell, passing namespace to the create_run_from_pipeline_package() call should restrict all resource creation to that namespace.

This is causing me problems because I am trying to get the logs from some subsequent pods using kubectl and kubectl can only access pods in the anonymous namespace.

Am I doing something wrong with my call or is this an issue of some sort?

What did you expect to happen:

I expected the PVCs and all the Pods in the pipeline to be created in the anonymous namespace.

Here is the python snippet that reproduces this:

pipeline_namespace = "anonymous"

import yaml
import uuid
import kfp
import kfp.dsl as dsl


@dsl.pipeline(
    name="Multinode Data POC",
    description="A quick hello-world example"
)
def resourceop_multinode():
    # Use the Tensorflow Docker Image from NGC https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow   
    __IMAGE_VERSION__ = 'nvcr.io/nvidia/tensorflow:20.10-tf1-py3'

    # Download the NVIDIA DeepLearningExamples Git repo into a PersistentVolume
    code_volume_op = dsl.VolumeOp(
        name="code_volume_creation",
        resource_name="transformer_xl_code",
        size="5Gi",
        modes=kfp.dsl.VOLUME_MODE_RWM
    )
    code_download_op = dsl.ContainerOp(
        name="code_download",
        image=__IMAGE_VERSION__,
        command=["/bin/bash", "-cx"],
        arguments=["cd /mnt; git clone https://github.com/NVIDIA/DeepLearningExamples"],
        pvolumes={"/mnt": code_volume_op.volume}
    )

kfp.compiler.Compiler().compile(resourceop_multinode, 'multinode-data-example.yaml')
run_result = kfp.Client(host=None).create_run_from_pipeline_package('multinode-data-example.yaml', arguments={}, namespace=pipeline_namespace)

kfp version: 1.1.0
Kubeflow was deployed with kfctl 1.1.0 using the master istio YAML manifest.

/kind bug
/area backend
/area sdk

@chensun
Member

chensun commented Nov 11, 2020

The SDK doesn't create namespaces by any means. The namespace argument passed into create_run_from_pipeline_package(namespace=...) needs to be pre-configured by the cluster admin, and the user running this client code needs to be explicitly granted access to that namespace. Is that the case for your anonymous namespace?

Here's the user guide for multi-user isolation: https://www.kubeflow.org/docs/pipelines/multi-user/

/cc @Bobgy

@supertetelman
Author

I understand that.

The namespace exists and the Notebook I am executing these commands from was launched into that namespace via Kubeflow Notebooks. I can manually create these resources from the notebook using Kubeflow.

The problem is I cannot get pipelines to use anything but the kubeflow namespace.

@chensun
Member

chensun commented Nov 13, 2020

@supertetelman I tried your steps today but couldn't repro.
I noticed that your code doesn't have the full signature required for multi-user usage. We expect client_id, other_client_id, and other_client_secret to be passed into the Client constructor, for example:

import kfp

run_result = kfp.Client(
    host='https://<KF_NAME>.endpoints.<PROJECT>.cloud.goog/pipeline', 
    client_id='<AAAAAAAAAAAAAAAAAAAAAA>.apps.googleusercontent.com', 
    other_client_id='<BBBBBBBBBBBBBBBBBBB>.apps.googleusercontent.com', 
    other_client_secret='<CCCCCCCCCCCCCCCCCCCC>'
).create_run_from_pipeline_package('multinode-data-example.yaml', arguments={}, namespace=pipeline_namespace)

Did you pass those arguments in your real code?

Besides, I wonder if your deployment has multi-user mode enabled. Can you please check the value in the config map:

  1. Run kubectl get configmap -n kubeflow
  2. Find the name that starts with pipeline-api-server-config-
  3. Describe that config by running kubectl describe configmap pipeline-api-server-config-xxxxxx -n kubeflow
  4. See if the returned value contains
MULTIUSER:
----
true
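The same check can be scripted. Here's a minimal, hypothetical sketch in Python that parses the ConfigMap fetched as JSON (the ConfigMap name below is illustrative, and the exact placement of the MULTIUSER flag may differ between deployments):

```python
import json

def multiuser_enabled(configmap_json: str) -> bool:
    """Check whether a pipeline API server ConfigMap enables multi-user mode.

    `configmap_json` is the output of, e.g.:
      kubectl -n kubeflow get configmap pipeline-api-server-config-xxxxxx -o json
    """
    cm = json.loads(configmap_json)
    data = cm.get("data", {})
    # The flag is typically absent entirely in single-user deployments.
    return data.get("MULTIUSER", "false").strip().lower() == "true"

# Illustrative payloads:
print(multiuser_enabled('{"data": {"MULTIUSER": "true"}}'))  # True
print(multiuser_enabled('{"data": {}}'))                     # False
```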

@Bobgy
Contributor

Bobgy commented Nov 13, 2020

@chensun one clarification: the client ID settings are GCP-specific.

Agree with everything else!

@supertetelman
Author

supertetelman commented Nov 14, 2020

> 1. Run kubectl get configmap -n kubeflow

No, I did not include those parameters. The full notebook can be found here (https://github.com/supertetelman/deepops/blob/kubeflow-mpi/workloads/examples/k8s/kubeflow-mpi/multinode-pipeline.ipynb), but the code snippet I provided reproduces the issue for me.

I don't see a configmap for pipeline-api-server, but based on the manifest's kustomize I think this is ml-pipeline-config.

$ kubectl get configmap -n kubeflow
NAME                                             DATA   AGE
a33bd623.machinelearning.seldon.io               0      114s
admission-webhook-admission-webhook-parameters   1      2m10s
admission-webhook-bootstrap-config-map           3      2m10s
application-controller-parameters                1      2m59s
inferenceservice-config                          7      2m8s
istio-parameters-t6hhgfg9k2                      2      2m59s
jupyter-web-app-jupyter-web-app-config           1      2m10s
jupyter-web-app-parameters                       7      2m10s
katib-config                                     3      2m7s
katib-parameters                                 1      2m7s
kfserving-config                                 1      2m8s
metadata-db-parameters                           3      2m9s
metadata-grpc-configmap                          2      2m9s
metadata-ui-parameters                           1      2m9s
ml-pipeline-config                               1      2m7s
notebook-controller-parameters                   3      2m9s
parameters                                       4      2m10s
pipeline-minio-parameters                        1      2m7s
pipeline-mysql-parameters                        1      2m7s
profiles-profiles-parameters-5c86m8kfb8          4      2m6s
seldon-config                                    4      2m6s
spartakus-config                                 1      2m8s
trial-template                                   3      2m7s
ui-parameters-hb792fcf5d                         1      2m7s
workflow-controller-configmap                    1      2m10s
workflow-controller-parameters                   12     2m10s
$ kubectl -n kubeflow describe configmap ml-pipeline-config
Name:         ml-pipeline-config
Namespace:    kubeflow
Labels:       app=ml-pipeline
              app.kubernetes.io/component=api-service
              app.kubernetes.io/name=api-service
Annotations:
Data
====
config.json:
----
{
  "DBConfig": {
    "DriverName": "mysql",
    "DataSourceName": "",
    "DBName": "mlpipeline",
    "GroupConcatMaxLen": "4194304"
  },
  "ObjectStoreConfig":{
    "AccessKey": "minio",
    "SecretAccessKey": "minio123",
    "BucketName": "mlpipeline",
    "Secure": false
  },
  "InitConnectionTimeout": "6m",
  "DefaultPipelineRunnerServiceAccount": "pipeline-runner",
  "ML_PIPELINE_VISUALIZATIONSERVER_SERVICE_HOST": "ml-pipeline-ml-pipeline-visualizationserver",
  "ML_PIPELINE_VISUALIZATIONSERVER_SERVICE_PORT": 8888
}

Events:  <none>

$ kubectl -n kubeflow describe configmap pipeline-mysql-parameters
Name:         pipeline-mysql-parameters
Namespace:    kubeflow
Labels:       app=mysql
              app.kubernetes.io/component=mysql
              app.kubernetes.io/name=mysql
Annotations:
Data
====
mysqlPvcName:
----
mysql-pv-claim
Events:  <none>

$ kubectl -n kubeflow describe configmap pipeline-minio-parameters
Name:         pipeline-minio-parameters
Namespace:    kubeflow
Labels:       app=minio
              app.kubernetes.io/component=minio
              app.kubernetes.io/name=minio
Annotations:
Data
====
minioPvcName:
----
minio-pv-claim
Events:  <none>


This is all being run on-prem and I deployed with manifest: https://raw.githubusercontent.com/kubeflow/manifests/master/kfdef/kfctl_k8s_istio.yaml
And kfctl: https://github.com/kubeflow/kfctl/releases/download/v1.1.0/kfctl_v1.1.0-0-g9a3621e_linux.tar.gz

@chensun
Member

chensun commented Nov 14, 2020

> @chensun one clarification: the client ID settings are GCP-specific.

Thanks, @Bobgy. I think we never tested multi-user without Cloud IAP then. It probably won't work by design.

@supertetelman
I think your deployment is in single user mode. That would explain why you were able to submit a run without client_id etc. Otherwise, I'd imagine it would fail immediately if you were in multi-user mode.

> This is all being run on-prem and I deployed with manifest: https://raw.githubusercontent.com/kubeflow/manifests/master/kfdef/kfctl_k8s_istio.yaml

This file references manifests from master: https://github.com/kubeflow/manifests/archive/master.tar.gz
I checked, and there's no multi-user setting in https://github.com/kubeflow/manifests/blob/master/pipeline/api-service/base/config-map.yaml (or its overlay). Without this setting, the service backend defaults to single-user mode.

@Bobgy looks like we never merged the multi-user manifest to master, right? Do you think it's still worth trying multiuser without Cloud IAP setup? Not sure what could be the counterpart for on-prem.

@supertetelman
Author

supertetelman commented Nov 14, 2020

Ahh, well, I tried updating my configmap with "MULTIUSER": "true", and I'm now getting the below error. It somewhat looks like progress; it appears to be an auth issue now.

Reason: Conflict
HTTP response headers: HTTPHeaderDict({'content-type': 'application/json', 'trailer': 'Grpc-Trailer-Content-Type', 'date': 'Sat, 14 Nov 2020 01:25:38 GMT', 'x-envoy-upstream-service-time': '2', 'server': 'envoy', 'transfer-encoding': 'chunked'})
HTTP response body: {"error":"Failed to authorize the requests.: BadRequestError: Namespace required in Kubeflow deployment for authorization.: Namespace required in Kubeflow deployment for authorization.","message":"Failed to authorize the requests.: BadRequestError: Namespace required in Kubeflow deployment for authorization.: Namespace required in Kubeflow deployment for authorization.","code":10,"details":[{"@type":"type.googleapis.com/api.Error","error_message":"Namespace required in Kubeflow deployment for authorization.","error_details":"Failed to authorize the requests.: BadRequestError: Namespace required in Kubeflow deployment for authorization.: Namespace required in Kubeflow deployment for authorization."}]}
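As an aside, the nested response body above can be unpacked programmatically to surface the underlying message. A small sketch (the field names follow the JSON shown above, not a documented API contract):

```python
import json

def extract_error_messages(body: str) -> list:
    """Pull the error_message fields out of a KFP API error response body."""
    payload = json.loads(body)
    return [
        detail.get("error_message", "")
        for detail in payload.get("details", [])
        if detail.get("@type", "").endswith("api.Error")
    ]

# Abbreviated version of the response body above:
body = json.dumps({
    "error": "Failed to authorize the requests.",
    "code": 10,
    "details": [{
        "@type": "type.googleapis.com/api.Error",
        "error_message": "Namespace required in Kubeflow deployment for authorization.",
    }],
})
print(extract_error_messages(body))
# ['Namespace required in Kubeflow deployment for authorization.']
```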

@chensun
Member

chensun commented Nov 14, 2020

This error makes sense now. I think it's sort of expected given there's no Cloud IAP configuration.

Our current implementation of multi-user probably doesn't support on-prem. @Bobgy WDYT?

@chensun
Member

chensun commented Nov 14, 2020

@supertetelman Here's our doc with multi-user instructions for GCP, in case you're interested in seeing whether you can make it work for on-prem.

(You'll need to join the https://groups.google.com/g/kubeflow-discuss group to get access to the doc.)

@supertetelman
Author

Okay, thanks. I had looked at this previously and it seemed like more work than I wanted. But it seems that with the v1.2 release, multi-user mode is built in, and the ability I previously had to launch pipelines from a notebook now requires auth that was previously unnecessary.

I will take a look at this and report back.

@supertetelman
Author

I was never able to get past this problem. It looks like my issue is now being addressed by #5138.

I'm closing this issue for now and will track the new featureset.
