[Multi User] failed to call 'kfp.get_run' in in-cluster Jupyter notebook #5123

Closed
anneum opened this issue Feb 10, 2021 · 2 comments
Labels
kind/bug kind/misc types beside feature and bug

Comments


anneum commented Feb 10, 2021

What steps did you take:

I have a notebook server in a multi-user environment with Kale.
After fixing several bugs based on community comments (see below), I ran into a new issue.

Added ServiceRoleBinding and EnvoyFilter as mentioned in #4440 (comment)

export NAMESPACE=mynamespace
export NOTEBOOK=mynotebook
export USER=[email protected]
 
cat >  ./envoy_filter.yaml << EOM
apiVersion: rbac.istio.io/v1alpha1
kind: ServiceRoleBinding
metadata:
  name: bind-ml-pipeline-nb-${NAMESPACE}
  namespace: kubeflow
spec:
  roleRef:
    kind: ServiceRole
    name: ml-pipeline-services
  subjects:
  - properties:
      source.principal: cluster.local/ns/${NAMESPACE}/sa/default-editor
---
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: add-header
  namespace: ${NAMESPACE}
spec:
  configPatches:
  - applyTo: VIRTUAL_HOST
    match:
      context: SIDECAR_OUTBOUND
      routeConfiguration:
        vhost:
          name: ml-pipeline.kubeflow.svc.cluster.local:8888
          route:
            name: default
    patch:
      operation: MERGE
      value:
        request_headers_to_add:
        - append: true
          header:
            key: kubeflow-userid
            value: ${USER}
  workloadSelector:
    labels:
      notebook-name: ${NOTEBOOK}
EOM
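
The heredoc above relies on the shell to fill in `${NAMESPACE}`, `${NOTEBOOK}`, and `${USER}`. By default the shell expands an unset variable to an empty string, and `USER` is often already set to the login name, so a missed export can go unnoticed. A minimal Python sketch of the same substitution that fails loudly instead is shown below; the manifest text is an abbreviated excerpt of the EnvoyFilter above, not a complete manifest, and `render_manifest` is a hypothetical helper name.

```python
from string import Template

# Abbreviated excerpt of the EnvoyFilter above; only enough is kept to show
# the three templated fields. It is not a complete, applicable manifest.
MANIFEST = Template("""\
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: add-header
  namespace: ${NAMESPACE}
# ... configPatches omitted, except the header that is added:
#     key: kubeflow-userid
#     value: ${USER}
# workloadSelector label:
#     notebook-name: ${NOTEBOOK}
""")


def render_manifest(values):
    """Fill in the placeholders, raising KeyError if any value is missing."""
    # Template.substitute (unlike safe_substitute) refuses to leave a
    # placeholder unfilled, so a forgotten variable cannot slip through.
    return MANIFEST.substitute(values)
```

Calling `render_manifest({"NAMESPACE": "mynamespace"})` raises `KeyError` because `USER` and `NOTEBOOK` are missing, whereas the shell would have written a manifest with empty or unintended values.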

Added the namespace to .config/kfp/context.json as mentioned in kubeflow-kale/kale#210 (comment)
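
For reference, this step can be sketched in Python using only the standard library. The sketch assumes the context file is a JSON object with a `namespace` key, which is how the KFP SDK's `kfp.Client.set_user_namespace` is understood to store it; `write_kfp_context` is a hypothetical helper, not part of the SDK.

```python
import json
from pathlib import Path


def write_kfp_context(namespace, context_path):
    """Store the namespace in a KFP context file and return the saved data."""
    path = Path(context_path).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    # Keep any other keys that may already be in the file.
    data = json.loads(path.read_text()) if path.exists() else {}
    # Assumption: the SDK reads the default namespace from the "namespace" key.
    data["namespace"] = namespace
    path.write_text(json.dumps(data))
    return data
```

On the notebook server this would be called as `write_kfp_context("mynamespace", "~/.config/kfp/context.json")`.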

Added RoleBinding as mentioned in kubeflow-kale/kale#210 (comment)

cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: allow-workflow-nb-mynamespace
  namespace: mynamespace
subjects:
- kind: ServiceAccount
  name: default-editor
  namespace: mynamespace
roleRef:
  kind: ClusterRole
  name: argo
  apiGroup: rbac.authorization.k8s.io
EOF

What happened:

The pipeline pods run successfully but I get an error in the Jupyter server. Additionally, in the experiment section I see the run but no graph is displayed.

2021-02-10 08:39:54 run:114 [[INFO]] [TID=82upm9iyob] [/home/jovyan/data-vol-1/examples/base/candies_sharing.ipynb] Executing RPC function 'get_run(run_id=e5b4e73c-2709-41b0-af75-f9b9dcb372f2)'
2021-02-10 08:39:54 run:125 [[ERROR]] [TID=82upm9iyob] [/home/jovyan/data-vol-1/examples/base/candies_sharing.ipynb] RPC function 'get_run' raised an unhandled exception
Traceback (most recent call last):
...
Reason: Internal Server Error
HTTP response headers: HTTPHeaderDict({'content-type': 'application/json', 'trailer': 'Grpc-Trailer-Content-Type', 'date': 'Wed, 10 Feb 2021 08:39:54 GMT', 'x-envoy-upstream-service-time': '1', 'server': 'envoy', 'transfer-encoding': 'chunked'})
HTTP response body: {"error":"Failed to authorize the request.: Failed to authorize with the run Id.: Failed to get namespace from run id.: InternalServerError: Failed to get run: invalid connection: invalid connection","message":"Failed to authorize the request.: Failed to authorize with the run Id.: Failed to get namespace from run id.: InternalServerError: Failed to get run: invalid connection: invalid connection","code":13,"details":[{"@type":"type.googleapis.com/api.Error","error_message":"Internal Server Error","error_details":"Failed to authorize the request.: Failed to authorize with the run Id.: Failed to get namespace from run id.: InternalServerError: Failed to get run: invalid connection: invalid connection"}]}
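
The response body nests several error layers in a single string, which makes it hard to see where the failure originates. The sketch below splits that string into its layers; splitting on `": "` is a heuristic that happens to fit this message, not a documented format, and `error_chain` is a hypothetical helper. The `BODY` constant is an abbreviated copy of the body above with the duplicated `message` and `details` fields left out.

```python
import json

# Abbreviated copy of the HTTP response body from the log above.
BODY = (
    '{"error":"Failed to authorize the request.: '
    'Failed to authorize with the run Id.: '
    'Failed to get namespace from run id.: '
    'InternalServerError: Failed to get run: '
    'invalid connection: invalid connection","code":13}'
)


def error_chain(body):
    """Split the nested error string into layers, outermost first."""
    message = json.loads(body)["error"]
    # Each layer is joined to the next with ": "; trailing periods are dropped.
    return [part.rstrip(".") for part in message.split(": ")]


chain = error_chain(BODY)
for depth, layer in enumerate(chain):
    print("  " * depth + layer)
```

Read this way, the outer layers are authorization wrappers, while the innermost layers report that fetching the run itself failed with "invalid connection".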

What did you expect to happen:

I expected to receive status feedback when the pipeline runs successfully and to see the graph in the experiment section.

Environment:

How did you deploy Kubeflow Pipelines (KFP)?

Full Kubeflow Deployment on an on-premise cluster.

KFP version: 1.0.4

KFP SDK version: kfp 1.4.0, kfp-pipeline-spec 0.1.5, kfp-server-api 1.3.0

Anything else you would like to add:

Output of kfp pipeline list, executed on the notebook server:

kfp pipeline list
+--------------------------------------+-------------------------------------------------+---------------------------+
| Pipeline ID                          | Name                                            | Uploaded at               |
+======================================+=================================================+===========================+
| 9f04bfad-cad5-4967-bad3-bf7e2f0fe156 | candies-sharing-0s70d                           | 2021-02-10T08:39:05+00:00 |
+--------------------------------------+-------------------------------------------------+---------------------------+

Are there any plans to create the ServiceRoleBinding, EnvoyFilter, and RoleBinding automatically in the future, instead of requiring them to be created manually?

/kind bug

@rui5i rui5i added the kind/misc types beside feature and bug label Feb 11, 2021
Author

anneum commented Feb 15, 2021

I have checked whether the run was correctly inserted into the mlpipeline database. I see both the run and the correct namespace.

kubectl -nkubeflow exec -it mysql-7694c6b8b7-nxn2h -- bash
root@mysql-7694c6b8b7-nxn2h:/# mysql
mysql> use mlpipeline;
mysql> select uuid, DisplayName, namespace, ServiceAccount from run_details where uuid = 'e5b4e73c-2709-41b0-af75-f9b9dcb372f2';
+--------------------------------------+---------------------------------+-------------+----------------+
| uuid                                 | DisplayName                     | namespace   | ServiceAccount |
+--------------------------------------+---------------------------------+-------------+----------------+
| e5b4e73c-2709-41b0-af75-f9b9dcb372f2 | candies-sharing-0s70d_run-13csv | mynamespace | default-editor |
+--------------------------------------+---------------------------------+-------------+----------------+

Contributor

Bobgy commented Feb 26, 2021

The long-term solution should be #5138.

@Bobgy Bobgy closed this as completed Feb 26, 2021