
kfp _generate_kaniko_spec permissions under workload identity #2814

Closed
louisvernon opened this issue Jan 7, 2020 · 9 comments · Fixed by #3419


louisvernon commented Jan 7, 2020

What happened:

Testing the pipelines notebook examples under Kubeflow 0.7 using Workload Identity on Google Kubernetes Engine.

The container_build example fails because _generate_kaniko_spec creates a pod that runs under the default Kubernetes service account, which has no bound GCP service account and therefore lacks the permissions needed to interact with GCS/GCR.

The generated kaniko pod logs show a 403 error like the following:

Error: error resolving source context: googleapi: got HTTP response code 403 with body: ... does not have storage.objects.get access to ...

What did you expect to happen:

_generate_kaniko_spec creates a pod spec which executes successfully.

What steps did you take:

When using KFP under Kubeflow 0.7 with Workload Identity, you can resolve the issue by modifying _container_builder.py so that the generated pod spec runs under the default-editor service account, which is bound to a GCP service account with suitable privileges. The relevant line in _container_builder.py is:

'serviceAccountName': 'default'}
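
With the sed workaround shown below applied, that line becomes:

'serviceAccountName': 'default-editor'}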

Anything else you would like to add:

How I am starting the notebook to get things working:

!python3 -m pip install 'kfp>=0.1.31' --user --quiet
!sed -i "s/'serviceAccountName': 'default'/'serviceAccountName': 'default-editor'/" /home/jovyan/.local/lib/python3.6/site-packages/kfp/containers/_container_builder.py
import sys
sys.path.append("/home/jovyan/.local/lib/python3.6/site-packages")

Bobgy commented Jan 9, 2020

Thanks for reporting, this is tracked in #1691 (comment)

Bobgy added the status/triaged label on Feb 27, 2020

Bobgy commented Apr 1, 2020

I'm starting to work on this. Here are my initial thoughts:

Full Kubeflow will be different from Standalone:

  • For Full Kubeflow, we should use the default-editor service account in user namespaces.
  • For Standalone, we should have a dedicated service account, probably named kubeflow-pipelines-container-builder-sa.
  • If we take Anthos into consideration, the old user-gcp-sa secret should also be supported (no requests yet).

So the question is how we can design the interface so that users get full configurability for their own environments, while we still provide an easy onboarding experience for the three default environments above.


Bobgy commented Apr 1, 2020

When I read the ContainerBuilder code, one thing was especially confusing to me.

The only method we export from _build_image_api.py is

def build_image_from_working_dir(image_name: str = None, working_dir: str = None, file_filter_re: str = r'.*\.py', timeout: int = 1000, base_image: str = None, builder: ContainerBuilder = None) -> str:

but that method takes a ContainerBuilder as an argument, so I expected ContainerBuilder to be a public interface too.

However, ContainerBuilder is not actually exported from the module. It has to be imported with from kfp.containers._container_builder import ContainerBuilder, which reaches into a file with an _ prefix, i.e. one that is supposed to be internal.

The interface design seems contradictory: should we expect users to instantiate their own ContainerBuilder or not?
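
To make the contradiction concrete, here is what the two access paths look like, based only on the exports described above:

# Public, supported entry point:
from kfp.containers import build_image_from_working_dir

# Needed to customize the builder, but imported from an underscore-prefixed
# (nominally internal) module:
from kfp.containers._container_builder import ContainerBuilder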


Bobgy commented Apr 1, 2020

@Ark-kun @numerology Do you have any insights on this?


Ark-kun commented Apr 1, 2020

The interface design seems contradictory: should we expect users to instantiate their own ContainerBuilder or not?

You're correct, there is a contradiction.
When I designed the interface, I envisioned a need to customize ContainerBuilder objects. However, I was explicitly asked not to expose the class publicly, so the class is not imported into the public namespace.


Ark-kun commented Apr 1, 2020

BTW, the default ContainerBuilder object is exposed as kfp.containers.default_image_builder.


Ark-kun commented Apr 1, 2020

@Bobgy Is there any way we can exclude the service account name from the spec and still have it working?


Bobgy commented Apr 1, 2020

The interface design seems contradictory: should we expect users to instantiate their own ContainerBuilder or not?

You're correct, there is a contradiction.
When I designed the interface, I envisioned a need to customize ContainerBuilder objects. However, I was explicitly asked not to expose the class publicly, so the class is not imported into the public namespace.

Thanks for the context! What was the reason? It seems that to solve this issue, we must allow users to control ContainerBuilder arguments.


Bobgy commented Apr 1, 2020

@Bobgy Is there any way we can exclude the service account name from the spec and still have it working?

Do you mean adding some runtime resolution logic? I think we need a way to figure out how KFP was deployed: is it Kubeflow, AI Platform, or standalone?

It might be good if, for example, we had a ConfigMap in the cluster with information on how to launch kaniko builders. Would the SDK have enough permissions to query that ConfigMap? Also, in which namespace should that ConfigMap live? For Kubeflow multi-user mode, I think it should live in the user's namespace, so we must expose ContainerBuilder and let users choose their own namespaces.
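
A minimal sketch of what that lookup could look like from the SDK side, assuming a hypothetical ConfigMap name and key (this is not an existing KFP mechanism):

from kubernetes import client, config

# Inside a notebook pod we can use the in-cluster credentials; outside the
# cluster, config.load_kube_config() would be used instead.
config.load_incluster_config()

# Hypothetical ConfigMap holding container-builder settings for this namespace.
cm = client.CoreV1Api().read_namespaced_config_map(
    name="kfp-container-builder-config",
    namespace="kubeflow",
)
service_account = (cm.data or {}).get("serviceAccountName", "default")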

Then another problem arises: how can we configure a ConfigMap in every user's namespace? We would need a controller to support that, which seems like too much overhead for solving this problem.

Another solution I just thought of: maybe we can add a KFP API server endpoint that reports the recommended container builder configuration, so we can always configure the API server with good defaults for different environments. Again, this requires changes to other systems, which is probably too much and might be fragile.

I think the easiest solution is to let users specify an enum saying which deployment they are using, and pick different defaults accordingly, while also providing a way for them to override the container spec themselves so it stays extensible in any custom environment.
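
As a rough sketch of that last option (the names here are hypothetical, not an existing kfp API), the defaults from the earlier comment could be keyed off a deployment enum with an explicit override:

from enum import Enum
from typing import Optional

class KfpDeployment(Enum):
    FULL_KUBEFLOW = "full-kubeflow"
    STANDALONE = "standalone"

# Default builder service accounts proposed earlier in this thread.
_DEFAULT_BUILDER_SERVICE_ACCOUNT = {
    KfpDeployment.FULL_KUBEFLOW: "default-editor",
    KfpDeployment.STANDALONE: "kubeflow-pipelines-container-builder-sa",
}

def builder_service_account(deployment: KfpDeployment,
                            override: Optional[str] = None) -> str:
    """Pick the kaniko pod's service account, honoring a user override."""
    return override or _DEFAULT_BUILDER_SERVICE_ACCOUNT[deployment]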
