
kfp _generate_kaniko_spec permissions under workload identity #2814

Closed
louisvernon opened this issue Jan 7, 2020 · 9 comments · Fixed by #3419


louisvernon commented Jan 7, 2020

What happened:

Testing the pipelines notebook examples under Kubeflow 0.7 using Workload Identity on Google Kubernetes Engine.

The container_build example fails because _generate_kaniko_spec creates a pod that runs under the default Kubernetes service account, which has no bound GCP service account and therefore lacks the permissions needed to interact with GCS/GCR.

The generated kaniko pod logs show a 403 error like the following:

Error: error resolving source context: googleapi: got HTTP response code 403 with body: ... does not have storage.objects.get access to ...

What did you expect to happen:

_generate_kaniko_spec creates a pod spec which executes successfully.

What steps did you take:

When using KFP under Kubeflow 0.7 with Workload Identity, you can resolve the issue by modifying _container_builder.py so that the generated pod spec runs under the default-editor service account, which is bound to a GCP service account with suitable privileges. The relevant line in _container_builder.py is:

'serviceAccountName': 'default'}
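
With the sed workaround shown below applied, that line becomes:

'serviceAccountName': 'default-editor'}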

Anything else you would like to add:

How I am starting the notebook to get things working:

!python3 -m pip install 'kfp>=0.1.31' --user --quiet
!sed -i "s/'serviceAccountName': 'default'/'serviceAccountName': 'default-editor'/" /home/jovyan/.local/lib/python3.6/site-packages/kfp/containers/_container_builder.py
import sys
sys.path.append("/home/jovyan/.local/lib/python3.6/site-packages")

Bobgy commented Jan 9, 2020

Thanks for reporting, this is tracked in #1691 (comment)

Bobgy added the status/triaged label on Feb 27, 2020

Bobgy commented Apr 1, 2020

I'm starting to work on this. Here are my initial thoughts:

Full Kubeflow will be different from Standalone:

  • For Full Kubeflow, we should use the default-editor service account in user namespaces.
  • For Standalone, we should have a dedicated service account, probably named kubeflow-pipelines-container-builder-sa.
  • If we take Anthos into consideration, the old user-gcp-sa secret should also be supported (no requests yet).

So the question is how we can design the interface so that users get full configurability for their own environments, while we still provide an easy onboarding experience for the three default environments above.


Bobgy commented Apr 1, 2020

When I read the ContainerBuilder code, one thing was especially confusing to me.

The only method we export from _build_image_api.py is

def build_image_from_working_dir(image_name: str = None, working_dir: str = None, file_filter_re: str = r'.*\.py', timeout: int = 1000, base_image: str = None, builder: ContainerBuilder = None) -> str:

but that method takes a ContainerBuilder as an argument, so I expected ContainerBuilder to be a public interface too.

However, ContainerBuilder is not actually exported from the module. It has to be imported with from kfp.containers._container_builder import ContainerBuilder, which reaches into a file with an _ prefix, i.e. one that is supposed to be internal.

The interface design seems contradictory: should we expect users to instantiate their own ContainerBuilder or not?
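
To make the contradiction concrete, here is what the two access paths look like, based only on the exports described above:

# Public, supported entry point:
from kfp.containers import build_image_from_working_dir

# Needed to customize the builder, but imported from an underscore-prefixed
# (nominally internal) module:
from kfp.containers._container_builder import ContainerBuilder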


Bobgy commented Apr 1, 2020

@Ark-kun @numerology Do you have any insights on this?


Ark-kun commented Apr 1, 2020

The interface design seems contradictory: should we expect users to instantiate their own ContainerBuilder or not?

You're correct, there is a contradiction.
When I designed the interface, I envisioned a need to customize ContainerBuilder objects. However, I was explicitly asked not to expose the class publicly, so the class is not imported into the public namespace.


Ark-kun commented Apr 1, 2020

BTW, the default ContainerBuilder object is exposed as kfp.containers.default_image_builder.


Ark-kun commented Apr 1, 2020

@Bobgy Is there any way we can exclude the service account name from the spec and still have it working?


Bobgy commented Apr 1, 2020

The interface design seems contradictory: should we expect users to instantiate their own ContainerBuilder or not?

You're correct, there is a contradiction.
When I designed the interface, I envisioned a need to customize ContainerBuilder objects. However, I was explicitly asked not to expose the class publicly, so the class is not imported into the public namespace.

Thanks for the context! What was the reason? It seems that to solve this issue, we must allow users to control ContainerBuilder arguments.


Bobgy commented Apr 1, 2020

@Bobgy Is there any way we can exclude the service account name from the spec and still have it working?

Do you mean adding some runtime resolution logic? I think we need a way to figure out how KFP was deployed: is it Kubeflow, AI Platform, or standalone?

It might be good if, for example, we had a ConfigMap in the cluster with information on how to launch kaniko builders. Would the SDK have enough permissions to query that ConfigMap? Also, in which namespace should that ConfigMap live? For Kubeflow multi-user mode, I think it should live in the user's namespace, so we must expose ContainerBuilder and let users choose their own namespaces.
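
A minimal sketch of what that lookup could look like from the SDK side, assuming a hypothetical ConfigMap name and key (this is not an existing KFP mechanism):

from kubernetes import client, config

# Inside a notebook pod we can use the in-cluster credentials; outside the
# cluster, config.load_kube_config() would be used instead.
config.load_incluster_config()

# Hypothetical ConfigMap holding container-builder settings for this namespace.
cm = client.CoreV1Api().read_namespaced_config_map(
    name="kfp-container-builder-config",
    namespace="kubeflow",
)
service_account = (cm.data or {}).get("serviceAccountName", "default")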

Then another problem arises: how can we configure a ConfigMap in every user's namespace? We would need a controller to support that, which seems like too much overhead for solving this problem.

Another solution I just thought of: maybe we can add a KFP API server endpoint that reports the recommended container builder configuration, so we can always configure the API server with good defaults for different environments. Again, this requires changes to other systems, which is probably too much and might be fragile.

I think the easiest solution is to let users specify an enum saying which deployment they are using, and pick different defaults accordingly, while also providing a way for them to override the container spec themselves so it stays extensible in any custom environment.
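
As a rough sketch of that last option (the names here are hypothetical, not an existing kfp API), the defaults from the earlier comment could be keyed off a deployment enum with an explicit override:

from enum import Enum
from typing import Optional

class KfpDeployment(Enum):
    FULL_KUBEFLOW = "full-kubeflow"
    STANDALONE = "standalone"

# Default builder service accounts proposed earlier in this thread.
_DEFAULT_BUILDER_SERVICE_ACCOUNT = {
    KfpDeployment.FULL_KUBEFLOW: "default-editor",
    KfpDeployment.STANDALONE: "kubeflow-pipelines-container-builder-sa",
}

def builder_service_account(deployment: KfpDeployment,
                            override: Optional[str] = None) -> str:
    """Pick the kaniko pod's service account, honoring a user override."""
    return override or _DEFAULT_BUILDER_SERVICE_ACCOUNT[deployment]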
