
[manifest] Added manifest for deploying on aws using s3 #2633

Merged
6 commits merged on Dec 12, 2019

Conversation

eterna2
Contributor

@eterna2 eterna2 commented Nov 20, 2019

There are a number of requests, both in the issues (#2627 #1610 #1131) and in the Slack channel, about how to configure Kubeflow Pipelines on AWS (namely interacting with S3 buckets, particularly for pipeline-ui).

This PR adds 2 kustomize overlays (sketched below):

  • accesskey: overlay that references a secret with AWS credentials
  • iam: overlay that annotates the services with the IAM role (assuming kube2iam or similar is already deployed in the cluster)

The more common configurations (e.g. bucket, folder) are also exposed inside params.env. archiveLogs for argo is also set to true so that logs are persisted and can be retrieved by the UI.

A README is also provided in the root dir of the overlays.
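
For a rough idea of the shape, the accesskey overlay boils down to a kustomization that layers a credentials patch over the base manifests. A minimal sketch (the generator, file, and secret names here are illustrative, not the exact contents of this PR):

```yaml
# kustomization.yaml (sketch of the accesskey overlay; names are illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
  - ../../base
# expose the common configurations (bucket, folder, region, ...) as variables
configMapGenerator:
  - name: pipeline-aws-config
    env: params.env
# inject the AWS credentials from an existing secret into the deployments
patchesStrategicMerge:
  - ml-pipeline-ui-deployment-patch.yaml
```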



@k8s-ci-robot
Contributor

Hi @eterna2. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

containers:
- name: ml-pipeline-ui
  env:
  - name: AWS_ACCESS_KEY_ID
Contributor

@IronPan In the future we should probably switch to getting the keys from a ConfigMap.

Contributor

access_key_id and access_key sound like they belong to a secret. I think this is appropriate.

Contributor

access_key_id and access_key sound like they belong to a secret.

AFAIK, they're not the secret itself, more like the secret name.

P.S. If the frontend currently gets those configurations from the ENV, then I guess this PR just reflects that. We should move them to a ConfigMap when we have a chance.
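
To illustrate the pattern under discussion: a secretKeyRef env entry names a Kubernetes secret and a key within it, rather than embedding the credential itself. A minimal sketch (the secret and key names are illustrative):

```yaml
env:
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: aws-secret   # name of the Kubernetes secret, not the credential
        key: accesskey     # key within that secret that holds the credential
```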

        key: $(awsSecretKeySecretKey)
  - name: ARGO_ARCHIVE_LOGS
    value: "true"
  - name: ARGO_ARCHIVE_ARTIFACTORY
Contributor

@Bobgy @IronPan I wonder why the UI is not getting these values from a ConfigMap. Probably the ConfigMap did not exist at the time.

Contributor

@Bobgy Bobgy left a comment

Thanks for the contribution!

I can see your effort in this, and I believe it could be helpful to many AWS users.

My concern is that this repo doesn't have test infra to verify these configurations keep working, so no one can maintain or guarantee that they stay up-to-date after future changes.

I suggest:

  1. make a repo of your own for these manifests and pin a pipeline release version as the base, so that it doesn't break
  2. add a README page here pointing to your repo, mentioning that it's a variant maintained by you/the community; it may not always be up-to-date with the latest pipelines, but it is a good reference, and anyone can contribute to it after they figure out how to patch a later pipelines release

What do you think?
@eterna2 @Ark-kun @IronPan


},
{
  "name": "AWS_REGION",
  "value": "ap-southeast-1"
Contributor

Should this be configurable?

Contributor Author

Yeah. Missed this.

@eterna2
Contributor Author

eterna2 commented Nov 22, 2019

Good idea. I will do that, revert this commit, and update the readme instead.

Member

@Jeffwan Jeffwan left a comment

Overall looks good to me. My suggestion is to provide flexibility to support kube2iam, the EKS IAM roles for pods feature, or other IAM solutions.

template:
  metadata:
    annotations:
      iam.amazonaws.com/role: $(awsIAMRole)
Member

Let's make the key configurable? iam.amazonaws.com/role is for kube2iam users. Since EKS released fine-grained IAM roles for service accounts (https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/), we have lots of users using eks.amazonaws.com/role-arn as well.
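
For comparison, the two styles look roughly like this (a sketch; the role name and account ID are illustrative):

```yaml
# kube2iam: a role name annotated on the pod template
spec:
  template:
    metadata:
      annotations:
        iam.amazonaws.com/role: kfp-role
---
# EKS IAM roles for service accounts: a role ARN annotated on the ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ml-pipeline-ui
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/kfp-role
```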

Contributor Author

kustomize doesn't allow templating for keys yet, i.e. I cannot do $(key): $(value) for an annotation.

A possible workaround is to use commonAnnotations (this is actually what I used for my own deployment, because I was lazy: I gave all KFP deployments the same IAM role). See the sketch below.
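
Roughly like this (a sketch; the role name is illustrative):

```yaml
# kustomization.yaml (sketch; role name is illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
  - ../../base
# kustomize propagates these annotations onto every resource (including
# pod templates), so all KFP workloads end up with the same kube2iam role
commonAnnotations:
  iam.amazonaws.com/role: kfp-role
```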

@eterna2
Contributor Author

eterna2 commented Nov 23, 2019

Hey all,

Created a new repo to hold the manifests: https://github.com/e2fyi/kubeflow-aws. I streamlined the manifests too.

  • most of the configurations are done in base
  • variant:accesskey just patches the various deployments with the access-key secret
  • variant:iam only patches the tensorboard ConfigMap; annotations are added using commonAnnotations

I also added a short write-up on the various components of Kubeflow Pipelines in my repo. It is probably useful for people to have some sense of where and what to look at if they want to do something. Maybe I should put this somewhere here too, probably under developer notes?

| service | description |
| --- | --- |
| argo | main workhorse that runs the pipelines |
| ml-pipeline-apiserver | primary endpoint for interacting with the Kubeflow Pipelines APIs (e.g. get runs, experiments, etc.) |
| ml-pipeline-persistentagent | workers that perform various misc tasks, e.g. collecting metrics, saving resources to the db, etc. |
| ml-pipeline-scheduledworkflow | controller that schedules workflows |
| ml-pipeline-ui | serves the React frontend and Node.js backend, i.e. the webapp users interact with pipelines through |
| ml-pipeline-viewer-crd | controller that manages viewers, i.e. pods serving TensorBoard |
| ml-pipeline-visualizationserver | endpoints that generate and serve visualizations for pipelines |
| metadata-server | endpoints that record and retrieve metadata associated with the workflows |
| mysql | primary database storing most of the pipeline resources and information |
| minio | can be deployed either as a standalone object store or as a gateway to object stores like S3; this manifest does not deploy the minio service, as we connect to S3 directly (see the sketch below) |
| s3 | primary storage for workflow templates and artifacts generated by the pipelines (i.e. logs, metrics, ui-metrics, intermediate data, etc.) |
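
For reference, connecting Argo directly to S3 comes down to the artifactRepository section of the workflow-controller ConfigMap, roughly like this (a sketch; the bucket, prefix, and secret names are illustrative):

```yaml
# workflow-controller-configmap (sketch; bucket and secret names are illustrative)
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
data:
  config: |
    artifactRepository:
      archiveLogs: true          # persist logs so the UI can retrieve them
      s3:
        bucket: my-kfp-bucket
        keyPrefix: artifacts
        endpoint: s3.amazonaws.com
        accessKeySecret:
          name: aws-secret
          key: accesskey
        secretKeySecret:
          name: aws-secret
          key: secretkey
```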

For this PR:

  • I reverted the previous commit (removing all the manifests)
  • added a folder env/aws with a README inside (easier for people to spot if they are looking at the repo)
  • updated the manifest README with a small section under customization

provides a community-maintained manifest for deploying kubeflow pipelines on AWS
(with S3 as artifact store instead of minio).

TL;DR
Contributor

Shall we also move these instructions into the new repo?
I think that makes it more self-contained, and versioning the instructions together with the manifests makes more sense.

Contributor

We should probably still have at least a small section about AWS and a pointer to the other repo. @IronPan WDYT?

Contributor

Agreed, I was thinking the section would only introduce the repo.

@eterna2
Contributor Author

eterna2 commented Dec 11, 2019

Ok. I have removed the unnecessary deployment instructions in the readme.

@Bobgy
Contributor

Bobgy commented Dec 12, 2019

/lgtm
/approve

Thanks! This looks good.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Bobgy

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Bobgy
Contributor

Bobgy commented Dec 12, 2019

/ok-to-test

@k8s-ci-robot k8s-ci-robot merged commit 5a0c2f4 into kubeflow:master Dec 12, 2019