[BUG] IAM roles are not passed on to MPI Launcher and worker pods #3043
Comments
@bimtauer thanks for filing this issue! It looks like this is not just an MPI job issue; the fix should probably be propagated through to the pytorch and tensorflow plugins as well. As far as adding a service account goes, this should be a very easy fix. We are already doing this in the ray and spark tasks, so you should be able to call the same function there. Is this something you might be able to contribute? I can offer a speedy review :)
@hamersaw thanks for looking into this. And yes, we expect to run into the same issue with Spark jobs or any other complex manifest rendered by Flyte that itself declares its own PodTemplates. Service accounts are definitely needed as well; we use those to authenticate pods with Vault. But as far as I know, kube2iam relies solely on a matching annotation, so for AWS access we'd need those annotations to be propagated down as well. I believe this is where the MPI spec is first put together, so maybe the same mechanism can be applied there.
@bimtauer ah OK, sorry, for some reason I read "IAM role" and replaced it with "ServiceAccount". I think I better understand this now. Also, we recently improved the configuration of the spark plugin, so this is something the community expects as we dig deeper, and the proposed fix here seems relatively similar. I think we could get the IAM role annotations included relatively easily, but a better fix is a lot more work. Just to make sure I fully understand the scope here, we are proposing to do a few things:
I think all of this is certainly reasonable! Let me know if I'm understanding this correctly.
@hamersaw Thanks for this summary, this sounds very much like what we need. One small point that still confuses me is when you would recommend using PodTemplates vs. the k8s plugin config fields. My understanding is that I can use PodTemplates to do anything I could otherwise have done in the k8s plugin config.yaml (i.e. default annotations, labels, tolerations, etc.) and much more: I can create namespace-specific PodTemplates, which means I can create different defaults for each project and domain. If the above is right, then I would prefer only using PodTemplates in the future, so no need for point 2, at least from us.
@bimtauer so the default PodTemplate work was done because we believe it is a much better mechanism than exposing configuration through the k8s plugin. So you are 100% correct, we are hoping users slowly transition from k8s plugin config fields to entirely using PodTemplates.
Fixed by flyteorg/flyteplugins#297
Describe the bug
An IAM role defined in the launch dialog, in a launch plan, or in the k8s defaults is not passed on to an MPI Job's launcher and worker pods, which results in S3 errors if IAM role authentication is used.
This also concerns the default IAM role, if declared, which to my understanding is usually applied together with other defaults as part of AddObjectMetadata.
The good news is that all annotations are properly set on the top-level MPIJob manifest; however, they are not set on the PodTemplates for the worker and launcher pods within that generated manifest.
Expected behavior
A default IAM role (defined as a default annotation) or an IAM role provided in the launch dialog should be passed down to an MPI Job's worker and launcher pods.
Additional context to reproduce
Run a workflow with an MPI task and specify a custom IAM role upon execution.