Missing permissions for alb-load-balancer-controller sa #775

Closed
paravatha opened this issue Jul 23, 2023 · 4 comments · Fixed by #777 or #779
Labels
bug Something isn't working

Comments

@paravatha

Describe the bug
For the Cognito deployment option, one reason the ALB fails to provision is that the following permission is missing from iam_alb_ingress_policy.json:

"elasticloadbalancing:AddTags"

Updating the policy manually via the AWS console and then deleting/restarting the alb-load-balancer-controller pods fixes it.
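
The manual fix described above could also be scripted. This is a minimal sketch, not the project's official remediation. The role name, inline policy name, Deployment name, and policy file path are all placeholders or assumptions, and the statement is simpler than the upstream policy, which scopes the action more tightly. By default the script only prints the commands; set APPLY=1 to actually run them.

```shell
#!/bin/sh
# Sketch: grant elasticloadbalancing:AddTags to the controller's IAM role,
# then restart the controller. All names below are assumptions.
set -eu

ROLE_NAME="${ROLE_NAME:-my-alb-controller-role}"          # hypothetical; use your cluster's role
POLICY_NAME="${POLICY_NAME:-alb-controller-add-tags}"     # hypothetical inline policy name
DEPLOYMENT="${DEPLOYMENT:-aws-load-balancer-controller}"  # assumed Deployment name
POLICY_FILE="${POLICY_FILE:-./add-tags-policy.json}"      # hypothetical output path

# Write a minimal policy document containing the missing action.
# The ARNs assume the standard "aws" partition.
cat > "$POLICY_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["elasticloadbalancing:AddTags"],
      "Resource": [
        "arn:aws:elasticloadbalancing:*:*:targetgroup/*/*",
        "arn:aws:elasticloadbalancing:*:*:loadbalancer/net/*/*",
        "arn:aws:elasticloadbalancing:*:*:loadbalancer/app/*/*"
      ]
    }
  ]
}
EOF

# Print each command unless APPLY=1 is set, in which case execute it.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Attach the statement as an inline policy on the role.
run aws iam put-role-policy \
  --role-name "$ROLE_NAME" \
  --policy-name "$POLICY_NAME" \
  --policy-document "file://$POLICY_FILE"

# Restart the controller pods so they retry provisioning.
run kubectl -n kube-system rollout restart deployment "$DEPLOYMENT"
```

An inline policy is used here only because it is easy to remove again once a patched release is deployed; editing the managed policy in place, as described above, works equally well.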

Steps To Reproduce

  1. Deploy Kubeflow using this Cognito Terraform guide https://awslabs.github.io/kubeflow-manifests/docs/deployment/cognito/guide-terraform/
  2. Check the logs using
    kubectl -n kube-system logs $(kubectl get pods -n kube-system --selector=app.kubernetes.io/name=aws-load-balancer-controller --output=jsonpath={.items..metadata.name})
  3. The logs show errors indicating that the IAM role is missing the elasticloadbalancing:AddTags permission
  4. Check whether the ALB is created; ADDRESS stays empty for a while and the deployment eventually fails
    kubectl get ingress -n istio-system
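
The two checks in the steps above could be wrapped in a small script. This is only a sketch: the function names and the RUN_CHECKS switch are made up for illustration, and the label selector is the one from step 2. Note that `kubectl logs` with a selector returns only the last few lines per pod by default, so the sketch sets `--tail` explicitly.

```shell
#!/bin/sh
# Sketch: automate the log check and the ingress check from the steps above.

# Succeeds if the log text on stdin mentions the AddTags action.
mentions_add_tags() {
  grep -q 'elasticloadbalancing:AddTags'
}

# Reads recent controller logs and reports whether AddTags appears in them.
check_controller_logs() {
  if kubectl -n kube-system logs \
       --selector=app.kubernetes.io/name=aws-load-balancer-controller \
       --tail=500 | mentions_add_tags; then
    echo "controller logs mention elasticloadbalancing:AddTags"
  else
    echo "no AddTags message in the last 500 log lines per pod"
  fi
}

# Prints the ALB hostname(s) of ingresses in istio-system.
# Empty output means the ALB has not been provisioned yet.
ingress_address() {
  kubectl get ingress -n istio-system \
    -o jsonpath='{.items[*].status.loadBalancer.ingress[*].hostname}'
}

# Only touch the cluster when explicitly asked, so sourcing this file is safe.
if [ "${RUN_CHECKS:-0}" = "1" ]; then
  check_controller_logs
  addr="$(ingress_address)"
  echo "ingress address: ${addr:-<empty>}"
fi
```

Run it with `RUN_CHECKS=1 sh check-alb.sh` (the file name is arbitrary) against the affected cluster.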

Expected behavior
A clear and concise description of what you expected to happen.

Environment

  • Kubernetes version: 1.25
  • Using EKS (yes/no), if so version? Yes
  • Kubeflow version : 1.7
  • AWS build number : v1.7.0-aws-b1.0.2
  • AWS service targeted (S3, RDS, etc.)

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Others have encountered similar issues (possibly due to a regression or a version incompatibility between EKS/k8s and aws-load-balancer-controller):
kubernetes-sigs/aws-load-balancer-controller#3044
aws-ia/terraform-aws-eks-blueprints#1683

@paravatha paravatha added the bug Something isn't working label Jul 23, 2023
@surajkota
Contributor

Thanks for reporting the issue. This is a breaking change, so we will need to release a patch.

@ananth102
Contributor

ananth102 commented Aug 8, 2023

What is the specific error you are encountering and which region are you deploying to?

@ananth102
Contributor

I was able to successfully deploy Kubeflow on Cognito with Kustomize/Terraform. Can you let me know when you deployed it?

ananth102 added a commit that referenced this issue Aug 15, 2023
**Which issue is resolved by this Pull Request:**
Resolves #775

**Description of your changes:**

Added permissions from the upstream recommendation

Based on
https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.4.7/docs/install/iam_policy.json

**Testing:**
- [ ] Unit tests pass
- [ ] e2e tests pass - manual cognito test passed

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
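
One way to see which permissions a local policy lacks relative to the upstream policy referenced in the commit above is a plain-text comparison of the action names. This is a rough sketch under stated assumptions: the helper names and file paths are hypothetical, and the comparison only looks at `elasticloadbalancing:` action strings, ignoring Effect, Resource, and Condition, so treat its output as a hint rather than a full policy diff.

```shell
#!/bin/sh
# Sketch: list ELB actions present in the upstream policy but absent locally.

UPSTREAM_URL="https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.4.7/docs/install/iam_policy.json"

# Prints the sorted, de-duplicated elasticloadbalancing actions named in a policy file.
elb_actions() {
  grep -o '"elasticloadbalancing:[A-Za-z*]*"' "$1" | tr -d '"' | sort -u
}

# Prints actions that appear in $2 (upstream) but not in $1 (local).
missing_actions() {
  local_list="$(mktemp)"
  upstream_list="$(mktemp)"
  elb_actions "$1" > "$local_list"
  elb_actions "$2" > "$upstream_list"
  # comm -13 keeps only lines unique to the second (upstream) list.
  comm -13 "$local_list" "$upstream_list"
  rm -f "$local_list" "$upstream_list"
}

# Example (requires network access; the local path is an assumption):
#   curl -fsSL "$UPSTREAM_URL" -o /tmp/upstream_policy.json
#   missing_actions ./iam_alb_ingress_policy.json /tmp/upstream_policy.json
```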
@ananth102 ananth102 reopened this Aug 15, 2023
@ananth102 ananth102 mentioned this issue Aug 15, 2023
2 tasks
ananth102 added a commit that referenced this issue Aug 28, 2023
**Which issue is resolved by this Pull Request:**
Resolves #775

**Description of your changes:**

Upgrade to v5 blueprints for the eks addons

Major changes:

1. v5 does not have an option to enable the ebs csi driver, will need to
do with the help of another module
2. v5 does not have an option for enabling the nvidia plugin, an
operator is used instead.
3. V5/V4 parameters are different.

**Testing:**
- [ ] Unit tests pass
- [x] e2e tests pass - Cognito, rds-s3-static, rds/s3-irsa passes,
efs/fsx look fine manually. Need to test nvidia.
- Details about new tests (If this PR adds a new feature)
- Details about any manual tests performed - GPU testing

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
@ananth102
Contributor

ananth102 commented Sep 5, 2023

We have released v1.7.0-b1.0.3 which should fix this

rakuto pushed a commit to tne-ai/kubeflow-manifests that referenced this issue Sep 27, 2023
**Which issue is resolved by this Pull Request:**
Resolves awslabs#775