Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tsh aws ecr Internal Server Error #10475

Merged
merged 12 commits into from
Mar 11, 2022
Merged

Fix tsh aws ecr Internal Server Error #10475

merged 12 commits into from
Mar 11, 2022

Conversation

greedy52
Copy link
Contributor

@greedy52 greedy52 commented Feb 18, 2022

Fix #10467

Background/Problem

[aws cli/tsh client] -> [aws local proxy] -> [teleport app service] -> [aws]

Each AWS service has its own service endpoints/hostnames. Since the AWS client (e.g aws cli) has to send the request to our proxies by overriding endpoint-url, we loose which endpoint it suppose to send to.

To compensate that, the app service uses the service name and region info, found in the auth header, to resolve the real AWS endpoint. We currently call aws-sdk-go's EndpointFor for this.

However,

  1. The service name found in the auth header, aka SigningName (e.g. ecr), sometimes, can be different from the EndpointsID (e.g. api.ecr), which is the key used inside EndpointFor call.
  2. EndpointFor does not support all AWS services, by default.
  3. Some services may use the same SigningName, but they use different endpionts/hostnames

Changes

  • For 1, maintain a mapping to translate SigningName back to EndpointsID
  • For 2, allow ResolveUnknownService
  • not sure what to do with 3 yet.

Testing

Try tsh aws with different services.

  • [pass] tsh aws sts get-caller-identity
  • [pass] tsh aws s3 ls
  • [pass] tsh aws s3api list-buckets
  • [fail] tsh aws s3control list-jobs --account-id 1234 (failed as using special endpoint with account id)
Could not connect to the endpoint URL: "https://1234.localhost:51843/v20180820/jobs"
  • [pass] tsh aws ecr describe-repositories
  • [pass] tsh aws ecr-public describe-repositories --region us-east-1 (failed for ca-central-1)
  • [pass] tsh aws elb describe-load-balancers
  • [pass] tsh aws elbv2 describe-load-balancers
  • [pass] tsh aws dynamodb list-tables
  • [pass] tsh aws ec2 describe-instances
  • [fail] tsh aws dynamodbstreams list-streams (failed because problem 3)
  • [pass] tsh aws iot-data list-retained-messages
  • [pass] tsh aws iotfleethub list-applications
  • [pass] tsh aws rds describe-db-instances
  • [pass] tsh aws redshift describe-clusters

@greedy52 greedy52 changed the title AWS endpoints ID vs service name mapping Fix tsh aws ecr error: AWS endpoints ID vs service name mapping Feb 18, 2022
@greedy52 greedy52 changed the title Fix tsh aws ecr error: AWS endpoints ID vs service name mapping Fix tsh aws ecr Internal Server Error Feb 28, 2022
@greedy52 greedy52 self-assigned this Feb 28, 2022
@greedy52 greedy52 added the aws Used for AWS Related Issues. label Feb 28, 2022
@greedy52 greedy52 marked this pull request as ready for review February 28, 2022 18:23
@greedy52 greedy52 removed the request for review from nklaassen February 28, 2022 18:25
Copy link
Contributor

@smallinsky smallinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix LGTM , though as you mentioned this is not comprehensive solution. For example right now the streams.dynamodb aws API endpoint is not properly resolved because the service name for dynamo db is still dynamodDB and there is no easy way to distinguish where request should be forwarded.

I was thinking about how right now the aws proxy is implemented and maybe the current solution can be improved.

For example right now we are using --enbpoint-url aws cli flag that overwrite the AWS api and API url needs to be reconstructed from metadata included in Credential header.

Screenshot 2022-03-01 at 11 29 55

Not sure but it seems that aws cli access the solution with PROXY_HTTPS can be leverages where HTTPS header is not modify and HTTPS proxy adds the Destination URL to Headers allowing teleport app service to reconstruct the real AWS
URL endpoint picked by aws tool.

Screenshot 2022-03-01 at 11 30 02

So our AWS ALPN proxy will behave like MITM proxy where request is modified and additional metadata are added designation URL in moved to header and real URL is replace with teleport proxy address. The connection can be TLS terminated with a cert generated on the fly based on clientHello.ServerName field
This approach will also simplify integration external aws client like terraform.

For sure this approach should be not related to this PR and discussed separately as AWS CLI access improvement.

@r0mant @greedy52 What do you think about HTTPS_PROXY approach ?

lib/srv/app/aws/endpoints_test.go Outdated Show resolved Hide resolved
@greedy52
Copy link
Contributor Author

greedy52 commented Mar 1, 2022

@smallinsky I've thought about HTTPS proxy. I think it will definitely make this case easier as there will be no need for endpoint "resolution". But it has its own problems with my understanding how HTTP/HTTPS_PROXY works

  • Similar to endpoint-url, it's up to the application implementation if they want to interpret the http_proxy env var or not. So they may use a different way or may not support it.
  • For some applications we can set SSL_CERT_FILE I believe for our CA. But we may have to install self-signed CA somewhere (e.g. at system level) for other applications.
  • Not sure what happens when there is a corporate proxy already set for HTTP/HTTPS_PROXY. Likely our proxy gets blocked.

I'd love to do a prototype on this and see how things work out. Maybe separately from this bug fix? If it works, server probably can support both mode (just an additional header check i guess)


alternatively, someone suggested instead of proxying, we can just call AssumeRole to get a temporary token for client to use directly.

@greedy52
Copy link
Contributor Author

greedy52 commented Mar 7, 2022

As discussed, I will try out the HTTPS_PROXY proxy mode separately (as part of tsh proxy aws feature). I believe this fix is necessary for existing and older clients regardless we will do the HTTPS_PROXY mode or not

@greedy52 greedy52 requested a review from smallinsky March 7, 2022 14:32
@greedy52
Copy link
Contributor Author

Friendly ping @gabrielcorado @jimbishopp. Thanks!

@greedy52 greedy52 enabled auto-merge (squash) March 11, 2022 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
application-access aws Used for AWS Related Issues.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"tsh aws ecr" gives 500 error
3 participants