Route NAT64 to NAT Gateway in IPv6 public topology #12843

Merged: 2 commits, Nov 28, 2021


johngmyers
Member

No description provided.

@k8s-ci-robot added the do-not-merge/work-in-progress, cncf-cla: yes, and size/XL labels Nov 27, 2021
@k8s-ci-robot added the area/provider/aws label Nov 27, 2021
@johngmyers
Member Author

/test pull-kops-e2e-ipv6-calico

@johngmyers
Member Author

/cc @olemarkus

@johngmyers changed the title from "WIP Route NAT64 to NAT Gateway in IPv6 public topology" to "Route NAT64 to NAT Gateway in IPv6 public topology" Nov 27, 2021
@k8s-ci-robot removed the do-not-merge/work-in-progress label Nov 27, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: olemarkus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the lgtm and approved labels Nov 27, 2021
@olemarkus
Member

/lgtm cancel

There seem to be quite a few timeouts in that calico test.

@k8s-ci-robot removed the lgtm label Nov 27, 2021
@johngmyers
Member Author

It might need DNS64

@hakman
Member

hakman commented Nov 27, 2021

There seem to be quite a few timeouts in that calico test.

This was missing DNS64, which just merged. The Calico IPv6 test failed, like every other test, because of IRSA. Let's see if #12766 fixes it.
/test pull-kops-e2e-ipv6-calico

@olemarkus
Member

Yep. That should be merged now, so I'll retest when the current run fails.

@hakman
Member

hakman commented Nov 27, 2021

Doesn't look like it worked...

@johngmyers
Member Author

/test pull-kops-e2e-ipv6-calico

@johngmyers
Member Author

NAT64 works on my cluster. Rebasing and retesting.

@hakman
Member

hakman commented Nov 27, 2021

Did you try creating an EBS volume on your cluster, just to confirm that it works?

@johngmyers
Member Author

I don't have IRSA turned on for this cluster. I can hit the global and regional STS endpoints, though.

Ping through NAT64 to one public IP wasn't working, though. I need to bisect that a bit.

@johngmyers
Member Author

johngmyers commented Nov 27, 2021

I can ping the IPv6 addresses of nodes in other AZs and an IPv6 address on the public Internet, both from the node and from a non-host-network pod. I cannot ping through NAT64, though I can connect with HTTPS. traceroute works fine, but traceroute -I doesn't get any responses to or through NAT64.
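
For anyone repeating these checks, the NAT64 address to ping or traceroute can be worked out by hand. A minimal sketch in Python, assuming the RFC 6052 well-known prefix 64:ff9b::/96 (the thread does not state which prefix the cluster uses). The function names are illustrative and not part of kOps:

```python
import ipaddress

# Assumption: the RFC 6052 well-known prefix. The thread does not say which
# NAT64 prefix the cluster uses, so adjust this if yours differs.
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_nat64(ipv4, prefix=NAT64_PREFIX):
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


def extract_ipv4(ipv6, prefix=NAT64_PREFIX):
    """Recover the embedded IPv4 address from a NAT64 address."""
    v6 = ipaddress.IPv6Address(ipv6)
    if v6 not in prefix:
        raise ValueError(f"{v6} is not inside {prefix}")
    return ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)


if __name__ == "__main__":
    # 192.0.2.33 is a documentation address (RFC 5737), used as a stand-in.
    print(synthesize_nat64("192.0.2.33"))
```

The synthesized address can then be used as the target for ping or traceroute to compare ICMP and TCP behavior through NAT64 for the same IPv4 host.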

@johngmyers
Member Author

The retest failed as well. I cannot test IRSA in my environment without going through extraordinary effort. Could someone else try this out?

@johngmyers
Member Author

/test pull-kops-e2e-ipv6-cilium

@k8s-ci-robot
Contributor

@johngmyers: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                  Commit   Details  Required  Rerun command
pull-kops-e2e-ipv6-calico  0d2e9dc  link     true      /test pull-kops-e2e-ipv6-calico
pull-kops-e2e-ipv6-cilium  0d2e9dc  link     true      /test pull-kops-e2e-ipv6-cilium

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@johngmyers
Member Author

I think we might want to land this as-is, then separately look into why IRSA is still failing.

@hakman
Member

hakman commented Nov 28, 2021

/lgtm

@k8s-ci-robot added the lgtm label Nov 28, 2021
@hakman
Member

hakman commented Nov 28, 2021

The issue with IRSA comes from a bug in the AWS EBS CSI driver. Setting AWS_EC2_ENDPOINT also overrides the STS endpoint, so sts/AssumeRoleWithWebIdentity is sent to the EC2 endpoint:

2021/11/28 02:57:37 DEBUG: Validate Response sts/AssumeRoleWithWebIdentity failed, attempt 0/8, error SerializationError: failed to unmarshal error message
	status code: 400, request id: e42074ab-cfa5-4fb5-b18f-5202d253f4a3
caused by: UnmarshalError: failed to unmarshal error message
	00000000  3c 3f 78 6d 6c 20 76 65  72 73 69 6f 6e 3d 22 31  |<?xml version="1|
00000010  2e 30 22 20 65 6e 63 6f  64 69 6e 67 3d 22 55 54  |.0" encoding="UT|
00000020  46 2d 38 22 3f 3e 0a 3c  52 65 73 70 6f 6e 73 65  |F-8"?>.<Response|
00000030  3e 3c 45 72 72 6f 72 73  3e 3c 45 72 72 6f 72 3e  |><Errors><Error>|
00000040  3c 43 6f 64 65 3e 4e 6f  53 75 63 68 56 65 72 73  |<Code>NoSuchVers|
00000050  69 6f 6e 3c 2f 43 6f 64  65 3e 3c 4d 65 73 73 61  |ion</Code><Messa|
00000060  67 65 3e 54 68 65 20 72  65 71 75 65 73 74 65 64  |ge>The requested|
00000070  20 76 65 72 73 69 6f 6e  20 28 32 30 31 31 2d 30  | version (2011-0|
00000080  36 2d 31 35 29 20 6f 66  20 73 65 72 76 69 63 65  |6-15) of service|
00000090  20 41 6d 61 7a 6f 6e 45  43 32 20 64 6f 65 73 20  | AmazonEC2 does |
000000a0  6e 6f 74 20 65 78 69 73  74 3c 2f 4d 65 73 73 61  |not exist</Messa|
000000b0  67 65 3e 3c 2f 45 72 72  6f 72 3e 3c 2f 45 72 72  |ge></Error></Err|
000000c0  6f 72 73 3e 3c 52 65 71  75 65 73 74 49 44 3e 65  |ors><RequestID>e|
000000d0  34 32 30 37 34 61 62 2d  63 66 61 35 2d 34 66 62  |42074ab-cfa5-4fb|
000000e0  35 2d 62 31 38 66 2d 35  32 30 32 64 32 35 33 66  |5-b18f-5202d253f|
000000f0  34 61 33 3c 2f 52 65 71  75 65 73 74 49 44 3e 3c  |4a3</RequestID><|
00000100  2f 52 65 73 70 6f 6e 73  65 3e                    |/Response>|

caused by: unknown error response tag, {{ Response} []}
2021/11/28 02:57:37 DEBUG: Request sts/AssumeRoleWithWebIdentity Details:
---[ REQUEST POST-SIGN ]-----------------------------
POST / HTTP/1.1
Host: api.ec2.us-east-1.aws
User-Agent: aws-sdk-go/1.40.4 (go1.17.3; linux; amd64) exec-env/aws-ebs-csi-driver-v1.5.0
Content-Length: 1291
Content-Type: application/x-www-form-urlencoded; charset=utf-8
Accept-Encoding: gzip


-----------------------------------------------------
2021/11/28 02:57:37 DEBUG: Response sts/AssumeRoleWithWebIdentity Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 400 Bad Request
Connection: close
Transfer-Encoding: chunked
Cache-Control: no-cache, no-store
Content-Type: text/xml;charset=UTF-8
Date: Sun, 28 Nov 2021 02:57:36 GMT
Server: AmazonEC2
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: accept-encoding
X-Amzn-Requestid: af2a0511-174d-4e80-8198-f83f2ca21a79
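
The hex dump in the log above is the response body that the SDK failed to unmarshal. A minimal sketch in Python for recovering it, plus a rough check of whether a request's Host header belongs to the service being called. Both helpers are hypothetical illustrations written for this thread, not part of aws-sdk-go or the EBS CSI driver, and the host check is only a label-matching heuristic:

```python
import re
import xml.etree.ElementTree as ET

# Matches one "offset  hex bytes  |ascii|" line as printed in the log above.
HEXDUMP_LINE = re.compile(r"^\s*[0-9a-f]{8}\s+((?:[0-9a-f]{2}\s+)+)\|")


def decode_hexdump(text):
    """Concatenate the bytes from every hex dump line found in text."""
    data = bytearray()
    for line in text.splitlines():
        match = HEXDUMP_LINE.match(line)
        if match:
            # Drop the column spacing before converting the hex pairs.
            data += bytes.fromhex("".join(match.group(1).split()))
    return bytes(data)


def error_code_and_message(body):
    """Pull <Code> and <Message> out of an EC2-style <Response> error body."""
    root = ET.fromstring(body)
    return (root.findtext("Errors/Error/Code"),
            root.findtext("Errors/Error/Message"))


def host_serves(service, host):
    """Heuristic (assumption): the service name appears as a DNS label."""
    return service.lower() in host.lower().split(".")


if __name__ == "__main__":
    # Host header taken from the request in the log above.
    print(host_serves("sts", "api.ec2.us-east-1.aws"))
```

With the Host header from the log, the check is false for "sts" and true for "ec2", which matches the Server: AmazonEC2 response header and the NoSuchVersion error returned for an STS API version.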

@k8s-ci-robot merged commit 24318f8 into kubernetes:master Nov 28, 2021
@k8s-ci-robot added this to the v1.23 milestone Nov 28, 2021
@johngmyers deleted the nat64-publlic branch November 28, 2021 03:39
Labels
approved, area/api, area/provider/aws, cncf-cla: yes, lgtm, size/XL
4 participants