Persist IPAM state to local file and use across restarts #972

anguslees · 2020-05-12T09:11:24Z

Persist IPAM state to a file in /var/run (by default), and use this to
recover state across a restart. Note that no state needs to be preserved
across a reboot, since all containers are also restarted.

Removes need[*] for docker/CRI and Kubernetes API from ipamd.

[*] But not "use" of docker/CRI :(

CRI is necessary for this release to handle upgrades from earlier
versions without requiring a reboot of the node. We can drop CRI for
real in the release after the one that contains this PR.
The CNI K8S_POD_* arguments are still passed in the gRPC request(s)
to ipamd. It is expected that pod name/namespace will be necessary
at some point (to fetch pod annotations).

mogren

Thanks for taking a fresh look at this!

pkg/ipamd/rpc_handler.go

pkg/ipamd/datastore/data_store.go

anguslees · 2020-05-20T01:03:59Z

I think this is ready for review fwiw.

Open questions:

- Upgrade migration. Assuming we don't want to reboot the node, we need a way to migrate the existing containerID<->IPs allocations into the new persistent store (or some other strategy).
- Verify what other CRIs do with the CNI ContainerID arg vs K8S_POD_INFRA_CONTAINER_ID. I suspect we should switch to the former.

anguslees · 2020-05-20T01:05:51Z

cc #714

anguslees · 2020-05-22T07:55:16Z

OK, there are now zero references to the CNI K8S_POD_* args in the codebase, and the IPAM state is stored against (network name, container ID, interface name) as recommended by the CNI spec.
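A sketch of what keying allocations that way could look like, assuming a plain Go struct used as a map key. The `IPAMKey` fields and the `allocations` helper here are illustrative rather than copied from the PR:

```go
package main

import "fmt"

// IPAMKey identifies one IP allocation the way the CNI spec recommends:
// by network name, container ID and interface name. (Illustrative type.)
type IPAMKey struct {
	NetworkName string
	ContainerID string
	IfName      string
}

func (k IPAMKey) String() string {
	return fmt.Sprintf("%s/%s/%s", k.NetworkName, k.ContainerID, k.IfName)
}

// allocations maps each key to the IPv4 address assigned to it. IPAMKey is
// a comparable struct, so it can be used directly as a map key.
type allocations map[IPAMKey]string

// release removes the allocation for key and reports the freed address.
// CNI DEL must be idempotent, so a missing key is not treated as an error.
func (a allocations) release(key IPAMKey) (string, bool) {
	ip, ok := a[key]
	if ok {
		delete(a, key)
	}
	return ip, ok
}
```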

Also: I've audited the entire internet (dockershim, containerd, cri-o) and verified that every implementation of k8s/CNI sets "Container ID" to the same as "K8S_POD_INFRA_CONTAINER_ID". I asked on #sig-network and found no surprises there either.

I think this leaves:

- Upgrade (without node reboot). I'll have to think about this. It might mean restoring the CRI code for at least another minor release.

anguslees · 2020-06-04T02:39:20Z

Sigh.

OK, I've added back passing K8S_* from CNI to ipamd, even though ipamd then ignores them. It's highly likely that we'll need to fetch (e.g.) pod annotations in the future, and this will make that future upgrade easier.
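For context, the CNI spec delivers these K8S_* values to the plugin through the `CNI_ARGS` environment variable as semicolon-separated `KEY=VALUE` pairs. A minimal parser, assuming that format (the `parseCNIArgs` name is hypothetical; the plugin may well use the CNI library's own helpers instead):

```go
package main

import (
	"fmt"
	"strings"
)

// parseCNIArgs splits a CNI_ARGS string ("KEY=VALUE;KEY=VALUE") into a map.
// Sketch only: no escaping rules are handled.
func parseCNIArgs(s string) (map[string]string, error) {
	out := make(map[string]string)
	if s == "" {
		return out, nil
	}
	for _, pair := range strings.Split(s, ";") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("invalid CNI_ARGS pair %q", pair)
		}
		out[kv[0]] = kv[1]
	}
	return out, nil
}
```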

Also, upgrade is now handled by restoring some of the previous CRI code. If the ipamd checkpoint file is missing (i.e. not-found -> upgrade or reboot), we fall back to querying CRI to discover any in-use Pod IPs. In the release after this one we'll be able to remove that and rely only on the checkpoint file (i.e. not-found -> reboot only). For future reference, the code that should be removed is everything guarded by the BackfillMissingCheckpointFromCRI constant, plus the CRI socket references in the k8s manifest.

I have also updated the PR description at the top to match the current PR.

mogren

Looks awesome at a first glance, will take a closer look in the morning.

mogren · 2020-06-04T05:53:14Z

cmd/routed-eni-cni-plugin/cni.go

-
-	// Type is the plugin type
-	Type string `json:"type"`
+	types.NetConf
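The diff replaces the plugin's hand-declared `Type` field with an embedded `types.NetConf` from the CNI library, so the standard fields are promoted and decoded from the same JSON. A self-contained sketch of the pattern, using a trimmed stand-in `NetConf` (assumed fields only, not the library's full type) and an illustrative plugin-specific field:

```go
package main

import "encoding/json"

// NetConf is a trimmed stand-in for the CNI library's types.NetConf,
// which carries the standard network-config fields.
type NetConf struct {
	CNIVersion string `json:"cniVersion,omitempty"`
	Name       string `json:"name,omitempty"`
	Type       string `json:"type,omitempty"`
}

// pluginConf embeds the standard fields instead of redeclaring them, so
// Type, Name and CNIVersion are promoted and decoded from the same JSON.
// VethPrefix is an illustrative plugin-specific field.
type pluginConf struct {
	NetConf
	VethPrefix string `json:"vethPrefix"`
}

// parseConf decodes the plugin's network configuration.
func parseConf(raw []byte) (*pluginConf, error) {
	conf := &pluginConf{}
	if err := json.Unmarshal(raw, conf); err != nil {
		return nil, err
	}
	return conf, nil
}
```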


mogren · 2020-06-04T05:56:46Z

config/master/manifests.jsonnet

+        resources: ["eniconfigs"],
+        verbs: ["get", "list", "watch"],


This could be a separate PR... but I'm ok with updating it here. Will resolve #846 I presume.

Thumbs-up this comment if you want me to break it out into a separate PR. This whole PR basically started from being surprised by the k8s manifest, and I worked backwards :P

pkg/ipamd/rpc_handler.go

mogren · 2020-06-04T06:04:48Z

pkg/ipamd/datastore/data_store.go

+// AssignedIPv4Addresses is the number of IP addresses already assigned
+func (p *ENIPool) AssignedIPv4Addresses() int {


Nit: Slightly confusing to have two functions named AssignedIPv4Addresses(), one on ENIIPPool and one on ENIPool. Even the names of those structs are a bit confusing I think...

Yes. A few function names are duplicated on purpose: there are three nested levels, AddressInfo < ENI (ENIIPPool) < all ENIs (ENIPool), and each method sums/aggregates the same-named method on the level below.

I completely agree that ENIIPPool and ENIPool are terribly confusing names. I was trying to avoid changing the original type names, even though I knew I should do so :P I will rename ENIIPPool to ENI so it clearly suggests a single ENI (which is what it represents). (Edit: Done.)
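A simplified sketch of that three-level aggregation, with illustrative fields (the real types carry more state; here an address counts as assigned when its hypothetical `IPAMKey` string is non-empty):

```go
package main

// AddressInfo is one secondary IP on an ENI.
type AddressInfo struct {
	Address string
	IPAMKey string // empty while the address is free
}

// Assigned reports whether this address is in use.
func (a *AddressInfo) Assigned() bool { return a.IPAMKey != "" }

// ENI holds the addresses of a single ENI.
type ENI struct {
	ID        string
	Addresses map[string]*AddressInfo
}

// AssignedIPv4Addresses counts assigned addresses on this one ENI.
func (e *ENI) AssignedIPv4Addresses() int {
	n := 0
	for _, a := range e.Addresses {
		if a.Assigned() {
			n++
		}
	}
	return n
}

// ENIPool is every ENI on the node.
type ENIPool map[string]*ENI

// AssignedIPv4Addresses sums the per-ENI method of the same name, which is
// why the name appears at more than one level.
func (p ENIPool) AssignedIPv4Addresses() int {
	n := 0
	for _, e := range p {
		n += e.AssignedIPv4Addresses()
	}
	return n
}
```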

pkg/ipamd/datastore/data_store.go

mogren

Awesome and a long due clean up, thanks a lot @anguslees! Will give others a chance to take a look at this as well before merging.

mogren · 2020-06-05T03:20:24Z

pkg/ipamd/datastore/data_store.go

-	// AssignedIPv4Addresses is the number of IP addresses already been assigned
-	AssignedIPv4Addresses int


Getting rid of this cleans up another of my TODOs... 😄

mogren · 2020-06-05T03:27:03Z

pkg/ipamd/datastore/data_store.go

+		// Assume that no file == no containers are
+		// currently in use, eg a fresh reboot just
+		// cleared everything out.  This is ok, and a
+		// no-op.


I am not totally sure that we can ever stop checking the CRI. What if a user removes the file in /var/run and then restarts the CNI while pods are running?

Sure .. but at some point we need to stop trying to outcompete malicious users. (What if we continue checking CRI, but they remove the CRI socket or replace kubelet binary with a modified one that does something odd?)

After the upgrade to this CNI, I think the only way to be out of sync is if the kubelet is behaving incorrectly (or deliberate user intervention, as you suggest). In the worst case, a reboot will reset everything (kubelet, CRI, and ipam.json) to empty.

If you're still concerned, we should just drop most of this PR (I'm ok with that). There's no point checkpointing to disk and still querying CRI every time. The entire motivation for this PR was to (eventually) remove the need to expose CRI endpoint to the pod.

True. I'd really like to get rid of having to rely on the CRI. It's definitely better not to treat worker nodes as pets. 😄

mogren · 2020-06-05T03:28:53Z

pkg/ipamd/datastore/data_store.go

+	if data.Version != CheckpointFormatVersion {
+		return fmt.Errorf("datastore: unknown backing store format (%s != %s) - wrong CNI/ipamd version? (Rebooting this node will restart local pods and probably help)", data.Version, CheckpointFormatVersion)
+	}


Could we rebuild it from the CRI for this case instead?

Yes, but that means we're at war with our future selves.

The point of this format version check is so we can introduce a future incompatible change, and make sure that we don't silently ignore whatever that incompatibility is. If we do silently ignore it here, then we'll need to invent another mechanism to make sure we don't. Repeat. (Incidentally, if CRI is sufficient/acceptable, then we never needed the checkpoint in the first place).
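A sketch of the fail-loudly version check being defended here, assuming a JSON checkpoint with a top-level `version` field; the version string and struct layout are illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkpointFormatVersion is bumped on any incompatible change to the file.
const checkpointFormatVersion = "vpc-cni-ipam/1"

type checkpointData struct {
	Version     string            `json:"version"`
	Allocations map[string]string `json:"allocations"`
}

// decodeCheckpoint refuses any file whose version it does not understand,
// so a future incompatible format fails loudly instead of being misread.
func decodeCheckpoint(raw []byte) (*checkpointData, error) {
	var data checkpointData
	if err := json.Unmarshal(raw, &data); err != nil {
		return nil, err
	}
	if data.Version != checkpointFormatVersion {
		return nil, fmt.Errorf("datastore: unknown backing store format (%s != %s)", data.Version, checkpointFormatVersion)
	}
	return &data, nil
}
```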

mogren · 2020-06-05T03:43:01Z

pkg/ipamd/datastore/data_store.go

+	if addr.Assigned() {
+		panic("addr already assigned")
+	}


I know this shouldn't happen, but should we really panic here and restart the whole pod?

I can:

- return an error at runtime, which (eventually) results in the pod restarting as well;
- ignore it and overwrite an existing pod IP;
- ignore it and incorrectly set up the new pod;
- remove the check so we don't see a panic() in the code, document the expected invariant, and trust that callers never invoke this function for an already assigned ipamKey; or
- (current version) assert the expected invariant explicitly in code, and thus know that no callers invoke this function for an already assigned ipamKey.

Which would you like?

The safest would be to panic for sure, since we should never be in this state. It would give a strong signal that something is very wrong in the current state of the aws-node pod.
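A minimal sketch of the chosen option, asserting the invariant with `panic` rather than overwriting; `addressInfo` and `assign` are hypothetical simplified names:

```go
package main

// addressInfo is a simplified per-IP record.
type addressInfo struct {
	Address string
	Owner   string // empty when the address is free
}

// Assigned reports whether the address already has an owner.
func (a *addressInfo) Assigned() bool { return a.Owner != "" }

// assign records owner against addr. Callers must only pass free
// addresses; violating that is a bug, so it panics instead of silently
// overwriting another pod's IP.
func assign(addr *addressInfo, owner string) {
	if addr.Assigned() {
		panic("addr already assigned")
	}
	addr.Owner = owner
}
```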

mogren · 2020-06-05T03:45:54Z

pkg/ipamd/introspect.go

@@ -84,7 +84,6 @@ func (c *IPAMContext) setupIntrospectionServer() *http.Server {
 	serverFunctions := map[string]func(w http.ResponseWriter, r *http.Request){
 		"/v1/enis":                      eniV1RequestHandler(c),
 		"/v1/eni-configs":               eniConfigRequestHandler(c),
-		"/v1/pods":                      podV1RequestHandler(c),


We probably should update the log collector script as well if we are removing this introspection data:

https://github.com/awslabs/amazon-eks-ami/blob/master/log-collector-script/linux/eks-log-collector.sh#L79

awslabs/amazon-eks-ami#487 - in draft until this PR is finalised.

mogren · 2020-06-05T03:51:17Z

pkg/ipamd/ipamd.go

+
+	// Specify where ipam should persist its current IP<->container allocations.
+	envBackingStorePath     = "AWS_VPC_K8S_CNI_BACKING_STORE"
+	defaultBackingStorePath = "/var/run/aws-routed-eni/ipam.json"


I was never a fan of the aws-routed-eni folder name for logs, but I can see why you would like to keep them the same. That said, we already have a complete mix of names in both binaries and folders: aws-cni for the CNI, aws-k8s-agent for ipamd, aws-node for the daemonset...

Is /var/run/aws-node/ipamd.json better?


mogren · 2020-06-05T17:39:16Z

cmd/aws-k8s-agent/main.go

-	discoverController := k8sapi.NewController(kubeClient)
-	go discoverController.DiscoverLocalK8SPods()
-


Getting rid of this, and just reading it from the CRI socket will resolve #711 and replace #738 as well...

jayanthvn

Looks good :)


uthark · 2020-06-08T22:11:46Z

@mogren will you cherry-pick it to the 1.6.x branch?

mogren · 2020-06-08T23:42:46Z

@uthark I think this change is a bit too big for a patch release, so instead I'm planning to get #903 updated and merged, then start preparing a v1.7.0-rc1 release for testing. The config files are all updated because of #955 and #986, so there are a lot of changes coming.


mogren reviewed May 12, 2020


pkg/ipamd/rpc_handler.go (outdated)

pkg/ipamd/rpc_handler.go (outdated)

pkg/ipamd/datastore/data_store.go

anguslees force-pushed the dek8s-ipamd branch 3 times, most recently from 37a5a6a to ac2f477 on May 19, 2020 08:19

anguslees marked this pull request as ready for review May 19, 2020 08:20

mogren mentioned this pull request May 20, 2020

Fall back to using http docker client #752

Closed

anguslees force-pushed the dek8s-ipamd branch from ac2f477 to a0c01c1 on May 22, 2020 07:38

anguslees force-pushed the dek8s-ipamd branch 2 times, most recently from 49affc4 to 378528f on June 4, 2020 02:32

anguslees force-pushed the dek8s-ipamd branch from 378528f to 3fcb733 on June 4, 2020 02:43

mogren reviewed Jun 4, 2020


anguslees force-pushed the dek8s-ipamd branch from 3fcb733 to 301fc84 on June 4, 2020 07:53

mogren reviewed Jun 5, 2020


mogren requested review from jaypipes and SaranBalaji90 June 5, 2020 04:01

mogren approved these changes Jun 5, 2020


anguslees force-pushed the dek8s-ipamd branch from 301fc84 to 0bb4f3f on June 5, 2020 05:56

anguslees added a commit to anguslees/amazon-eks-ami that referenced this pull request Jun 5, 2020

Collect /var/run/aws-node/ipam.json

2bfafbd

IPAMD now checkpoints IPAM state to a file. Collect /var/run/aws-node/ipam.json file (if present) for debugging. See also aws/amazon-vpc-cni-k8s#972

anguslees mentioned this pull request Jun 5, 2020

Collect /var/run/aws-node/ipam.json awslabs/amazon-eks-ami#487

Merged

mogren reviewed Jun 5, 2020


jayanthvn approved these changes Jun 5, 2020


mogren merged commit e38a4d8 into aws:master Jun 5, 2020

jayanthvn mentioned this pull request Jun 5, 2020

Sync IPAMD state before assigning pod Ips #1009

Closed

This was referenced Jun 5, 2020

Example ClusterRole is unnecessarily permissive #846

Closed

Wait until the discover controller cache is synced #738

Closed

mogren mentioned this pull request Jun 7, 2020

Duplicate IP Addresses for pods w/o host network #711

Closed

mogren pushed a commit to awslabs/amazon-eks-ami that referenced this pull request Jun 8, 2020

Collect /var/run/aws-node/ipam.json

32a58b6


mogren added this to the v1.7.0 milestone Jun 24, 2020

jayanthvn mentioned this pull request Jul 10, 2020

fix permissions for eniconfig CRD #1082

Merged

anguslees deleted the dek8s-ipamd branch July 22, 2020 08:05

jayanthvn mentioned this pull request Aug 11, 2020

Add delete timestamp to the ipamd.json file #1127

Closed

jayanthvn mentioned this pull request Aug 24, 2020

Revert veth name generation due to Calico dependency #1166

Merged

jayanthvn mentioned this pull request Sep 2, 2020

Add unassigned TS in the ipamd.json to recapture the cool down time post restart #1188

Closed

cryptosguru added a commit to cryptosguru/amazon that referenced this pull request May 9, 2022

Collect /var/run/aws-node/ipam.json

e35cc6c

