Configurable k8s clusters for testing and deploying networks #12
Conversation
in-progress/7588-spartan-clusters.md (outdated)
**boot node**

There will be a statefulset with a single replica for the boot node. As part of its init container it will deploy the enshrined L1 contracts. Other nodes in the network will be able to resolve the boot node's address via its stable DNS name, e.g. `boot-node-0.aztec-network.svc.cluster.local`.
IIUC this would be the P2P boot node? I'd personally move bootstrapping the L1 contracts to a k8s job that runs independently/before the rest of the network.
Later edit: not sure how we'd distribute the L1 contract address to the other services? The addresses will only be known after deploying the contracts so I think we'd need some higher level tool to orchestrate extracting the addresses and creating configmaps/secrets.
Also we need to deploy protocol contracts to L2.
I thought about taking the job route, but since nodes can just ask for addresses this seems simpler for now.
Good point on deploying L2 protocol contracts. I'll include that. 👍
I think @just-mitch intends for the boot node to be a regular full node, not the P2P boot node.
But yeah, we will need to run a bootstrapping process similar to what we are currently doing on our other networks.
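For concreteness, a minimal sketch of what the boot node StatefulSet described above could look like; the image name, init-container entrypoint, and port numbers here are assumptions for illustration, not part of the design doc:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: boot-node
  namespace: aztec-network
spec:
  serviceName: boot-node        # a matching headless Service gives the pod a stable, predictable DNS name
  replicas: 1
  selector:
    matchLabels:
      app: boot-node
  template:
    metadata:
      labels:
        app: boot-node
    spec:
      initContainers:
        - name: deploy-l1-contracts
          image: aztecprotocol/aztec:latest      # image name is an assumption
          # Illustrative command: whatever entrypoint deploys the enshrined L1 contracts before the node starts.
          command: ["aztec", "deploy-l1-contracts"]
      containers:
        - name: boot-node
          image: aztecprotocol/aztec:latest
          ports:
            - containerPort: 8080     # node API port (assumed)
            - containerPort: 40400    # p2p port (assumed)
```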
in-progress/7588-spartan-clusters.md (outdated)
There will be a statefulset for the full nodes. The number of replicas will be configurable. Each full node will have a service exposing its p2p port and node port.

As part of their init container, they will get config from the boot node, including its ENR (which will require exposing this on the `get-node-info` endpoint).
Ah so nodes would just ask the bootnode for the L1 addresses 👍
Exactly!
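One way the full nodes' init container could pull that config over the boot node's stable DNS name is sketched below. The port, the output path, and whether `get-node-info` is a plain HTTP GET rather than a JSON-RPC call are all assumptions:

```yaml
initContainers:
  - name: fetch-boot-node-info
    image: curlimages/curl:8.8.0
    command:
      - sh
      - -c
      - |
        # Poll the boot node until it serves its node info (ENR + L1 contract addresses),
        # then write it to a shared volume for the main container to read at startup.
        until curl -sf http://boot-node-0.aztec-network.svc.cluster.local:8080/get-node-info \
            -o /shared/node-info.json; do
          echo "waiting for boot node..."; sleep 5
        done
    volumeMounts:
      - name: shared
        mountPath: /shared
```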
RUN kind load docker-image aztecprotocol/end-to-end:$AZTEC_DOCKER_TAG
RUN helm install aztec-network helm-charts/aztec-network --set $network_values --namespace $namespace
RUN helm install aztec-chaos helm-charts/aztec-chaos --set $chaos_values --namespace $namespace
RUN helm test aztec-network --namespace $namespace
So these e2e tests would run on the existing test runner but they leverage docker to create a k8s cluster?
Yes that's exactly right
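In case it helps to visualize the `helm test aztec-network` step above: `helm test` runs pods in the chart annotated with the test hook. A sketch, assuming the aztec-network chart ships such a pod (the template name, values wiring, and test command are made up):

```yaml
# templates/tests/network-smoke-test.yaml in the hypothetical aztec-network chart
apiVersion: v1
kind: Pod
metadata:
  name: "{{ .Release.Name }}-smoke-test"
  annotations:
    "helm.sh/hook": test      # this pod only runs when `helm test` is invoked
spec:
  restartPolicy: Never
  containers:
    - name: e2e
      image: aztecprotocol/end-to-end:{{ .Values.image.tag }}
      # Illustrative entrypoint: run an e2e suite against PXEs/nodes inside the cluster.
      command: ["yarn", "test"]
```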
Existing e2e tests will continue to work as they do now.

We will gradually port the tests to work with either the existing setup or the new setup, configurable via an environment variable; this likely can be done by simply pointing the wallets that run in the tests to PXEs in the k8s cluster.
🙌 This is great!
Plan looks good! I don't have enough k8s experience to commit to it and I'm biased towards the current setup, but if you get buy-in for switching to it, then let's go for it.
There will be a statefulset with a single replica for the boot node. As part of its init container it will deploy the enshrined L1 contracts. Other nodes in the network will be able to resolve the boot node's address via its stable DNS name, e.g. `boot-node-0.aztec-network.svc.cluster.local`.

**full node**
Should we differentiate full nodes, validator nodes, sequencing nodes, and prover nodes?
Also, should we spin up prover agents separately from prover nodes?
Yep! These should be separated. I will add that.
Side note, I think validator/sequencing nodes should be the same for this purpose.
> I think validator/sequencing nodes should be the same for this purpose.
True, that's a protocol-level decision, not an infra one
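If the roles are split as discussed, the chart's values could expose them as separately scalable pools. A hedged sketch; the key names and defaults below are made up for illustration and not the actual chart schema:

```yaml
# values.yaml (illustrative keys only)
bootNode:
  replicas: 1
fullNode:
  replicas: 3
validator:          # validator == sequencing node for this purpose
  replicas: 2
proverNode:
  replicas: 1
proverAgent:        # agents scale independently of the prover node
  replicas: 4
```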
### Production Network

There will be a separate long-lived k8s cluster deployed on AWS. This cluster will be used for running the public `spartan` network.
Is Spartan SequencerNet?
I've never heard of SequencerNet: `spartan` will be a public, permissioned network that will support multiple validators (and provers). So I think they're the same thing.
There will be a deployment for grafana. It will have a single replica, and be exposed via ClusterIP.
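A minimal sketch of such a grafana deployment and its cluster-internal service; the namespace, image tag, and port mapping are assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: aztec-network
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:latest
          ports:
            - containerPort: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: aztec-network
spec:
  type: ClusterIP        # cluster-internal only; reachable from a workstation via kubectl port-forward
  selector:
    app: grafana
  ports:
    - port: 80
      targetPort: 3000
```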
### Staging Network |
If we go down this route, I'd use this to deploy all of our networks (ie alphanet, devnet, and provernet), not just sequencer nets.
For sure. I'd like to demonstrate it working in isolation with this one network and then migrate the others once we have confidence.
We use docker-compose already in some tests. The problem is that it is very difficult to test a network with more than a few nodes. It is also difficult to simulate network conditions. Lastly, we wouldn't be able to use the same tooling to deploy a public network.

The thinking here is to use the same tooling that we use for production deployments to test our networks. This should result in less code and more confidence.
To confirm: this would replace much of our tf templates then?
Yes. TF can/should be used to set up the k8s cluster itself, and public networking infrastructure, but our software deployment configuration can live in helm.
There will be a GitHub Actions workflow that deploys the network to this cluster on every push to the `staging` branch.

The grafana dashboard will be exposed via a public IP, but password protected.
Can we just use our existing prom/grafana setup?
Yes. Local clusters used in CI/CD can deploy their own grafana/prom within the k8s cluster, and we can disable that for `staging` and update the existing/external prometheus to scrape from the opentel collector within the `staging` k8s cluster.
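The staging deploy workflow could look something like the sketch below. The workflow filename, secret names, cluster name/region, and the telemetry toggle are all assumptions for illustration:

```yaml
# .github/workflows/deploy-staging.yml (illustrative)
name: Deploy staging network
on:
  push:
    branches: [staging]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure kubeconfig for the staging cluster
        run: aws eks update-kubeconfig --name spartan-staging --region us-east-1
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      - name: Deploy the network chart
        run: |
          helm upgrade --install aztec-network helm-charts/aztec-network \
            --namespace staging --create-namespace \
            --set telemetry.useLocalGrafana=false   # hypothetical flag: rely on the external prom/grafana instead
```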
## Documentation Plan

We will write documentation on how people can join the `spartan` network.
I'm not familiar with k8s but all of this seems much easier than anything we have at the moment.
Got a verbal +1 from @charlielye.
We will: