Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System Design Doc #16

Merged
merged 10 commits into from
May 2, 2022
Merged

System Design Doc #16

merged 10 commits into from
May 2, 2022

Conversation

arkodg
Copy link
Contributor

@arkodg arkodg commented Apr 23, 2022

Fixes: #1
Signed-off-by: Arko Dasgupta [email protected]

Signed-off-by: Arko Dasgupta <[email protected]>
@arkodg arkodg mentioned this pull request Apr 23, 2022
Copy link
Contributor

@danehans danehans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest following Envoy's dir structure by putting docs in ./docs instead of ./design

design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
#### User Config
This configuration is based on the [Gateway API](https://gateway-api.sigs.k8s.io) and will provide:
* Infrastructure Management capabilities for the Infrastructure Administrator to provision the infrastructure required to run EnvoyProxy.
This is expressed using the [Gateway resource](https://gateway-api.sigs.k8s.io/concepts/api-overview/#gateway).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Envoy proxy config is expressed using GC and GW resources. The GC optionally contains Envoy config parameters. If not parameterRefs are provided, then a matching GW instance will cause EG to provision an Envoy fleet with its default config (we should specify what Envoy's default config is).

design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
Signed-off-by: Arko Dasgupta <[email protected]>
@arkodg arkodg requested a review from danehans April 25, 2022 03:18
This was referenced Apr 25, 2022
docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
docs/design/SYSTEM_DESIGN.md Show resolved Hide resolved
Signed-off-by: Arko Dasgupta <[email protected]>
Copy link
Contributor

@skriss skriss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Largely in agreement with the direction of this design, added a few comments. Thanks for your work on this @arkodg!


#### Workflow
1. The Infrastructure Administrator spawns an Envoy Gateway process and provides a Bootstrap Configuration as part of the commandline argument.
2. They will configure a [GatewayClass resource](https://gateway-api.sigs.k8s.io/concepts/api-overview/#gatewayclass), and optionally modify default values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make use of GatewayClass parameters for customizing details of provisioned Gateways - we could do this using a ConfigMap, or by defining a custom CRD that can be referenced from a GatewayClass.

As a reference Contour has a CRD called ContourDeployment (https://github.com/projectcontour/contour/blob/main/apis/projectcontour/v1alpha1/contourdeployment.go) that can be used as GatewayClass parameters, and allows controlling things like whether Envoy is deployed as a DaemonSet or Deployment, the type of Service to provision for Envoy, etc.

Agree that some global EG settings will still be controlled in the bootstrap config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rm'd the modify line to keep the common case workflow easy to understand, please lmk if you prefer the doc to be explicit about it and I'll add it back in

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or by defining a custom CRD that can be referenced from a GatewayClass

+1 on defining a config API instead of using a configmap, xref.

rm'd the modify line to keep the common case workflow easy to understand, please lmk if you prefer the doc to be explicit about it and I'll add it back in

+1 on keeping the language high-level in this doc.


#### Workflow
1. The Infrastructure Administrator spawns an Envoy Gateway process and provides a Bootstrap Configuration as part of the commandline argument.
2. They will configure a [GatewayClass resource](https://gateway-api.sigs.k8s.io/concepts/api-overview/#gatewayclass), and optionally modify default values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest "one or more GatewayClass resources"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made it explicit in the last section / Design Decisions, wanted to keep it simple here, highlight the common case


### Components

#### Config Sources
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have questions about how we'll handle merging config across different sources - seems like it can be a thorny issue if not everything is in etcd. OK with deferring the details for now, but we'll have to figure it out sooner rather than later if we do want to support multiple sources.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are all opt-in, based on the bootstrap spec, so imho the platform admin knows what they are signing up for if they include more config sources.
I added a line message service - It can also aggregate resources from multiple publishers allowing configuration from individual config sources to be aggregated before being processed by the translation layers.. But Im thinking of removing it since it adds details not needed in this generic document. We can delay this decision when we build out the config source

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xref #30 to capture the details of config merging.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also related to multi-cluster. I realize MC is P1, but how does that fit into all of this? Should we at least mention where we see MC fitting into this? Will there be a meta controller? Will each bootstrap have to manually point at multiple config sources with distributed policy? Etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we havent brainstormed on MC yet 🙈 , mainly because we have not discussed the requirements yet - is this MultiCluster Ingress use case (purely north south) with external clients sending traffic to Tier1/Edge/Front Proxies which route to Tier2/BU Proxies or east-west with internal services routing to other internal services in another cluster using the Tier2/BU in each cluster or both 😄 . Once we know the WHAT, it will require us to answer the following for the HOW

  • Architecture - is it a peer-to-peer arch or a hub-spoke architecture
  • Config Source - What is the source of truth for intent
  • Service Resolver - Which service registry does this resolver use to resolve endpoints
  • Service Registry - How is this service registry computed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we transition the MC discussion to #9 where I define

... Gateway should support multi-cluster service networking methods defined by sig-multicluster..."

@mattklein123 @arkodg I have provided additional information to #9, so PTAL.

docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
arkodg added 2 commits April 27, 2022 17:49
Signed-off-by: Arko Dasgupta <[email protected]>
* Create a detailed design and interface specification for each system component.

### Architecture
![Architecture](../images/architecture.png)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we all agree on this arch, will work with getting a better illustration out

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LukeShu @skriss @youngnick please comment here so @arkodg can nail down the diagram.

Copy link
Contributor

@youngnick youngnick Apr 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to do this @arkodg, but I looked over this a bit, and I think we should consider if we want the xds server and provisioner to be running inside the same container or be separate processes.

This diagram implies they're running inside the same container, but I think that the separate-process model has some advantages, in that it unlocks the ability to easily have one provisioner instance manage many xds control planes (since they are 1:1 with an Envoy fleet). I think it might be possible to have a single xds server process serve different config to different Envoy fleets, but that seems very risky to me - getting that wrong could very easily lead to big security problems.

I think a better representation will have two boxes, talking to the apiserver, one that is the provisioner, that has a "creates" relationship with the one (or more) xDS server boxes, which contain the architecture outlined above.

This probably isn't a blocker, for this PR merging, but I think that since we want Envoy Gateway as a whole to be able to handle multiple GatewayClasses, we'll end up with something like what I describe here anyway.,

@arkodg arkodg requested review from skriss and danehans April 28, 2022 00:59
@danehans danehans changed the title EnvoyGateway System Design System Design Doc Apr 28, 2022
Signed-off-by: Arko Dasgupta <[email protected]>
@arkodg arkodg requested a review from LukeShu April 28, 2022 20:57
* Create a detailed design and interface specification for each system component.

### Architecture
![Architecture](../images/architecture.png)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LukeShu @skriss @youngnick please comment here so @arkodg can nail down the diagram.

docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved

### Components

#### Config Sources
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xref #30 to capture the details of config merging.


#### Config Manager
This component consumes the Bootstrap Config, and spawns the appropriate internal services in Envoy Gateway based on the config spec. For e.g. if the platform field
in the Bootsrap Config is set to `kubernetes`, the config manager will instantiate kubernetes controller services that implement the Config Source, Service Resolver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

I suggest removing field since it implies that the config API is already spec'd.

s/config manager/Config Manager/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding field makes the example more concrete. Can revisit/edit this statement if it doesn't reflect the actual spec

docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
arkodg added 2 commits April 28, 2022 15:05
Signed-off-by: Arko Dasgupta <[email protected]>
Signed-off-by: Arko Dasgupta <[email protected]>
@arkodg arkodg requested a review from danehans April 28, 2022 23:24
Copy link
Contributor

@youngnick youngnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry to be the downer here, but I think that this design should call out that the provisioner and the xDS server will need to be separate binaries to be able to achieve the goals we want around being able to reconcile multiple GatewayClasses.

However, I think that we should merge this first and then iterate rather than holding this up. This is what we all want in broad strokes, let's record that we agree, and improve afterwards.

* Create a detailed design and interface specification for each system component.

### Architecture
![Architecture](../images/architecture.png)
Copy link
Contributor

@youngnick youngnick Apr 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to do this @arkodg, but I looked over this a bit, and I think we should consider if we want the xds server and provisioner to be running inside the same container or be separate processes.

This diagram implies they're running inside the same container, but I think that the separate-process model has some advantages, in that it unlocks the ability to easily have one provisioner instance manage many xds control planes (since they are 1:1 with an Envoy fleet). I think it might be possible to have a single xds server process serve different config to different Envoy fleets, but that seems very risky to me - getting that wrong could very easily lead to big security problems.

I think a better representation will have two boxes, talking to the apiserver, one that is the provisioner, that has a "creates" relationship with the one (or more) xDS server boxes, which contain the architecture outlined above.

This probably isn't a blocker, for this PR merging, but I think that since we want Envoy Gateway as a whole to be able to handle multiple GatewayClasses, we'll end up with something like what I describe here anyway.,

docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
@arkodg
Copy link
Contributor Author

arkodg commented Apr 29, 2022

hey @youngnick regarding #16 (comment)

  • I see the provisioner & xds-server as two distinct entities. They could run in the same Envoy Gateway process as a go routine or separately out of process, our design should allow for both. And this should be achievable with a good ConfigManager design that is responsible for instantiating these components.

  • From a go control plane API perspective, we could either create a unique Snapshot per GatewayClass/Envoy node string using SetSnapshot or just hold a separate SnapshotCache per GatewayClass/Envoy node string if we want to separate it even further. This allows us to generate a Envoy node id per GatewayClass object and use a single Go Control Plane to manage everything, reducing debugging & operational burden for the user.

Signed-off-by: Arko Dasgupta <[email protected]>
@danehans
Copy link
Contributor

@mattklein123 the design doc is close to being merged. When you have a moment, PTAL.

@youngnick
Copy link
Contributor

Let's talk more about provisioner and xds-server separation later, this LGTM until we can have that conversation.

Copy link
Contributor

@danehans danehans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with getting this merged and resolving the remaining feedback in subsequent PRs. @arkodg can you create tracker issues for any remaining feedback so it's not lost? Let's wait to merge until @mattklein123 has a review.

@danehans danehans added the documentation Improvements or additions to documentation label Apr 29, 2022
@danehans danehans added this to the Public Announcement milestone Apr 29, 2022
@mattklein123 mattklein123 self-assigned this Apr 29, 2022
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks at a high level this LGTM. A flushed out some comments that I think will help for people coming to this doc without as much context. Thank you!

docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
This is expressed using [HTTPRoute](https://gateway-api.sigs.k8s.io/concepts/api-overview/#httproute) and [TLSRoute](https://gateway-api.sigs.k8s.io/concepts/api-overview/#tlsroute).

#### Workflow
1. The Infrastructure Administrator spawns an Envoy Gateway process using a Bootstrap Configuration to manage a fleet of Envoy Proxies.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per above can you clarify and example of what this means in Kubernetes in terms of what it would do and not do?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arkodg maybe my example in #16 (comment) is better suited here.


### Components

#### Config Sources
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also related to multi-cluster. I realize MC is P1, but how does that fit into all of this? Should we at least mention where we see MC fitting into this? Will there be a meta controller? Will each bootstrap have to manually point at multiple config sources with distributed policy? Etc.

docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
docs/design/SYSTEM_DESIGN.md Outdated Show resolved Hide resolved
Signed-off-by: Arko Dasgupta <[email protected]>
@arkodg
Copy link
Contributor Author

arkodg commented Apr 30, 2022

thanks for the review @mattklein123 ! added more examples, and linked the md sections to reduce confusion.
the only thing I didnt address is multi cluster, mainly because we havent fleshed out the requirements yet. Hoping it can be answered via #9

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM to ship and iterate, thanks!

@arkodg arkodg merged commit b1fdc77 into main May 2, 2022
@danehans danehans deleted the design-doc branch May 2, 2022 16:39
This was referenced May 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create Design Doc
7 participants