Lazy loading (on demand) for Routes, Clusters and Endpoints #2500

Closed
nt opened this issue Feb 1, 2018 · 22 comments · Fixed by #20065
Labels
enhancement Feature requests. Not bugs or questions. help wanted Needs help!

Comments

@nt
Member

nt commented Feb 1, 2018

For requests that do not have routes, I would need to write a filter that can send xDS queries to get a route added along with its associated configuration (cluster definition and endpoints).

The use case is for deployments with a very high cluster cardinality and failing the request is not an option.

I do not know if this should be an Envoy feature, but it does not seem like it can be implemented as a custom filter, as there are no xDS headers in include/envoy.

Any pointers on how to achieve this would be greatly appreciated.

cc @mattklein123 @htuch

@ramaraochavali
Contributor

@nt I opened a similar issue in the past. See if #2022 helps; it looks very close.

@mattklein123 mattklein123 added the enhancement Feature requests. Not bugs or questions. label Feb 1, 2018
@mattklein123
Member

mattklein123 commented Feb 1, 2018

Here are my rough implementation notes for how I would go about doing this:

Initially I would probably implement a new gRPC API for this purpose. I think you could use the existing CDS API roughly, but operate it yourself by running your own gRPC fetcher on the main thread (created during filter initialization -- there are singleton examples of this already including RDS itself). The API you want might actually be unary vs. streaming also. Roughly you want to fetch a cluster given a name, without any streaming. I think ultimately we can figure out how to fold this back into the main CDS API definition as well as internal Envoy implementation, but my recommendation is to initially implement out of band while we explore the design space. Implementing an API like this in Envoy is pretty trivial with the interfaces we have.

The request flow looks something like:

  1. When in request context, call ClusterManager::get() to get the thread local cluster that you are looking for. If it already exists, proceed per normal.
  2. If the cluster does not exist, post a message to the main thread asking to acquire the cluster, and block the request.
  3. The main thread will make a unary call to your management server to fetch the cluster. This will return an envoy::api::v2::Cluster, which can be fed directly into ClusterManager::addOrUpdatePrimaryCluster().
  4. Add thread local callbacks that can be subscribed to on the worker threads for when clusters are added or removed (as an aside we need this anyway to make Redis work with CDS). The request can unblock itself when the cluster is created or some timeout occurs.
  5. I would recommend also adding some type of timeout/TTL logic on the main thread which cleans up clusters that have not been used in a while. They can be removed with ClusterManager::removePrimaryCluster()
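
For readers following along, steps 1 to 4 of the flow above can be modelled roughly as below. This is a toy Python sketch, not Envoy code: `LazyClusterModel`, `route`, and `fetch_cluster` are hypothetical names, a dictionary stands in for the cluster manager, and a queue stands in for posting to the main thread. Step 5 (TTL cleanup) is left out.

```python
import queue
import threading


class LazyClusterModel:
    """Toy model of steps 1 to 4 above. Not Envoy code."""

    def __init__(self, fetch_cluster):
        # fetch_cluster is a hypothetical unary call: name -> cluster config, or None.
        self._fetch_cluster = fetch_cluster
        self._clusters = {}   # stands in for the cluster manager's view
        self._waiters = {}    # cluster name -> Event set once the cluster exists
        self._lock = threading.Lock()
        self._main_queue = queue.Queue()  # stands in for posting to the main thread
        self._main = threading.Thread(target=self._run_main, daemon=True)
        self._main.start()

    def _run_main(self):
        while True:
            name = self._main_queue.get()
            if name is None:  # shutdown sentinel
                return
            config = self._fetch_cluster(name)  # step 3: unary fetch
            with self._lock:
                event = self._waiters.pop(name, None)
                if config is not None:
                    self._clusters[name] = config
            if config is not None and event is not None:
                event.set()  # step 4: wake the blocked requests

    def route(self, name, timeout=1.0):
        with self._lock:
            if name in self._clusters:  # step 1: cluster already known
                return self._clusters[name]
            event = self._waiters.get(name)
            if event is None:  # step 2: first miss posts to the main thread
                event = self._waiters[name] = threading.Event()
                self._main_queue.put(name)
        if event.wait(timeout):
            with self._lock:
                return self._clusters[name]
        return None  # timed out, or the fetch failed

    def stop(self):
        self._main_queue.put(None)
        self._main.join()
```

In this model, two calls to `route("a")` trigger a single fetch; the second call is served from the map without blocking.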

A few implementation notes:

  1. Initially, to get things running, I would probably try to return static clusters using CDS and not use EDS. Ultimately though you should use EDS such that once a node is using a cluster, it can subscribe to streaming EDS updates until the cluster TTL expires causing its removal. Note that until #1930 (CDS: Support cluster update warming) is implemented, your API+EDS in this case won't work seamlessly, since the cluster will get created before there are any hosts in it.
  2. If you go with the TTL approach, you will need to let the main thread know when there are requests. This can be done using the existing cluster request stats which could be snapped occasionally by the main thread to see the last time they changed.
  3. We could go straight to exposing the CDS API out of cluster manager, and allowing a filter to modify the set of watches (CDS today is an empty request, expecting all clusters in the response). This is the other way to implement this. It's not too terrible to expose this out of cluster manager, as cluster manager owns CDS, but like I was saying before, for simplicity I might just explore a prototype with a new API and then once we figure it out we can fold it back in.
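
The TTL idea in note 2 can be sketched as follows, assuming the main thread periodically receives a snapshot of cumulative per-cluster request counters. `IdleClusterReaper` and `snap` are hypothetical names used only for illustration; the clock value is passed in explicitly so the logic stays deterministic.

```python
class IdleClusterReaper:
    """Tracks when each cluster's request counter last changed (toy model)."""

    def __init__(self, ttl_seconds):
        self._ttl = ttl_seconds
        self._last_count = {}   # cluster name -> counter value at the last change
        self._last_change = {}  # cluster name -> time of the last change

    def snap(self, request_counts, now):
        """Take a snapshot; return the names of clusters idle for at least the TTL.

        request_counts maps cluster name -> cumulative request counter.
        """
        expired = []
        for name, count in request_counts.items():
            if self._last_count.get(name) != count:
                # New cluster or counter moved: the cluster is in use.
                self._last_count[name] = count
                self._last_change[name] = now
            elif now - self._last_change[name] >= self._ttl:
                expired.append(name)
        for name in expired:
            # The caller is expected to remove these clusters.
            del self._last_count[name]
            del self._last_change[name]
        return expired
```

In Envoy terms, the returned names would be the candidates for ClusterManager::removePrimaryCluster(); here the caller simply gets a list.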

@htuch is very familiar with all of this and should be able to help while I'm out on leave. LMK if what I wrote above makes sense or if you want any additional specific pointers and I can provide them.

@nt
Member Author

nt commented Feb 1, 2018

Thank you for the notes, I'll get started on this. I do not plan to implement TTL though as part of this change.

@htuch
Member

htuch commented Feb 1, 2018

I'd add that code structure wise, I think it makes sense to have some kind of LazyClusterLoader object owned by the main thread that is responsible for doing the post (to itself) and maintaining any state associated with the in-flight pseudo CDS. I.e. there should be a separation of concerns between the filter and the code that is responsible for doing the fetch (which might be serving lazy loads from multiple worker threads and requests).
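
One way to picture this separation of concerns is a loader that owns the in-flight state and coalesces duplicate requests, while the filter only registers a callback. The sketch below is a single-threaded Python illustration under that assumption; the method names `request` and `on_fetch_complete` and the `start_fetch` hook are hypothetical, not an Envoy interface.

```python
class LazyClusterLoader:
    """Owns in-flight fetch state; callers only register callbacks (toy model)."""

    def __init__(self, start_fetch):
        # start_fetch is a hypothetical hook that kicks off a fetch for one name.
        self._start_fetch = start_fetch
        self._in_flight = {}  # cluster name -> callbacks waiting on that fetch

    def request(self, name, on_done):
        callbacks = self._in_flight.get(name)
        if callbacks is not None:
            # A fetch for this cluster is already running: just join it.
            callbacks.append(on_done)
            return
        self._in_flight[name] = [on_done]
        self._start_fetch(name)

    def on_fetch_complete(self, name, cluster):
        # Notify every request that was waiting on this cluster.
        for callback in self._in_flight.pop(name, []):
            callback(cluster)
```

The filter never sees how the fetch is performed, so the loader can serve requests from several workers with one outstanding fetch per cluster name.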

@htuch
Member

htuch commented Feb 1, 2018

One middle ground from modifying xDS would be to establish an independent CDS channel, using the same streaming protocol, from the LazyClusterLoader, using code in https://github.com/envoyproxy/envoy/blob/master/source/common/config/grpc_mux_subscription_impl.h and https://github.com/envoyproxy/envoy/blob/master/source/common/config/grpc_subscription_impl.h.

That way, you can keep the CDS protocol, but not use the existing CDS implementation, which will not work well as structured today. You can post independent updates from LazyClusterLoader as Matt describes. The key difference will be that in onConfigUpdate(), you can process just singleton resources for cluster addition, rather than update the entire state of the world (which would cause cluster draining etc.).
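
The difference between the two update styles can be illustrated with plain dictionaries. This is a simplified Python model, not the onConfigUpdate() implementation: both function names are hypothetical, and "removed" stands in for the clusters that would be drained.

```python
def apply_state_of_the_world(current, response):
    """The response is the full set: anything missing from it is removed."""
    removed = sorted(set(current) - set(response))
    return dict(response), removed


def apply_singleton_additions(current, response):
    """The response only adds or updates clusters: nothing is removed."""
    merged = dict(current)
    merged.update(response)
    return merged, []
```

With existing clusters `a` and `b` and a response that contains only `c`, the first function drops `a` and `b`, while the second keeps them and adds `c`.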

@mattklein123
Member

+1 to all of @htuch's comments above.

@nt
Member Author

nt commented Mar 6, 2018

I have an initial PR out: #2740

@mattklein123
Member

Initial partial implementation of cross thread cluster creation here: #3479 cc @ramaraochavali

@stale

stale bot commented Jul 7, 2018

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Jul 7, 2018
@htuch htuch added the help wanted Needs help! label Jul 9, 2018
@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Jul 9, 2018
@fredlas
Contributor

fredlas commented May 17, 2019

This is nowadays typically being referred to as "on-demand". Hopefully writing on-demand or on demand in here will make this issue show up when searching issues for "on-demand". On-demand on-demand.

@htuch htuch changed the title from "Lazy loading for Routes, Clusters and Endpoints" to "Lazy loading (on demand) for Routes, Clusters and Endpoints" May 17, 2019
@htuch
Member

htuch commented May 17, 2019

@fredlas I've also updated the title, thanks.

@banks
Contributor

banks commented Apr 15, 2020

Does anyone have any info on the status of this?

From the linked issue above it seems like there is already some support for a subset of RDS loading on demand. Is anyone working on or planning to work on extending that to work for clusters and/or endpoints?

I got excited reading the Delta xDS docs recently:

  • Allows the Envoy to on-demand / lazily request additional resources. For example, requesting a cluster only when a request for that cluster arrives.

I realise that's only talking about what the protocol allows, not what Envoy actually does, but the wording doesn't make it very clear that this is still aspirational.

Since I couldn't find a definitive answer on whether Envoy actually supports that already (other than this issue still being open, I guess), I experimented, and indeed I can't get either CDS or EDS to subscribe only when a request comes in for the resource when using delta xDS.

Any info much appreciated!

@htuch
Member

htuch commented Apr 16, 2020

@banks we only have VHDS (for VirtualHosts in route configs) today. It would be fairly reasonable to add on-demand CDS (and it's done implicitly by at least one filter but not generically). This is because we already have all the hooks in place to support this. This would then imply on-demand EDS loading. This issue remains in help wanted state.

@mattklein123 mattklein123 self-assigned this Dec 12, 2020
@g0194776

Any updates for on-demand EDS loading now?

@htuch
Member

htuch commented Jan 20, 2021

I don't think there has been any public work on on-demand EDS.

@stevenzzzz
Contributor

cc lambda@

@stevenzzzz
Contributor

cc @lambdai

@krnowak
Contributor

krnowak commented Mar 10, 2021

Hi, I'd like to take the part of lazy (on-demand) loading of clusters. Will write a document and share here and on slack.

@mattklein123
Member

@krnowak SGTM. At a high level we need a filter that can hold requests (for HTTP) or connections (for network), perform a cluster lookup, and wait on various callbacks until it completes, times out, or fails. All of the plumbing exists; it just needs to be put together in OSS. Would love to see this happen.

@krnowak
Contributor

krnowak commented Mar 10, 2021

@mattklein123: My idea was to extend envoy.filters.http.on_demand to perform the lookup and waiting, and to extend RouteAction with an "odcds" field containing "SourceConfig". But I think I'll flesh it all out in the design document. Will share it here ASAP.

@krnowak
Contributor

krnowak commented Mar 12, 2021

The document: https://docs.google.com/document/d/1AqhQnrX_7SS6PAkNwoaA2peCtk9htgwSBTRg1gzFPNw/edit?usp=sharing

@mattklein123
Member

This is being actively worked on. @krnowak can you make sure this issue gets closed when the on demand work is fully merged and documented? Thank you!

htuch pushed a commit that referenced this issue May 3, 2022
This adds an odcds_config field to the extension's config, and also allows the extension to be configured per-route. As it stands, it currently works only with routes using cluster-header config.

Risk Level: Medium, extending one extension in an opt-in way.
Testing: Added unit tests and integration tests.
Fixes #2500

Signed-off-by: Krzesimir Nowak <[email protected]>
ravenblackx pushed a commit to ravenblackx/envoy that referenced this issue Jun 8, 2022

9 participants