Support for label selection in watt/kubewatch #1292
Comments
I decided to investigate optimizing the YAML parsing path anyway, and it turns out that we can get a big speedup by using the C loader instead of the standard Python implementation. The original issue I ran into was that ~20k Service objects serialized as a YAML snapshot would take around 70 seconds to parse back into diagd's memory. With the C loader, this time is down to 6.5 seconds. Combining these two optimizations, my multi-tenant workload now allows a single Ambassador instance to process relevant updates in around 100 ms (cutting the 20k Services down to around 200-300, and getting a 10x speedup from the C loader). I'm happy with these results, and I think each optimization has value on its own. I'll open a separate issue for using the C loader - #1294
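For reference, here is a minimal sketch of the loader swap described above, assuming PyYAML was built against libyaml; the `parse_snapshot` helper is hypothetical and just for illustration, not diagd's actual code path:

```python
import yaml

# CSafeLoader is the libyaml-backed C loader; it is only available when
# PyYAML was built with libyaml, so fall back to the pure-Python SafeLoader.
try:
    from yaml import CSafeLoader as Loader
except ImportError:
    from yaml import SafeLoader as Loader

def parse_snapshot(path):
    # Parse a YAML snapshot file with the fastest available loader.
    with open(path, "r") as f:
        return yaml.load(f, Loader=Loader)
```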
I've been searching the issues and it seems my problem is related. In our case, I would postulate that it's the Secret count that is problematic, since all our Helm/Tiller history is stored in Secrets.
Bump: thoughts? This optimization is critical for my use case, and I think others may eventually run into this, too.
@esmet, so sorry for the delay here! I did in fact switch us to the C YAML parser, and I'd be very interested in seeing your patch to kubewatch. Want to open a PR in the Teleproxy repo? Also, are you on the Datawire OSS Slack? There's an #ambassador-dev channel there which is a great place for discussions like this.
Please describe your use case / problem.
I run multiple deployments of ambassador on a multi-tenant Kubernetes cluster, using ambassador_id to separate them into non-overlapping "environments". There can be hundreds of different environments running at the same time, and each environment can define dozens of Service objects.
In this scenario, Ambassador (kubewatch) uses a fair amount of memory (2-4 GB) and substantial CPU to process watcher updates. I've seen Ambassador take 60-70 seconds to process a single 15 MB YAML snapshot. Worse, when any single Service object changes, every Ambassador instance performs that 60-70 second update again. Ideally, Ambassador would use memory and CPU proportional to the set of Services relevant to its particular ambassador_id, and could then scale well even in a massively multi-tenant cluster.
Describe the solution you'd like
I propose adding an environment variable `KUBEWATCH_LABEL_SELECTOR` to kubewatch (https://github.com/datawire/teleproxy), which would be a raw label selector string passed through to the List/Watch implementation.
For example, if my architecture guarantees that all Service objects in an environment carry a consistent `environment` label, then I could pass `KUBEWATCH_LABEL_SELECTOR="environment=qa123"` to limit the set of objects kubewatch must operate on (i.e., only the ones in the qa123 environment; see the sketch below). This would limit the amount of memory and CPU required by Ambassador overall. I have a patch that implements this behavior for kubewatch (https://github.com/datawire/teleproxy).
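To illustrate the idea, here is a hypothetical sketch using the official Python Kubernetes client (not the actual teleproxy/kubewatch Go code): the raw selector string is passed straight to both the List and the Watch calls, so the API server filters objects server-side and only matching Services ever reach the process.

```python
import os

from kubernetes import client, config, watch

# Assumes the process runs inside the cluster with a suitable service account.
config.load_incluster_config()
v1 = client.CoreV1Api()

# e.g. KUBEWATCH_LABEL_SELECTOR="environment=qa123"
selector = os.environ.get("KUBEWATCH_LABEL_SELECTOR", "")

# Initial list: only Services carrying the matching label are returned.
services = v1.list_service_for_all_namespaces(label_selector=selector)
print(f"tracking {len(services.items)} services")

# Watch: subsequent updates are filtered by the same selector.
w = watch.Watch()
for event in w.stream(v1.list_service_for_all_namespaces, label_selector=selector):
    print(event["type"], event["object"].metadata.name)
```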
I chose to open the issue here, at least for starters, since this feels mostly about an Ambassador scalability use case.
Describe alternatives you've considered
I considered investigating the hot path for YAML parsing in diagd to see if we could make it faster. However, I think this problem is best solved by letting an Ambassador operator tell the system which objects it should look at, rather than by making the "everything" case faster. Even better, this approach would let an operator add guard rails against user mistakes (e.g., have "ambassador-staging" only consider Services labeled "staging", for even better isolation from "production").
Additional context
I observed a few crash stacks in diagd when it was under performance pressure.
Unfortunately I seem to have misplaced my notes on this, but I remember it was within `load_from_filesystem`, where open() returned None and the subsequent read() failed.