
Ability to get root container stats at separate housekeeping interval #1247

Open
derekwaynecarr opened this issue Apr 27, 2016 · 9 comments

@derekwaynecarr
Collaborator

In order to support out-of-resource monitoring in Kubernetes, I want to be able to get information about the root container at a different interval than the information I gather on containers associated with pods. For example, I would set housekeeping for containers associated with pods to 10s, but the root container to 100ms.

A potential option is to add a flag:

-root_housekeeping_interval duration 
   if specified, perform housekeeping on the root container at the specified interval 
   rather than default housekeeping interval

/cc @pmorie @ncdc @vishh - Thoughts?

@ncdc
Collaborator

ncdc commented Apr 27, 2016

SGTM

@derekwaynecarr
Collaborator Author

From what I can gather, fsInfo is computed on demand, so no separate housekeeping interval is needed there, but @pmorie has informed me that thin_ls data per container is cached. Either way, for out-of-resource killing, we care more about rootfs available bytes and imagefs available bytes.
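
For illustration, "available bytes" for a mount can be read on demand with a statfs call. This is a minimal Linux-only sketch, not cAdvisor's actual fsInfo machinery, and the "/" path is just a stand-in for the rootfs or imagefs mount point:

```go
package main

import (
	"fmt"
	"syscall"
)

// availableBytes returns the bytes available on the filesystem backing path.
// Sketch of an on-demand fsInfo-style lookup (Linux-only).
func availableBytes(path string) (uint64, error) {
	var s syscall.Statfs_t
	if err := syscall.Statfs(path, &s); err != nil {
		return 0, err
	}
	// Bavail is blocks available to unprivileged users; Bsize is the block size.
	return s.Bavail * uint64(s.Bsize), nil
}

func main() {
	n, err := availableBytes("/") // stand-in for the rootfs/imagefs mount
	if err != nil {
		panic(err)
	}
	fmt.Println(n > 0)
}
```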

@pmorie
Collaborator

pmorie commented Apr 27, 2016

Yep, thin_ls data is cached, but my WIP hasn't established at what interval it is refreshed.

@vishh
Contributor

vishh commented Apr 27, 2016

My understanding is that the kubelet mainly needs higher resolution for machine-level stats, not for container stats; thin_ls matters only for containers, not the machine itself.
@timstclair suggested adding a maxAge query param to the /api/v2.1/machineStats endpoint that would let us get more recent stats on demand.

@timstclair
Contributor

+1 for on demand stats. I'd also like to avoid adding more flags if we can.

@derekwaynecarr
Collaborator Author

@vishh @timstclair
My understanding is that machine stats are derived from the root container stats:

https://github.com/google/cadvisor/blob/master/api/versions.go#L483

I will throw another wrinkle in here, and broaden the request.

I suspect that a little further into the future, we will want to get the stats for certain special containers at a higher frequency. For example, the kubelet container, the docker container, and the container that parents end-user pods are all special, and should reasonably be able to ask for higher-fidelity housekeeping intervals.

The container that parents end-user pods is probably the one we will want to drive eviction on once we move to a pod-level cgroups world.

So let me come back and re-phrase my request: I want to be able to tell cAdvisor about a set of special containers that get a shorter housekeeping interval. I am fine with not exposing it as a flag on the binary, but I would like to be able to specify it in how Kubernetes starts its internal cAdvisor.

Thoughts?

@timstclair
Contributor

Can't we address that with on-demand scraping as well?

WRT configuring internal-cAdvisor, I opened a proposal in #1224. It's an intermediate step, but would mean we could stop leaking cAdvisor flags into kube-binaries, and avoid the flag-editing needed for things like kubernetes/kubernetes#24771

@derekwaynecarr
Collaborator Author

@timstclair - I am happy to defer to what you and @vishh think is best; you have more expertise in this area than me. I just wanted to state where my confusion came from, since everything looked derived from the cached root container stats. If the desire is to support on-demand scraping instead, that works for me, because I get the same net result as the caller. Any suggestions on how you would want to see this implemented? I am volunteering my time because I think this is needed to make evictions actually useful to Kubernetes end users without having to sacrifice large amounts of reserved memory on the node.

@timstclair
Contributor

I don't know that I have more experience in this area, but my main concern is that as more and more components want various stats at various intervals, the complexity will get out of hand and stats will be collected unnecessarily often. If we can make it happen, I think a good on-demand model could clean this up and lead to greater efficiency.

I think this is probably complex enough to warrant at least an informal design document. I'd be happy to help out with it, but here are a few issues I can think of off the top of my head:

  • Blocking callers on a potentially slow operation: we may need to provide an async interface, or at least a timeout
  • Concurrent stats requests: we should return the stats to both callers in this case, but that could be a problem if the requests are slightly different
  • Caching the data appropriately
  • Handling slow operations (stat-ing large directories): we should continue to rely on asynchronous scrapers for these
