Skip to content
This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Flux using >50% of one CPU #926

Closed
rbruggem opened this issue Feb 1, 2018 · 10 comments · Fixed by #928
Closed

Flux using >50% of one CPU #926

rbruggem opened this issue Feb 1, 2018 · 10 comments · Fixed by #928

Comments

@rbruggem
Copy link

rbruggem commented Feb 1, 2018

Noticing multiple kubectl apply every second.

This occurs with v1.2.0 and v1.2.1.

@rade rade added the bug label Feb 1, 2018
@rade
Copy link
Contributor

rade commented Feb 2, 2018

I have bumped our sock shop cluster to the latest flux (1.2.2) and all other latest weave agents. The mega yaml apply which was previously broken (due to #916) now works. Alas flux is still using about 60% of a core. According to Explore, about half of that is kubectl'ing. It is kubectl applying every ~15s - is that expected?

@squaremo
Copy link
Member

squaremo commented Feb 2, 2018

It is kubectl applying every ~15s - is that expected?

No; left alone, it should run every few minutes. It will be triggered by

  • jobs completing (automated deployments, fluxctl release ..., fluxctl automate ... and the weave cloud equivalent buttons)
  • a tick every few minutes (controlled by the argument --git-poll-interval)

So: either the tick has been tuned down radically, or there are constantly new images and the automation interval has been tuned down radically (or, there's a bug).

Is there an indication in the logs of what triggers the sync? I'd expect some messages about jobs running if it's those or automation.

@rbruggem
Copy link
Author

rbruggem commented Feb 2, 2018

Attaching logs provided by a flux user who experienced this issue
flux.log

@rade
Copy link
Contributor

rade commented Feb 2, 2018

(bear in mind we may be looking at two distinct problems here, in the two instances we are looking at)

The sock-shop fluxd has no special tuning. I can see no obvious trigger in the logs for the kubectl applys, but TBH I don't really know what I am looking for:

ts=2018-02-02T10:57:47.49427888Z caller=images.go:17 component=sync-loop msg="polling images"
ts=2018-02-02T10:57:47.95983943Z caller=images.go:57 component=sync-loop service=sock-shop:deployment/carts-db container=carts-db currentimage=mongo repo=mongo pattern=*
ts=2018-02-02T10:57:47.960130666Z caller=images.go:57 component=sync-loop service=sock-shop:deployment/user container=user currentimage=weaveworksdemos/user:0.4.7 repo=weaveworksdemos/user pattern=*
ts=2018-02-02T10:57:47.960243493Z caller=images.go:57 component=sync-loop service=sock-shop:deployment/shipping container=shipping currentimage=weaveworksdemos/shipping:master-ba362a4c repo=weaveworksdemos/shipping pattern=*
ts=2018-02-02T10:57:47.960450487Z caller=images.go:57 component=sync-loop service=sock-shop:deployment/queue-master container=queue-master currentimage=weaveworksdemos/queue-master:master-b8bcfc79 repo=weaveworksdemos/queue-master pattern=*
ts=2018-02-02T10:57:47.96061811Z caller=images.go:57 component=sync-loop service=sock-shop:deployment/user-db container=user-db currentimage=weaveworksdemos/user-db:0.4.7 repo=weaveworksdemos/user-db pattern=*
ts=2018-02-02T10:57:47.960773579Z caller=images.go:57 component=sync-loop service=sock-shop:deployment/orders-db container=orders-db currentimage=mongo repo=mongo pattern=*
ts=2018-02-02T10:57:47.960956102Z caller=images.go:57 component=sync-loop service=sock-shop:deployment/payment container=payment currentimage=weaveworksdemos/payment:master-b64b8af1 repo=weaveworksdemos/payment pattern=*
ts=2018-02-02T10:57:47.96112886Z caller=images.go:57 component=sync-loop service=sock-shop:deployment/orders container=orders currentimage=weaveworksdemos/orders:master-f8988a3d repo=weaveworksdemos/orders pattern=*
ts=2018-02-02T10:57:51.518949283Z caller=release.go:61 component=platform method=Sync cmd=apply args="--namespace default" count=1
ts=2018-02-02T10:57:52.064705257Z caller=release.go:100 component=platform method=Sync cmd="kubectl --namespace default apply -f -" took=545.657746ms err=null output="namespace \"sock-shop\" configured"
ts=2018-02-02T10:57:52.066493091Z caller=release.go:61 component=platform method=Sync cmd=apply args= count=29
ts=2018-02-02T10:57:56.555391705Z caller=release.go:100 component=platform method=Sync cmd="kubectl apply -f -" took=4.488656238s err=null output="deployment \"carts-db\" configured\nservice \"carts-db\" configured\ndeployment \"catalogue\" configured\ndeployment \"load-test\" configured\ndeployment \"orders\" configured\ndeployment \"user-db\" configured\nservice \"orders\" configured\nservice \"session-db\" configured\ndeployment \"shipping\" configured\ndeployment \"orders-db\" configured\ndeployment \"payment\" configured\ndeployment \"rabbitmq\" configured\nservice \"user\" configured\ndeployment \"front-end\" configured\ndeployment \"session-db\" configured\nservice \"carts\" configured\nservice \"payment\" configured\nservice \"queue-master\" configured\ndeployment \"carts\" configured\nservice \"front-end\" configured\ndeployment \"queue-master\" configured\nservice \"user-db\" configured\ndeployment \"user\" configured\ndeployment \"catalogue-db\" configured\nservice \"catalogue-db\" configured\nservice \"catalogue\" configured\nservice \"orders-db\" configured\nservice \"rabbitmq\" configured\nservice \"shipping\" configured"
ts=2018-02-02T10:58:06.158436936Z caller=release.go:61 component=platform method=Sync cmd=apply args="--namespace default" count=1
ts=2018-02-02T10:58:06.839208202Z caller=release.go:100 component=platform method=Sync cmd="kubectl --namespace default apply -f -" took=680.582364ms err=null output="namespace \"sock-shop\" configured"
ts=2018-02-02T10:58:06.839294152Z caller=release.go:61 component=platform method=Sync cmd=apply args= count=29
ts=2018-02-02T10:58:13.05913582Z caller=release.go:100 component=platform method=Sync cmd="kubectl apply -f -" took=6.217109974s err=null output="service \"user-db\" configured\ndeployment \"user\" configured\nservice \"orders\" configured\nservice \"rabbitmq\" configured\nservice \"front-end\" configured\ndeployment \"payment\" configured\nservice \"orders-db\" configured\nservice \"session-db\" configured\ndeployment \"catalogue-db\" configured\ndeployment \"catalogue\" configured\ndeployment \"orders-db\" configured\ndeployment \"queue-master\" configured\ndeployment \"user-db\" configured\nservice \"user\" configured\nservice \"carts-db\" configured\nservice \"catalogue-db\" configured\nservice \"payment\" configured\ndeployment \"session-db\" configured\ndeployment \"shipping\" configured\ndeployment \"orders\" configured\nservice \"queue-master\" configured\ndeployment \"rabbitmq\" configured\nservice \"catalogue\" configured\ndeployment \"front-end\" configured\ndeployment \"load-test\" configured\nservice \"shipping\" configured\ndeployment \"carts-db\" configured\ndeployment \"carts\" configured\nservice \"carts\" configured"
ts=2018-02-02T10:58:24.762192817Z caller=release.go:61 component=platform method=Sync cmd=apply args="--namespace default" count=1
ts=2018-02-02T10:58:26.201710224Z caller=release.go:100 component=platform method=Sync cmd="kubectl --namespace default apply -f -" took=1.439309753s err=null output="namespace \"sock-shop\" configured"
ts=2018-02-02T10:58:26.202375657Z caller=release.go:61 component=platform method=Sync cmd=apply args= count=29
ts=2018-02-02T10:58:32.198947907Z caller=release.go:100 component=platform method=Sync cmd="kubectl apply -f -" took=5.99636982s err=null output="deployment \"shipping\" configured\nservice \"shipping\" configured\ndeployment \"user-db\" configured\ndeployment \"carts-db\" configured\nservice \"catalogue-db\" configured\nservice \"session-db\" configured\ndeployment \"front-end\" configured\nservice \"orders\" configured\ndeployment \"queue-master\" configured\ndeployment \"session-db\" configured\nservice \"carts\" configured\ndeployment \"catalogue\" configured\nservice \"catalogue\" configured\ndeployment \"orders-db\" configured\nservice \"carts-db\" configured\nservice \"rabbitmq\" configured\ndeployment \"user\" configured\nservice \"front-end\" configured\nservice \"orders-db\" configured\nservice \"queue-master\" configured\nservice \"user-db\" configured\ndeployment \"carts\" configured\ndeployment \"load-test\" configured\ndeployment \"payment\" configured\ndeployment \"catalogue-db\" configured\ndeployment \"orders\" configured\nservice \"payment\" configured\ndeployment \"rabbitmq\" configured\nservice \"user\" configured"

@squaremo
Copy link
Member

squaremo commented Feb 2, 2018

Right, no messages about running jobs. It does look an awful lot like it's just applying the namespace, applying everything else, then goto 10.

@squaremo
Copy link
Member

squaremo commented Feb 2, 2018

A naive attempt to reproduce this locally fails: I created a fresh minikube and pointed flux at the flux-example repo, and it behaves as expected. So there's something else going on.

@squaremo
Copy link
Member

squaremo commented Feb 2, 2018

The log attached above also shows the sync running every five minutes, when left to its own devices. Perhaps the CPU use there has a different cause.

@stefanprodan
Copy link
Member

stefanprodan commented Feb 3, 2018

I could reproduce this on GKE with the following repo https://github.com/stefanprodan/flux-demo
The repo has just two deployments, the caddy one is automated while podinfo is not.

I'm using a n1-standard-2 node and after deploying Flux the average CPU usage increased by ~50%.
I didn't noticed kubectl running more than once per minute and the CPU usage of kubectl is under 1% while fluxd uses around 50% most of time.

GCP CPU usage
screen shot 2018-02-03 at 10 25 36

Scope
screen shot 2018-02-03 at 10 18 32

Cortex
screen shot 2018-02-03 at 10 28 15

screen shot 2018-02-03 at 10 20 04

This is what Flux logs every minute:

ts=2018-02-03T08:36:21.588319201Z caller=warming.go:190 component=warmer fetching=nats-streaming total=10 expired=0 missing=10
ts=2018-02-03T08:36:21.732550794Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:36:21.737490165Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:36:21.851009807Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:36:21.864899317Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:36:22.019054839Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:36:22.03614893Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:36:22.047498709Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:36:22.076802372Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:36:22.090084019Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:36:22.107090559Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:36:22.107171959Z caller=warming.go:238 component=warmer updated=nats-streaming count=0
ts=2018-02-03T08:36:22.107275327Z caller=images.go:17 component=sync-loop msg="polling images"
ts=2018-02-03T08:36:22.130578863Z caller=images.go:57 component=sync-loop service=default:deployment/caddy container=caddy currentimage=stefanprodan/caddy:0.10.10 repo=stefanprodan/caddy pattern=*

If I remove the automated deployment the CPU usage stays the same:

ts=2018-02-03T08:47:21.690835107Z caller=warming.go:190 component=warmer fetching=nats-streaming total=10 expired=0 missing=10
ts=2018-02-03T08:47:21.81497541Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:47:22.078176684Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:47:22.131129014Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:47:22.131578471Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:47:22.132031944Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:47:22.135878971Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:47:22.138002694Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:47:22.143104445Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:47:22.174781168Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:47:22.298959979Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:47:22.299053202Z caller=warming.go:238 component=warmer updated=nats-streaming count=0
ts=2018-02-03T08:47:22.299196829Z caller=images.go:17 component=sync-loop msg="polling images"
ts=2018-02-03T08:47:22.303798986Z caller=images.go:28 component=sync-loop msg="no automated services"

@stefanprodan
Copy link
Member

pprof top:

Showing nodes accounting for 24850ms, 87.69% of 28340ms total
Dropped 167 nodes (cum <= 141.70ms)
Showing top 10 nodes out of 17
      flat  flat%   sum%        cum   cum%
    1390ms  4.90%  4.90%    28140ms 99.29%  github.com/weaveworks/flux/registry/cache.(*Warmer).Loop /Users/aleph/go/src/github.com/weaveworks/flux/registry/cache/warming.go
   10050ms 35.46% 40.37%    21050ms 74.28%  runtime.selectgo /usr/local/Cellar/go/1.9.2/libexec/src/runtime/select.go
    1680ms  5.93% 46.29%     4870ms 17.18%  runtime.sellock /usr/local/Cellar/go/1.9.2/libexec/src/runtime/select.go
    1400ms  4.94% 51.24%     4200ms 14.82%  runtime.selunlock /usr/local/Cellar/go/1.9.2/libexec/src/runtime/select.go
    3200ms 11.29% 62.53%     3200ms 11.29%  runtime.lock /usr/local/Cellar/go/1.9.2/libexec/src/runtime/lock_futex.go
    2800ms  9.88% 72.41%     2800ms  9.88%  runtime.unlock /usr/local/Cellar/go/1.9.2/libexec/src/runtime/lock_futex.go
    2320ms  8.19% 80.59%     2790ms  9.84%  runtime.selectrecv /usr/local/Cellar/go/1.9.2/libexec/src/runtime/select.go
     780ms  2.75% 83.35%      980ms  3.46%  runtime.selectdefault /usr/local/Cellar/go/1.9.2/libexec/src/runtime/select.go
     420ms  1.48% 84.83%      920ms  3.25%  runtime.selectnbrecv /usr/local/Cellar/go/1.9.2/libexec/src/runtime/chan.go
     810ms  2.86% 87.69%      810ms  2.86%  runtime.(*hchan).sortkey /usr/local/Cellar/go/1.9.2/libexec/src/runtime/select.go (inline)

@squaremo
Copy link
Member

squaremo commented Feb 5, 2018

Good sleuthing @stefanprodan!

(For the sake of completeness, I should record that the sock shop was running kubectl constantly because it had an experimental GitHub push hook configured.)

stefanprodan added a commit that referenced this issue Feb 5, 2018
 - added the ability to stop the refresh loop
 - added priority queuing to the refresh loop
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants