High CPU consumption in the app #1457

Open
2opremio opened this issue May 10, 2016 · 26 comments

Labels: chore (Related to fix/refinement/improvement of end user or new/existing developer functionality), performance (Excessive resource usage and latency; usually a bug or chore)

Comments

@2opremio
Contributor

On the service, with 4 nodes (probes), the app is using over 100% CPU.

[screenshot: 2016-05-10 at 07:34:46]

Profile: pprof.localhost:4040.samples.cpu.001.pb.gz

[profile graph: app_cpu]

The profile shows 80% CPU (maybe I was seeing garbage collection spikes?)

Perhaps the cache removed in #1447 would help?
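
(For anyone reproducing this: a minimal sketch of how a Go service typically exposes the endpoint such a profile comes from, assuming the standard net/http/pprof handlers. The port matches the filename above, but the wiring here is illustrative, not Scope's actual code.)

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // blank import registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	// With the handlers registered, a 30-second CPU profile can be captured with:
	//   go tool pprof http://localhost:4040/debug/pprof/profile
	log.Fatal(http.ListenAndServe("localhost:4040", nil))
}
```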

@2opremio 2opremio added this to the 0.15.0 milestone May 10, 2016
@2opremio
Contributor Author

2opremio commented May 10, 2016

This is not really a degradation compared to what we had before (see #854), but it is not what I would have expected after #1418.

@2opremio
Contributor Author

Also, the CPU pressure from gzip is pretty high (just like in #1454). We could try a more performant compressor library, or reduce the compression level, until we have a solution for #1201.
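
(As a rough illustration of the "reduce the compression level" option, assuming a plain compress/gzip writer; this is a sketch, not Scope's actual report-publishing code.)

```go
package main

import (
	"bytes"
	"compress/gzip"
	"log"
)

// compressReport gzips a serialized report. gzip.BestSpeed (level 1) trades a
// somewhat larger payload for much less CPU than gzip.BestCompression.
func compressReport(raw []byte) ([]byte, error) {
	var buf bytes.Buffer
	w, err := gzip.NewWriterLevel(&buf, gzip.BestSpeed)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(raw); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	out, err := compressReport([]byte("example report payload"))
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("compressed to %d bytes", len(out))
}
```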

@tomwilkie
Contributor

@2opremio what version was this? Did it contain #1418?

@2opremio
Contributor Author

2opremio commented May 10, 2016

Master; as I mentioned, both #1418 and #1447 are in.

@2opremio
Contributor Author

Reverting 2dae035 makes it even worse

[screenshot: 2016-05-10 at 09:32:15]

@2opremio
Contributor Author

2opremio commented May 10, 2016

I've tried using https://github.com/klauspost/compress/gzip but it doesn't seem to make a big difference. @davkal, have the requests to /api/topology increased in this release?
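
(For context, klauspost/compress/gzip is advertised as API-compatible with the standard library, so trying it is essentially an import swap; hypothetical snippet, not the actual patch.)

```go
package main

import (
	"bytes"
	"log"

	gzip "github.com/klauspost/compress/gzip" // drop-in replacement for compress/gzip
)

func main() {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf) // same API as the standard library writer
	if _, err := w.Write([]byte("example report payload")); err != nil {
		log.Fatal(err)
	}
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}
	log.Printf("compressed to %d bytes", buf.Len())
}
```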

@janwillies

We are seeing over 100% CPU usage in the app, with 5 probes connected.

Performance seems to be ok, though

This is with :latest from yesterday

@2opremio
Contributor Author

Performance seems to be ok, though

You mean that the app is usable?

@davkal
Contributor

davkal commented May 10, 2016

have the requests to /api/topology increased in this release?

Yes they have. Whenever you click on the search field, all topologies are fetched. But only then, no recurring timer.

@2opremio
Contributor Author

@davkal I meant without search (this is master)

@davkal
Contributor

davkal commented May 10, 2016

I meant without search (this is master)

Not from the frontend. Just checked git diff v0.14.0..origin/master -- client/app/scripts/utils/web-api-utils.js. No refresh intervals changed.

@janwillies

Yes, the app is usable. We are using m3.large instances.

@2opremio 2opremio mentioned this issue May 10, 2016
@2opremio
Contributor Author

@tomwilkie Could this be a bug in #1418?

@2opremio
Contributor Author

2opremio commented May 10, 2016

I think I know what's happening. We started including container environment variables, which in our case are huge due to Kubernetes services.

More than 700 env variables per container 😮 which leads to well over 100K env variables in total:

[screenshot: 2016-05-10 at 13:11:53]

[screenshot: 2016-05-10 at 13:13:21]

I will try to truncate them and see if it helps.
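
(A minimal sketch of the kind of truncation being considered, with made-up limits for illustration; this is not the actual Scope change.)

```go
package main

import "fmt"

// truncateEnv caps both the number of environment variables reported per
// container and the length of each entry, so huge Kubernetes-injected
// environments don't blow up report size and app CPU.
func truncateEnv(env []string, maxVars, maxLen int) []string {
	if len(env) > maxVars {
		env = env[:maxVars]
	}
	out := make([]string, 0, len(env))
	for _, e := range env {
		if len(e) > maxLen {
			e = e[:maxLen] + "..."
		}
		out = append(out, e)
	}
	return out
}

func main() {
	env := []string{"FOO=bar", "SERVICE_HOST=10.0.0.1", "VERY_LONG=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
	fmt.Println(truncateEnv(env, 2, 20))
}
```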

@2opremio
Contributor Author

@janwillies Can you check the number of env variables in your containers?

@janwillies

170 per container, times ~100


@2opremio
Contributor Author

@janwillies Cool. Then truncating them might help.

This was referenced May 10, 2016
@2opremio 2opremio self-assigned this May 11, 2016
@2opremio
Contributor Author

2opremio commented May 12, 2016

@janwillies reports that the app is still consuming 129% CPU with 5 nodes.

@paulbellamy paulbellamy modified the milestones: Pre-1.0, 0.15.0 May 12, 2016
@2opremio 2opremio added the performance label May 13, 2016
@willejs

willejs commented May 25, 2016

I am also seeing similar issues.

@idcrosby

Attaching pprof data from our cluster on AWS, which is seeing the high-CPU issue:
cc @2opremio
pprof.localhost:4040.samples.cpu.001.pb.gz

@2opremio
Contributor Author

This is the profile from @idcrosby; nothing too different from the previous profiles in this issue, except for the futex contention in notesleep.

[profile graph: app]

@2opremio 2opremio modified the milestones: Pre-1.0, July2016 Jun 30, 2016
@rade rade added the bug label Jul 4, 2016
rade added a commit that referenced this issue Jul 28, 2016
Merge() is always returning a copy, so there is no need to Copy()
struct fields first before calling Merge() on them.

This reduces GC pressure (#1010) and helps overall app performance
(#1457).
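
(Roughly, the pattern that commit removes looks like this; the type below is hypothetical, only to illustrate why the extra Copy() is redundant when Merge() already returns a fresh value.)

```go
package main

import "fmt"

// Sets is a hypothetical type, used only to illustrate the commit above; not Scope's real code.
type Sets map[string][]string

// Copy returns a shallow copy of s.
func (s Sets) Copy() Sets {
	out := make(Sets, len(s))
	for k, v := range s {
		out[k] = v
	}
	return out
}

// Merge returns a new Sets holding the union of s and other, leaving both inputs untouched.
func (s Sets) Merge(other Sets) Sets {
	out := s.Copy()
	for k, v := range other {
		out[k] = v
	}
	return out
}

func main() {
	a := Sets{"ips": {"10.0.0.1"}}
	b := Sets{"ports": {"80"}}
	// Before: a.Copy().Merge(b), where the Copy() only adds GC pressure,
	// because Merge() already returns a copy.
	// After: a.Merge(b)
	fmt.Println(a.Merge(b))
}
```
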
@rade rade modified the milestones: July2016, August2016 Aug 4, 2016
@2opremio
Contributor Author

2opremio commented Aug 16, 2016

@janwillies @willejs @idcrosby Scope 0.17 is out and should improve the app's CPU consumption by at least 50%. Please let us know if the performance is acceptable now.

@rade rade added the chore label and removed the bug label Aug 16, 2016
@rade rade modified the milestones: 0.18/1.0, October2016 Sep 15, 2016
@alban
Contributor

alban commented Apr 11, 2017

Also, the CPU pressure from gzip is pretty high (just like in #1454). We could try a more performant compressor library, or reduce the compression level, until we have a solution for #1201.

@2opremio I tried various compression levels with kinvolk-archives@37cc006 and got the following:

| Compression level | Report size (bytes) | Time elapsed (*10) |
| --- | --- | --- |
| 0 (gzip.NoCompression) | 1568604 | 340ms |
| 1 (gzip.BestSpeed) | 114774 | 234ms |
| 2 | 97085 | 281ms |
| 3 | 94018 | 256ms |
| 4 | 91108 | 319ms |
| 5 | 85224 | 366ms |
| 6 (also using gzip.DefaultCompression == -1) | 80634 | 411ms |
| 7 | 79051 | 410ms |
| 8 | 78691 | 566ms |
| 9 (gzip.BestCompression) | 77781 | 618ms |

So if Scope were using gzip.DefaultCompression instead of gzip.BestCompression, the reports would be 4% bigger but compress 33% faster.

I have not checked the decompression speed in the app.
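
(For anyone wanting to reproduce numbers like these against their own report, a rough sketch of the level sweep; the file name and timing approach are assumptions, not the exact script used.)

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io/ioutil"
	"log"
	"time"
)

func main() {
	// An uncompressed, serialized report; the path is illustrative.
	raw, err := ioutil.ReadFile("report.msgpack")
	if err != nil {
		log.Fatal(err)
	}
	// Sweep gzip levels 0 (NoCompression) through 9 (BestCompression),
	// recording compressed size and time spent compressing.
	for level := gzip.NoCompression; level <= gzip.BestCompression; level++ {
		var buf bytes.Buffer
		start := time.Now()
		w, err := gzip.NewWriterLevel(&buf, level)
		if err != nil {
			log.Fatal(err)
		}
		if _, err := w.Write(raw); err != nil {
			log.Fatal(err)
		}
		if err := w.Close(); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("level %d: %d bytes in %v\n", level, buf.Len(), time.Since(start))
	}
}
```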

@2opremio
Contributor Author

Good job. I thought we were using the default compression. Also, it may be worth checking other compression libraries.

@alban
Contributor

alban commented Apr 12, 2017

The list of compression packages is at https://golang.org/pkg/compress/

bzip2 is not suitable because the package only supports decompression, not compression (see golang/go#4828).

File size: 1662592 bytes (uncompressed msgpack)

| test name | level | compressed size | compression time | decompression time |
| --- | --- | --- | --- | --- |
| gzip | -1 | 143052 | 43761 | 8314 |
| gzip | 0 | 1662745 | 1912 | 2494 |
| gzip | 1 | 184074 | 15015 | 8360 |
| gzip | 2 | 170205 | 15406 | 7611 |
| gzip | 3 | 164116 | 22564 | 7013 |
| gzip | 4 | 160214 | 33903 | 7382 |
| gzip | 5 | 153660 | 35916 | 6992 |
| gzip | 6 | 143052 | 46799 | 7414 |
| gzip | 7 | 141689 | 56130 | 6422 |
| gzip | 8 | 138420 | 107412 | 7927 |
| gzip | 9 | 138127 | 170985 | 6594 |
| zlib | -1 | 143040 | 45800 | 7895 |
| zlib | 0 | 1662733 | 3070 | 2974 |
| zlib | 1 | 184062 | 16524 | 11179 |
| zlib | 2 | 170193 | 21051 | 9190 |
| zlib | 3 | 164104 | 19900 | 8685 |
| zlib | 4 | 160202 | 29980 | 9135 |
| zlib | 5 | 153648 | 38100 | 9263 |
| zlib | 6 | 143040 | 43449 | 8175 |
| zlib | 7 | 141677 | 65277 | 7140 |
| zlib | 8 | 138408 | 109087 | 7451 |
| zlib | 9 | 138115 | 177843 | 7175 |
| flate | -1 | 143034 | 48442 | 7317 |
| flate | 0 | 1662727 | 1493 | 1379 |
| flate | 1 | 184056 | 17335 | 8916 |
| flate | 2 | 170187 | 15605 | 7419 |
| flate | 3 | 164098 | 16493 | 7071 |
| flate | 4 | 160196 | 30576 | 8306 |
| flate | 5 | 153642 | 35537 | 7373 |
| flate | 6 | 143034 | 43456 | 8109 |
| flate | 7 | 141671 | 55900 | 6431 |
| flate | 8 | 138402 | 106750 | 6652 |
| flate | 9 | 138109 | 174859 | 6365 |
| lzw | 0 | 551151 | 32472 | 23047 |

I saved a msgpack file (by using kinvolk-archives@37cc006, then used gunzip on the file from /proc/$(pidof scope-probe)/root/tmp/report_0 and then renamed it to report.msgpack). I used https://gist.github.com/alban/4663d28f4cf69db4bb1d6dc365171e2c to produce this table.

Unfortunately, a lower compression level seems to imply a slower decompression. This is important since this bug is about the app.

In this table, flate with level=7 seems best overall for my msgpack file. But this is variable: when running the test another time, gzip with level=6 was the best overall.
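
(And a rough sketch of timing the decompression side for a couple of the stdlib codecs, since that is what matters for the app; illustrative only, the linked gist is the actual script.)

```go
package main

import (
	"bytes"
	"compress/flate"
	"compress/gzip"
	"fmt"
	"io"
	"io/ioutil"
	"log"
	"time"
)

// timeDecompress reads a compressed stream to the end and reports how long it took.
func timeDecompress(name string, r io.Reader) {
	start := time.Now()
	n, err := io.Copy(ioutil.Discard, r)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s: decompressed %d bytes in %v\n", name, n, time.Since(start))
}

func main() {
	raw, err := ioutil.ReadFile("report.msgpack") // illustrative path
	if err != nil {
		log.Fatal(err)
	}

	// gzip at the default level.
	var gz bytes.Buffer
	gw := gzip.NewWriter(&gz)
	gw.Write(raw)
	gw.Close()
	gr, err := gzip.NewReader(bytes.NewReader(gz.Bytes()))
	if err != nil {
		log.Fatal(err)
	}
	timeDecompress("gzip", gr)

	// raw DEFLATE at the default level.
	var fl bytes.Buffer
	fw, _ := flate.NewWriter(&fl, flate.DefaultCompression)
	fw.Write(raw)
	fw.Close()
	timeDecompress("flate", flate.NewReader(bytes.NewReader(fl.Bytes())))
}
```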

alban added a commit to kinvolk-archives/scope that referenced this issue Apr 12, 2017
We want the middle ground between a small compression size, a fast
compression time and a fast decompression time.

Tests suggest that the default compression level is better than the
maximum compression level: although the reports are 4% bigger and
decompress slower, they compress 33% faster.

See discussion on weaveworks#1457 (comment)