Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve codec performance #916

Merged
merged 5 commits into from
Feb 17, 2016
Merged

Improve codec performance #916

merged 5 commits into from
Feb 17, 2016

Conversation

2opremio
Copy link
Contributor

@2opremio 2opremio commented Feb 4, 2016

Improves #854

  • Fix tests
  • Generate encoders/decoders on the fly (or at least incorporate a make target for it)
  • Fix hack from (94cee66) by updating all the vendored k8s dependencies (sigh)
  • Rebase

@2opremio 2opremio mentioned this pull request Feb 4, 2016
@2opremio 2opremio force-pushed the 854-reports-codec branch 3 times, most recently from acc8383 to b46d5d1 Compare February 9, 2016 16:30
@2opremio
Copy link
Contributor Author

2opremio commented Feb 9, 2016

As mentioned in #854 (comment), using ffjson without code generation helps a bit but doesn't cut it.

I've tried the ffjson code generator. However, the encoder generated from https://github.com/weaveworks/scope/blob/master/report/node.go breaks the UI, that is:

go get -u github.com/pquerna/ffjson
cd $GOPATH/src/github.com/weaveworks/scope/report
ffjson node.go

Which results in:

screen shot 2016-02-09 at 5 51 29 pm

I really don't want to debug the generated code, so I am going to try https://github.com/ugorji/go/ next

@2opremio 2opremio force-pushed the 854-reports-codec branch 3 times, most recently from c9f1a2b to f40cc7c Compare February 10, 2016 15:56
@2opremio
Copy link
Contributor Author

Using compile-time generated JSON encoders/decoders with github.com/ugorji/go/codec doesn't seem to solve the problem either. Here are the profiles from the scenario described in #854 (comment) #812

I run scope in the service to test (5 probes 1 app). Here's the app's profile:

pprof.localhost:4040.samples.cpu.009.pb.gz

app_codec

And here's the probe's profile:

pprof.localhost:4041.samples.cpu.003.pb.gz

probe_codec

As much as we would like to use json, I think it's time to move on to a faster codec. We can use part of this work for the communication with the UI (which the profile above is not taking into account).

@2opremio
Copy link
Contributor Author

I think I am going to try gob+websockets next.

@2opremio
Copy link
Contributor Author

The websocket+gob approach didn't help either (see #957 (comment) )

I replaced all the {Marhsal,Unmarhsal}JSON methods by Selfers and it helps a lot with the garbage collection because we don't need to create more intermediate buffers (or encoders/decoders). Note how gzip now takes 30% of the decoding time.

App:

pprof.localhost:4040.samples.cpu.010.pb.gz

app

Probe:

pprof.localhost:4041.samples.cpu.004.pb.gz

probe

The Selfers now allow to move to faster codecs like msgpack very easily. I am going to try that next.

@2opremio
Copy link
Contributor Author

After moving to msgpack and updating the ugorji/go/codec library I get this

App:

app

Probe:

probe

Although the profile looks the same for the app, note how the duration has gone down considerably. The probe doesn't seem to improve much (it doesn't degrade either) so let's treat that separately.

Note that this is still measured without connecting the UI (i.e. browser), but it brings down CPU usage of the app in the service (5 probes) to <10% compared to ~70% in master .

The next steps would be:

  • Reduce the size/publish-frequency of the reports
  • Tweak the gzip compression level.
  • Increase GOGC? (sounds like cheating, but may be worth trading some memory for GC CPU)
  • Look into building github.com/ugorji/go/codec/ with -tags=unsafe to reduce decoding allocations. I tried, but it basically broke everything. I am going to create a ticket upstream.
  • Create a custom encoder/decoder (I wouldn't be up for this, but @bboreham has a lot of experience and may want to volunteer :) ).

@paulbellamy @peterbourgon Can you please take a look at this and give some feedback? If you are OK with the results/approach, I will go ahead and clean it up for code review.

decoder = json.NewDecoder(reader).Decode
decoder = func(i interface{}) error {
return codec.NewDecoder(reader, &codec.JsonHandle{}).Decode(i)
}

This comment was marked as abuse.

This comment was marked as abuse.

@paulbellamy
Copy link
Contributor

This seems to have trouble with children's metrics.

@2opremio
Copy link
Contributor Author

This seems to have trouble with children's metrics.

Thanks, I will look into it but I will fix the tests first to see if that catches it.

Here's a final %CPU comparison against 26ccb49 running Scope in the service (5 probes -> 1 app)

What 26ccb49 This PR
Probe(s) 15-25% 20%
App (no UI) 70% 10%
App (with UI) - 30% with peaks of 50%

(I cannot simply compare against master because it's broken on the service due to #955).

For reference, here's the profile of the App with the UI attached:

app

The codec is not the culprit (i.e. unrelated to this PR).

@paulbellamy
Copy link
Contributor

Thanks, I will look into that but will fix the tests first to see if that catches it.

I suspect it is with json encoding of metrics, so it won't...

@2opremio 2opremio force-pushed the 854-reports-codec branch 11 times, most recently from 84bf578 to 9b781a7 Compare February 16, 2016 06:04
This caused a dependency chain reaction (sigh):

* All the k8s packages had to be fetched again. This in turn required:
  * Pining github.com/docker/docker/pkg/parsers to
    0f5c9d301b9b1cca66b3ea0f9dec3b5317d3686d to cirvumvent
    kubernetes/kubernetes#18774
  * Update github.com/juju/ratelimit
  * Make probe/kubernetes/client.go comply with API changes.
@2opremio
Copy link
Contributor Author

I suspect it is with json encoding of metrics, so it won't...

@paulbellamy What problems were you seeing with the metrics? I have noticed that the new codec outputs floats instead of ints. e.g:

        "metrics": {
          "docker_memory_usage": {
            "samples": [
              {
                "date": "2016-02-16T05:24:00.72055402Z",
                "value": 36864.0
              },

instead of:

        "metrics": {
          "docker_memory_usage": {
            "samples": [
              {
                "date":  "2016-02-16T05:30:54.719541452Z",
                "value": 36864
              },

@paulbellamy
Copy link
Contributor

I was just never seeing any values rendered for them on children. Didn't look further into it than that.

@foot
Copy link
Contributor

foot commented Feb 16, 2016

Sorry i clicked on the checkboxes on the first comment, not really paying attention to what I was doing!

Hope they're in the right state.

@2opremio
Copy link
Contributor Author

OK, so this is why controls are not working: ugorji/go#141 .

The new codec library misbehaves when embedding types, which is exactly how our control details are defined, sigh.

This may also be what's causing problems with the metrics.

I think I may have a workaround though: generate selfers for all the types we serialize.

@foot No worries

@paulbellamy
Copy link
Contributor

This may also be what's causing problems with the metrics.

@2opremio yep.

@2opremio
Copy link
Contributor Author

OK, I found yet another bug, ugorji/go#142 which I worked around too.

The controls are working and I think the metrics are alright.

@paulbellamy PTAL, I think I will open a bottle of champaign when this gets merged.

@2opremio 2opremio changed the title [WIP] Improve report codec performance Improve report codec performance Feb 16, 2016
@2opremio 2opremio changed the title Improve report codec performance Improve codec performance Feb 16, 2016
@2opremio 2opremio force-pushed the 854-reports-codec branch 3 times, most recently from 7c3c2f6 to 0ff1d16 Compare February 16, 2016 23:48
@paulbellamy
Copy link
Contributor

Why are we using your own fork of the codec, vs. the upstream?

@paulbellamy
Copy link
Contributor

let's ditch GOHOSTOS and GOHOSTARCH and do unset GOOS GOARCH foo instead, so go can just auto-detect. Less to learn for the user of the makefile.

@2opremio
Copy link
Contributor Author

Why are we using your own fork of the codec, vs. the upstream?

The fork is only used to build codecgen. It's a temporary solution, see the commit comment:

I included
https://github.com/2opremio/go-1/tree/master/codec/codecgen instead of
upstream until ugorji/go#139 is merged.

In fact we will be able to use upstream one once this is fixed: ugorji/go#140 (comment) , which I guess will be pretty soon judging by the maintainer's responsiveness.

EDIT: I added #973

@2opremio
Copy link
Contributor Author

let's ditch GOHOSTOS and GOHOSTARCH and do unset GOOS GOARCH foo instead, so go can just auto-detect

Agreed

@paulbellamy
Copy link
Contributor

the secondary expansion thing is terribly complex and obtuse.

@2opremio
Copy link
Contributor Author

@paulbellamy please take another look

@paulbellamy
Copy link
Contributor

Much simpler, thanks. :) LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants