Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A caching, log(n) complexity report merger #1418

Merged
merged 1 commit into from
May 5, 2016
Merged

Conversation

tomwilkie
Copy link
Contributor

@tomwilkie tomwilkie commented May 3, 2016

Contains #1398

With this, the vast majority (7 of 8s) of CPU is spent decoding reports.

@tomwilkie tomwilkie changed the title A caching, log(n) complexity report merger [WIP] A caching, log(n) complexity report merger May 3, 2016
func (ns byID) Swap(i, j int) { ns[i], ns[j] = ns[j], ns[i] }
func (ns byID) Less(i, j int) bool { return ns[i].id < ns[j].id }

func hash(_1, _2 string) string {

This comment was marked as abuse.

@tomwilkie tomwilkie force-pushed the log-n-report-merger branch from 6d92ce2 to 16928f1 Compare May 3, 2016 15:29
for _, tr := range c.reports {
rpt = rpt.Merge(tr.report)
id.Write([]byte(tr.report.ID))
rpts = append(rpts, tr.report)
}

This comment was marked as abuse.

This comment was marked as abuse.

@paulbellamy
Copy link
Contributor

You don't really need the cache if you just hold a pointer to the root of the tree (i.e. the "current report")

@tomwilkie
Copy link
Contributor Author

You don't really need the cache if you just hold a pointer to the root of the tree (i.e. the "current report")

How so? The algorithm would look very different (have to build and diff two trees). Also I don't know how that would work in the multitenant case.

I actually remove the left & right pointers from the node in the latest rev...

@tomwilkie tomwilkie force-pushed the log-n-report-merger branch from 16928f1 to 6c7ca90 Compare May 4, 2016 09:30
@tomwilkie
Copy link
Contributor Author

@paulbellamy I'm more concerned about the algorithm TBH. I sort the reports by ID and then take each pair iteratively. As the IDs are random, this means removing a single report will mean we have to re-merge 1/2 the reports again (as they will all 'shift up' by one).

I think a better approach might be to subdivide the address space in two each time, more top-down than bottom up. WDYT?

@tomwilkie tomwilkie force-pushed the log-n-report-merger branch from 6c7ca90 to 215991a Compare May 4, 2016 10:14
@tomwilkie tomwilkie changed the title [WIP] A caching, log(n) complexity report merger A caching, log(n) complexity report merger May 4, 2016
@tomwilkie
Copy link
Contributor Author

@paulbellamy WDYT?

@paulbellamy
Copy link
Contributor

paulbellamy commented May 4, 2016

I'd be interested in seeing a benchmark of a large number of report merges before/after this change.

Edit both with/without the caching

@paulbellamy
Copy link
Contributor

I'm also curious about the choice of branching factor. i.e. why a binary tree, not e.g. 32 leaf nodes?

@tomwilkie
Copy link
Contributor Author

I'm also curious about the choice of branching factor. i.e. why a binary tree, not e.g. 32 leaf nodes?

No particular reason; the branching factor is just falls out as a constant factor in the big-o anyway.

@tomwilkie
Copy link
Contributor Author

I'm also curious about the choice of branching factor. i.e. why a binary tree, not e.g. 32 leaf nodes?

Also merge is binary.

@tomwilkie tomwilkie force-pushed the log-n-report-merger branch from 0bacaaa to d1528c7 Compare May 4, 2016 15:04
@tomwilkie
Copy link
Contributor Author

I'd be interested in seeing a benchmark of a large number of report merges before/after this change.

Toms-MacBook-Pro:app twilkie$ go test -bench Benchmark
...                 
PASS
BenchmarkSmartMerger-8                        20      58676996 ns/op    22620731 B/op     137828 allocs/op
BenchmarkSmartMergerWithoutCaching-8          20      66029486 ns/op    33815075 B/op     214389 allocs/op
BenchmarkDumbMerger-8                          3     455741265 ns/op    198807458 B/op   1685880 allocs/op
ok      github.com/weaveworks/scope/app 7.617s

@tomwilkie tomwilkie force-pushed the log-n-report-merger branch from d1528c7 to 2f54a43 Compare May 4, 2016 15:36
// Define how to merge two nodes together. The result of merging
// two reports is cached.
hits := 0
misses := 0

This comment was marked as abuse.

@tomwilkie tomwilkie force-pushed the log-n-report-merger branch from 2f54a43 to 54a760a Compare May 4, 2016 16:53
@tomwilkie
Copy link
Contributor Author

@paulbellamy PTAL

@paulbellamy paulbellamy merged commit c671c58 into master May 5, 2016
@paulbellamy paulbellamy deleted the log-n-report-merger branch May 5, 2016 10:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants