[dbnode] Faster M3TSZ decoding by using 64 bit operations #2827

Merged: linasm merged 39 commits into master from linasm/m3tsz-performance on Feb 17, 2021

Collaborator

@linasm linasm commented Nov 2, 2020

What this PR does / why we need it:
Changes IStream to read and process the encoded data stream using 64-bit operations (instead of the standard io.Reader, which reads the data byte by byte).
According to a microbenchmark based on some real-world time series, this improves M3TSZ decoding performance by ~37%. Microbenchmark results:

  • before this PR -
    BenchmarkM3TSZDecode-12 10000 108797 ns/op

  • after switching to 64 bit decoding -
    BenchmarkM3TSZDecode-12 16813 71793 ns/op

  • after dropping IStream interface and using the struct directly -
    BenchmarkM3TSZDecode-12 16867 69272 ns/op

Also included macro-benchmark results on some real world data, obtained using a modified read_data_files utility:

Command: read_data_files -b 1596009600000000000 -p ~/testdata/m3db -s 4 -v 0 -B datapoints

Before the change:

Running time: 670.670802ms

7315 series read 
(10906.99 series/second)

3852592 datapoints decoded 
(5744386.05 datapoints/second)

After the change:

Running time: 460.262013ms

7315 series read 
(15893.12 series/second)

3852592 datapoints decoded 
(8370432.26 datapoints/second)

31% reduction in running time (including the overhead of actually reading the file).
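
For readers who want to check the arithmetic, here is a small sketch that derives the relative change from the two running times reported above. The function names `reduction` and `throughputGain` are hypothetical helpers introduced only for this illustration; they are not part of read_data_files.

```go
package main

import "fmt"

// reduction returns the fractional decrease in running time
// (hypothetical helper, not part of the M3 tooling).
func reduction(beforeMs, afterMs float64) float64 {
	return 1 - afterMs/beforeMs
}

// throughputGain returns the fractional increase in work done per unit
// of time for the same workload (hypothetical helper).
func throughputGain(beforeMs, afterMs float64) float64 {
	return beforeMs/afterMs - 1
}

func main() {
	// Running times copied from the macro-benchmark output above.
	const beforeMs, afterMs = 670.670802, 460.262013

	fmt.Printf("running time reduction: %.1f%%\n", 100*reduction(beforeMs, afterMs))
	fmt.Printf("throughput gain:        %.1f%%\n", 100*throughputGain(beforeMs, afterMs))
}
```

The two figures describe the same change from different angles: a roughly 31% shorter running time corresponds to roughly 46% more datapoints decoded per second (8370432.26 vs 5744386.05).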

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:
NONE

Does this PR require updating code package or user-facing documentation?:
NONE

@m3db m3db deleted a comment from codecov bot Nov 29, 2020
@linasm linasm changed the title [dbnode] 64 bit iStream for faster M3TSZ decoding [dbnode] Faster M3TSZ decoding by using 64 bit operations Nov 29, 2020
@linasm linasm marked this pull request as ready for review November 29, 2020 22:24
Collaborator

@vpranckaitis vpranckaitis left a comment

LGTM with a few small comments

src/dbnode/encoding/istream_test.go (resolved)
src/dbnode/x/xio/reader64_test.go (outdated, resolved)
src/dbnode/x/xio/segment_reader.go (outdated, resolved)
src/dbnode/x/xio/segment_reader.go (outdated, resolved)
src/dbnode/x/xio/segment_reader_test.go (outdated, resolved)
@m3db m3db deleted a comment from codecov bot Dec 5, 2020
@codecov

codecov bot commented Dec 7, 2020

Codecov Report

Merging #2827 (75ce6c3) into master (75ce6c3) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #2827   +/-   ##
=======================================
  Coverage    70.9%    70.9%           
=======================================
  Files        1079     1079           
  Lines      100527   100527           
=======================================
  Hits        71319    71319           
  Misses      24148    24148           
  Partials     5060     5060           
Flag Coverage Δ
aggregator 75.8% <0.0%> (ø)
cluster 85.2% <0.0%> (ø)
collector 84.3% <0.0%> (ø)
dbnode 75.4% <0.0%> (ø)
m3em 74.4% <0.0%> (ø)
m3ninx 73.1% <0.0%> (ø)
metrics 19.9% <0.0%> (ø)
msg 74.2% <0.0%> (ø)
query 67.2% <0.0%> (ø)
x 80.2% <0.0%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 75ce6c3...c67dbab. Read the comment docs.

@codecov

codecov bot commented Dec 7, 2020

Codecov Report

Merging #2827 (30e1c10) into master (30e1c10) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #2827   +/-   ##
=======================================
  Coverage    72.3%    72.3%           
=======================================
  Files        1096     1096           
  Lines      102315   102315           
=======================================
  Hits        74020    74020           
  Misses      23195    23195           
  Partials     5100     5100           
Flag Coverage Δ
aggregator 75.8% <0.0%> (ø)
cluster 84.9% <0.0%> (ø)
collector 84.3% <0.0%> (ø)
dbnode 78.4% <0.0%> (ø)
m3em 74.4% <0.0%> (ø)
m3ninx 73.4% <0.0%> (ø)
metrics 20.0% <0.0%> (ø)
msg 74.1% <0.0%> (ø)
query 67.4% <0.0%> (ø)
x 80.6% <0.0%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 30e1c10...01fdfaf. Read the comment docs.

@linasm
Collaborator Author

linasm commented Jan 11, 2021

[Screenshot: CPU profile, 2021-01-07 15:48:29]

@robskillington we've noticed (in the CPU profile above) that M3TSZ decoding can take a significant chunk of the peers bootstrapper's processing time (decoding data streamed from peers), which makes me think we should prioritise this PR now.

}
return Bit(is.consumeBuffer(1)), nil
// NewIStream creates a new IStream
func NewIStream(reader64 xio.Reader64) *IStream {
Contributor

Is returning a ptr here for performance reasons? Noticed none of the fields are public.

Collaborator Author

Good question. IStream used to be an interface before this change, and now it is a struct. I think I left it returning a pointer to avoid an accidental change of semantics at the call sites (that is, to avoid any potential pass-by-value of this struct downstream).
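
The pass-by-value concern can be illustrated with a minimal sketch. The `cursor` type and the two helper functions are hypothetical and are not taken from the M3 code base; they only show how a stateful reader behaves when it is copied instead of shared.

```go
package main

import "fmt"

// cursor is a hypothetical stand-in for a stateful stream reader:
// every call to next advances its internal position.
type cursor struct {
	pos int
}

func (c *cursor) next() int {
	c.pos++
	return c.pos
}

// advanceByValue receives a copy, so the caller's cursor does not move.
func advanceByValue(c cursor) int {
	return c.next()
}

// advanceByPointer shares the caller's cursor, so the position advances.
func advanceByPointer(c *cursor) int {
	return c.next()
}

func main() {
	c := cursor{}

	advanceByValue(c)
	fmt.Println("after by-value call, pos =", c.pos)

	advanceByPointer(&c)
	fmt.Println("after by-pointer call, pos =", c.pos)
}
```

If a constructor returned the struct by value, a call site that previously passed an interface around could silently start working on copies like `advanceByValue` does, which is the kind of semantics change the reply above is trying to rule out.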

numBits -= numToRead
res := readBitsInWord(is.current, numBits)
bitsNeeded := numBits - is.remaining
if err := is.readWordFromStream(); err != nil {
Contributor

Not following this logic here... shouldn't we check whether there are remaining bits to read before reading another word from the stream?

Might be worth adding some comments here around the logic.

Collaborator Author

There is such a check, with an early return, a few lines above:

	if numBits <= is.remaining {
		// Have enough bits buffered.
		return is.consumeBuffer(numBits), nil
	}

I'll add some comments for clarity.
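
To make the control flow discussed in this thread easier to follow, here is a simplified sketch of a bit reader that buffers one 64-bit word at a time. It is not the code from this PR: the `bitStream` type, its fields, and the `refill` and `consume` helpers are assumptions made for illustration, loosely modelled on the names visible in the diff context (`remaining`, `current`, `consumeBuffer`, `readWordFromStream`).

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errEOF = errors.New("end of stream")

// bitStream is a hypothetical, simplified stand-in for IStream.
type bitStream struct {
	data      []byte
	current   uint64 // unread bits, kept in the most significant positions
	remaining uint   // number of unread bits in current
}

// refill loads the next word (up to 8 bytes) from data into current.
func (s *bitStream) refill() error {
	if len(s.data) == 0 {
		return errEOF
	}
	if len(s.data) >= 8 {
		s.current = binary.BigEndian.Uint64(s.data)
		s.remaining = 64
		s.data = s.data[8:]
		return nil
	}
	// Fewer than 8 bytes left: assemble them and left-align the result.
	var w uint64
	for _, b := range s.data {
		w = w<<8 | uint64(b)
	}
	n := uint(len(s.data))
	s.current = w << (64 - 8*n)
	s.remaining = 8 * n
	s.data = nil
	return nil
}

// consume returns the top n buffered bits; the caller ensures n <= remaining.
func (s *bitStream) consume(n uint) uint64 {
	res := s.current >> (64 - n)
	s.current <<= n
	s.remaining -= n
	return res
}

// readBits reads n bits (n <= 64), refilling the buffer only when needed.
func (s *bitStream) readBits(n uint) (uint64, error) {
	if n > 64 {
		return 0, errors.New("cannot read more than 64 bits at once")
	}
	if n <= s.remaining {
		// Have enough bits buffered: early return, no stream access.
		return s.consume(n), nil
	}
	// Take whatever is buffered, then fetch the rest from the next word.
	need := n - s.remaining
	res := s.consume(s.remaining)
	if err := s.refill(); err != nil {
		return 0, err
	}
	if need > s.remaining {
		return 0, errEOF
	}
	return res<<need | s.consume(need), nil
}

func main() {
	s := &bitStream{data: []byte{0xAB, 0xCD, 0xEF, 0x01, 0x23, 0x45, 0x67, 0x89, 0xF0}}
	for _, n := range []uint{4, 64, 4, 1} {
		v, err := s.readBits(n)
		fmt.Printf("readBits(%d) = %#x, err = %v\n", n, v, err)
	}
}
```

In this sketch the word is only fetched after the early-return check has failed, which mirrors the ordering described in the reply above. Error handling is deliberately minimal: bits consumed before a failed refill are not restored.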

r.index += 8
return res, 8, nil
}
if r.index >= len(r.data) {
Contributor

nit: maybe this check can go first? nbd though haha

Collaborator Author

Actually, I think the current implementation should be more performant. On most of the calls to this method we will have 8 (or more) bytes available, so we will do an early return from the first if, avoiding any other checks.

sr.lazyInit()

var (
nh = len(sr.lazyHead)
Contributor

nit: the variable names are a little hard to parse here

Collaborator Author

Good point, will rename:

		headLen     = len(sr.lazyHead)
		headTailLen = headLen + len(sr.lazyTail)

@linasm linasm requested a review from notbdu February 13, 2021 20:28
@notbdu
Contributor

notbdu commented Feb 17, 2021

Code semantically LGTM. Taking the perf measurements at face value :).

@vpranckaitis vpranckaitis removed their assignment Feb 17, 2021
@linasm linasm merged commit ca40427 into master Feb 17, 2021
@linasm linasm deleted the linasm/m3tsz-performance branch February 17, 2021 10:05