-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use ReadAt() (pread syscall) instead of Read() in seek.go
#1664
Conversation
@@ -542,6 +526,7 @@ type ReusableSeekerResources struct { | |||
xmsgpackDecoder *xmsgpack.Decoder | |||
fileDecoderStream *bufio.Reader | |||
byteDecoderStream xmsgpack.ByteDecoderStream | |||
offsetFileReader *offsetFileReader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two questions:
(1 Does this need to be in this struct? allocating the 16 byte struct per read request isn't going to cost us anything (esp compared to the syscalls you've gotten rid of).
(2) What's left to make a regular Seeker concurrency safe? Would be nice to get rid of the Concurrent...Seeker
if we could
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: putting it in the reusable seeker resources. Yeah it doesn't cost much to allocate but we already have this pattern setup so its pretty easy to hook into it. Would rather avoid the alloc if we can.
resources.fileDecoderStream.Reset(s.indexFd) | ||
resources.offsetFileReader.reset(s.indexFd) | ||
resources.offsetFileReader.setOffset(offset) | ||
resources.fileDecoderStream.Reset(resources.offsetFileReader) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tbh here's where I mean it'd be easier to avoid reusing the offsetFileReader and just alloc a new one. don't think it's going to cause any performance issues either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are users I see doing tens of thousands of reads per second, might be worth continuing to keep per operation allocation low/zero where possible.
Codecov Report
@@ Coverage Diff @@
## master #1664 +/- ##
========================================
+ Coverage 71.4% 71.9% +0.4%
========================================
Files 964 966 +2
Lines 80809 80849 +40
========================================
+ Hits 57741 58169 +428
+ Misses 19243 18876 -367
+ Partials 3825 3804 -21
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What this PR does / why we need it:
The original implementation of using Read() instead of mmap required duplicating the F.Ds for each seeker clone (so they could seek concurrently) as well as issuing calls to Seek() (which was an additional syscall).
The new implementation takes advantage of the ReadAt() method / pread syscall to avoid the Seek() syscall. This has the added benefit of making the F.Ds stateless so that they can be shared among concurrent seekers which will cut the number of F.Ds required by M3DB in half for existing workloads and allow users to increase the seek concurrency configuration with no impact on the number of F.Ds.
Benefits:
There are no added tests because a lot of tests were already added in #1421 that verify the behavior of this portion of the code-base. In particular,
retriever_concurrent_test.go
contains a number of aggressive concurrency tests that verify the safety of sharing the F.Ds among multiple concurrent seekers.Thanks to Jacek Masiulaniec for the tip.