Use a custom parsing monad instead of attoparsec. #298
Conversation
All decoding benchmarks show significant speedups after this change. The biggest improvement is to decoding packed data, which is 4-5x as fast as before. (See below for a full list of benchmark diffs.)

This parsing monad follows the approach of, e.g., the `store` and `persist` packages. It requires that all data be in a *strict* `ByteString`, and uses simple pointer arithmetic internally to walk through its bytes. This effectively works against #62 (streaming parsers), since it needs to read all the input data before starting the parse. However, that issue has already existed since the beginning of this library for, e.g., submessages; see that bug for more details. So this change doesn't appear to be a regression. We also have freedom to later try out different implementations without changing the API, since `Parser` is opaque as of #294.

The implementation of `Parser` differs from `store` and `persist` by using `ExceptT` to pass around errors internally, rather than exceptions (or closures, as in `attoparsec`). We may want to experiment with this later, but in my initial experiments I didn't see a significant improvement from those approaches.

Benchmark results (the "time" output from Criterion):

    flat(602B)/decode/whnf:            13.14 μs (13.02 μs .. 13.29 μs)
                                    =>  8.686 μs (8.514 μs .. 8.873 μs)
    nested(900B)/decode/whnf:          26.35 μs (25.85 μs .. 26.86 μs)
                                    => 14.01 μs (13.86 μs .. 14.18 μs)
    int32-packed(1003B)/decode/whnf:   36.23 μs (35.75 μs .. 36.69 μs)
                                    => 17.31 μs (17.11 μs .. 17.50 μs)
    int32-unpacked(2000B)/decode/whnf: 65.18 μs (64.19 μs .. 66.68 μs)
                                    => 19.35 μs (19.13 μs .. 19.58 μs)
    float-packed(4003B)/decode/whnf:   78.61 μs (77.53 μs .. 79.46 μs)
                                    => 19.56 μs (19.40 μs .. 19.76 μs)
    float-unpacked(5000B)/decode/whnf: 108.9 μs (107.8 μs .. 110.3 μs)
                                    => 22.29 μs (22.00 μs .. 22.66 μs)
    no-unused(10003B)/decode/whnf:     571.7 μs (560.0 μs .. 586.6 μs)
                                    => 356.5 μs (349.0 μs .. 365.0 μs)
    with-unused(10003B)/decode/whnf:   786.6 μs (697.8 μs .. 875.5 μs)
                                    => 368.3 μs (361.8 μs .. 376.4 μs)
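To make the design concrete, here is a minimal sketch of what such a pointer-walking parser can look like. This is an illustrative model only, not the package's actual definitions; names like `unParser`, `runParser`, and `failParse` are hypothetical. The parser threads an explicit end-of-input pointer and the current position, and reports errors as `Either` values rather than via exceptions:

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Monad (ap)
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as B
import Data.Word (Word8)
import Foreign.Ptr (Ptr, castPtr, plusPtr)
import Foreign.Storable (peek)

-- The parser receives the end-of-input pointer and the current position,
-- and returns either an error message or a result plus the new position.
newtype Parser a = Parser
  { unParser :: Ptr Word8 -> Ptr Word8 -> IO (Either String (a, Ptr Word8)) }

instance Functor Parser where
  fmap f (Parser g) = Parser $ \end pos ->
    fmap (fmap (\(a, p) -> (f a, p))) (g end pos)

instance Applicative Parser where
  pure x = Parser $ \_ pos -> return (Right (x, pos))
  (<*>) = ap

instance Monad Parser where
  Parser g >>= f = Parser $ \end pos -> do
    r <- g end pos
    case r of
      Left err        -> return (Left err)
      Right (a, pos') -> unParser (f a) end pos'

-- Fail the parse with a message, without throwing an exception.
failParse :: String -> Parser a
failParse msg = Parser $ \_ _ -> return (Left msg)

-- Read one byte, advancing the position by simple pointer arithmetic.
getWord8 :: Parser Word8
getWord8 = Parser $ \end pos ->
  if pos >= end
    then return (Left "getWord8: Unexpected end of input")
    else do
      w <- peek pos
      return (Right (w, pos `plusPtr` 1))

-- Run a parser over a *strict* ByteString: all input must be in memory.
runParser :: Parser a -> B.ByteString -> IO (Either String a)
runParser (Parser g) bs =
  B.unsafeUseAsCStringLen bs $ \(p, len) -> do
    let pos = castPtr p
    r <- g (pos `plusPtr` len) pos
    return (fmap fst r)
```

This mirrors the trade-off described above: because `runParser` pins the whole strict `ByteString` and walks it with raw pointers, there is no natural place to suspend and await more input, which is why streaming (#62) is out of scope for this representation.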
Very cool!
I'd love for @augustss to also chime in on this one.
Adding `isolate` (and using it for parsing messages and packed fields) improves the nested benchmark a bit:

    nested(900B)/decode/whnf: 14.32 μs (14.08 μs .. 14.57 μs)
                           => 11.66 μs (11.36 μs .. 11.99 μs)

It didn't make a significant difference in the packed benchmark, I think because the effects of using lists currently dominate everything else.
Force-pushed from 28df564 to e992e18.
Thank you @blackgnezdo for the detailed review! |
-- @len@ bytes remaining. That is, once @len@ bytes have been
-- consumed, 'atEnd' will return 'True' and other actions
-- like 'getWord8' will act like there is no input remaining.
isolate :: Int -> Parser a -> Parser a
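One way to read the documented semantics is the following toy model over plain `ByteString`s (names like `isolateModel` and `takeByte` are hypothetical, and this is a sketch of the contract, not the library's pointer-based implementation). The inner parser only sees the first `len` bytes, and the outer parse resumes after that region; a negative length fails the parse, matching the fix adopted in this review:

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- A toy model parser: consume from a ByteString, returning either an
-- error or a result plus the remaining input.
type P a = B.ByteString -> Either String (a, B.ByteString)

-- Model of 'isolate': restrict the inner parser to the first @len@
-- bytes, then resume after that region. Negative lengths fail.
isolateModel :: Int -> P a -> P a
isolateModel len p input
  | len < 0              = Left "isolate: negative length"
  | B.length input < len = Left "isolate: Unexpected end of input"
  | otherwise =
      let (region, rest) = B.splitAt len input
      in case p region of
           Left err            -> Left err
           Right (x, _unused)  -> Right (x, rest)

-- A simple inner parser for demonstration.
takeByte :: P Word8
takeByte bs = case B.uncons bs of
  Nothing        -> Left "takeByte: Unexpected end of input"
  Just (w, rest) -> Right (w, rest)
```

For example, `isolateModel 2 takeByte` applied to three bytes lets `takeByte` see only the first two, then continues the outer parse at the third.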
Where do we catch negative `len`?
Good catch; fixed to fail the parse in that case.
-- It is only safe for @f@ to peek between its argument @p@ and
-- @p `plusPtr` (len - 1)@, inclusive.
withSized :: Int -> String -> (Ptr Word8 -> IO a) -> Parser a
withSized len message f = Parser $ \end pos ->
Where do we catch negative `len`?
The only exposed function that passes a user-defined value is `getBytes`. It calls `packCStringLen`, which does check for negative length; however, that function throws an exception in that case. Fixed by adding a manual check inside `getBytes`.
I love the newly added test coverage!
  getBytes :: Int -> Parser B.ByteString
- getBytes n = withSized n "getBytes: Unexpected end of input" $ \pos ->
+ getBytes n
+   | n < 0 = fail "getBytes: negative length"
Is parsing 0 bytes a normal (useful?) thing to support? If so I'd document it.
Yes, for example if you have a proto string field, then the empty string value may be encoded as the varint "0".
Updated the comment.
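The zero-length case falls out naturally when decoding a length-delimited field: the wire format prefixes the payload with a varint length, and an empty string is simply length 0 followed by no bytes. A self-contained sketch of that shape (`getVarint` and `getLengthDelimited` are illustrative names, not this library's API):

```haskell
import qualified Data.ByteString as B
import Data.Bits ((.&.), (.|.), shiftL, testBit)
import Data.Word (Word64)

-- Decode one base-128 varint from the front of the input: each byte
-- contributes 7 low bits; a set high bit means more bytes follow.
getVarint :: B.ByteString -> Either String (Word64, B.ByteString)
getVarint = go 0 0
  where
    go shift acc bs = case B.uncons bs of
      Nothing -> Left "getVarint: Unexpected end of input"
      Just (w, rest)
        | testBit w 7 ->
            go (shift + 7) (acc .|. (fromIntegral (w .&. 0x7f) `shiftL` shift)) rest
        | otherwise ->
            Right (acc .|. (fromIntegral w `shiftL` shift), rest)

-- Read a length-delimited payload: a varint length, then that many
-- bytes. A length of 0 yields the empty ByteString, so empty proto
-- string fields decode without any special casing.
getLengthDelimited :: B.ByteString -> Either String (B.ByteString, B.ByteString)
getLengthDelimited input = do
  (len, rest) <- getVarint input
  let n = fromIntegral len
  if B.length rest < n
    then Left "getBytes: Unexpected end of input"
    else Right (B.splitAt n rest)
```

Running this on the single byte `0x00` returns the empty payload, which is exactly the "varint 0" encoding of an empty string mentioned above.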
- getBytes n = withSized n "getBytes: Unexpected end of input" $ \pos ->
+ getBytes n
+   | n < 0 = fail "getBytes: negative length"
+   | otherwise = withSized n "getBytes: Unexpected end of input" $ \pos ->
I'd probably prefer to rephrase it with the correct case guarded by the assertion, and fail otherwise. It then becomes order-independent, and the expected behavior comes first.
Done.
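The suggested guard ordering can be sketched on a simplified, `ByteString`-level stand-in for `getBytes` (hypothetical model, not the actual pointer-based code): the valid case is guarded by the `n >= 0` assertion and comes first, and everything else fails the parse.

```haskell
import qualified Data.ByteString as B

-- Simplified model of getBytes with the expected case first, guarded
-- by the assertion; any other input fails the parse.
getBytesModel :: Int -> B.ByteString -> Either String (B.ByteString, B.ByteString)
getBytesModel n input
  | n >= 0 =
      if B.length input >= n
        then Right (B.splitAt n input)
        else Left "getBytes: Unexpected end of input"
  | otherwise = Left "getBytes: negative length"
```

Because the guards no longer overlap in spirit ("valid" vs. "everything else"), swapping them would not change behavior, which is the order-independence the reviewer asks for.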
    B.packCStringLen (castPtr pos, n)

-- | Helper function for reading bytes from the current position and
-- advancing the pointer.
--
-- It is only safe for @f@ to peek between its argument @p@ and
-- @p `plusPtr` (len - 1)@, inclusive.
--
-- This function is not safe to use with a negative length.
Could you quantify the impact of including the assertion branch into this function?
Moved the assertion into `withSized`. Happily, it turns out that GHC with `-O` will completely elide the assertion if the length is constant.
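To illustrate why the moved assertion is cheap: fixed-width readers call the helper with a literal length, so the negative-length branch is decidable at compile time and an optimizing GHC can discard it. A self-contained sketch over a whole `ByteString` (`withSizedBS` and `getWord32le` are illustrative names; the real `withSized` works on raw positions inside a `Parser`):

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as B
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word32)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (peekByteOff)

-- withSized-style helper: check the length (including negative
-- lengths) before handing @f@ a pointer it may read for @len@ bytes.
-- With a literal @len@ such as the 4 below, GHC's optimizer can
-- discharge the @len < 0@ test at compile time.
withSizedBS :: Int -> String -> (Ptr Word8 -> IO a) -> B.ByteString
            -> IO (Either String a)
withSizedBS len msg f bs
  | len < 0 || B.length bs < len = return (Left msg)
  | otherwise = B.unsafeUseAsCString bs $ \p -> Right <$> f (castPtr p)

-- Fixed-width reader: a little-endian Word32 from the first 4 bytes.
getWord32le :: B.ByteString -> IO (Either String Word32)
getWord32le = withSizedBS 4 "getWord32le: Unexpected end of input" $ \p -> do
  b0 <- peekByteOff p 0 :: IO Word8
  b1 <- peekByteOff p 1 :: IO Word8
  b2 <- peekByteOff p 2 :: IO Word8
  b3 <- peekByteOff p 3 :: IO Word8
  return $  fromIntegral b0
        .|. (fromIntegral b1 `shiftL` 8)
        .|. (fromIntegral b2 `shiftL` 16)
        .|. (fromIntegral b3 `shiftL` 24)
```

Here the caller never sees the check at all: `getWord32le` either succeeds on four or more bytes or fails the parse with an error message, never throwing.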