Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a custom parsing monad instead of attoparsec. #298

Merged
merged 8 commits into from
Jan 3, 2019
Merged

Conversation

judah
Copy link
Collaborator

@judah judah commented Dec 21, 2018

All decoding benchmarks show significant speedups after this change.
The biggest improvement is to decoding packed data which is 4-5x as fast
as before. (See below for a full list of benchmark diffs.)

This parsing monad follows the approach of, e.g., the store and persist
packages. It requires that all data be in a strict ByteString,
and uses simple pointer arithmetic internally to walk through its bytes.

This effectively works against #62 (streaming parsers) since it
needs to read all the input data before starting the parse. However,
that issue has already existed since the beginning of this library for,
e.g., submessages; see that bug for more details. So this change
doesn't appear to be a regression. We also have freedom to later
try out different implementations without changing the API, since
Parser is opaque as of #294.

The implementation of Parser differs from store and persist by using
ExceptT to pass around errors internally, rather than exceptions (or
closures, as in attoparsec). We may want to experiment with this later,
but in my initial experiments I didn't see a significant improvement
from those approaches.

Benchmark results (the "time" output from Criterion):

flat(602B)/decode/whnf:
13.14 μs (13.02 μs .. 13.29 μs)
=> 8.686 μs (8.514 μs .. 8.873 μs)

nested(900B)/decode/whnf:
26.35 μs (25.85 μs .. 26.86 μs)
=> 11.66 μs (11.36 μs .. 11.99 μs)

int32-packed(1003B)/decode/whnf:
36.23 μs (35.75 μs .. 36.69 μs)
=> 17.31 μs (17.11 μs .. 17.50 μs)

int32-unpacked(2000B)/decode/whnf:
65.18 μs (64.19 μs .. 66.68 μs)
=> 19.35 μs (19.13 μs .. 19.58 μs)

float-packed(4003B)/decode/whnf:
78.61 μs (77.53 μs .. 79.46 μs)
=> 19.56 μs (19.40 μs .. 19.76 μs)

float-unpacked(5000B)/decode/whnf:
108.9 μs (107.8 μs .. 110.3 μs)
=> 22.29 μs (22.00 μs .. 22.66 μs)

no-unused(10003B)/decode/whnf:
571.7 μs (560.0 μs .. 586.6 μs)
=> 356.5 μs (349.0 μs .. 365.0 μs)

with-unused(10003B)/decode/whnf:
786.6 μs (697.8 μs .. 875.5 μs)
=> 368.3 μs (361.8 μs .. 376.4 μs)


This change is Reviewable

All decoding benchmarks show significant speedups after this change.
The biggest improvement is to decoding packed data which is 4-5x as fast
as before.  (See below for a full list of benchmark diffs.)

This parsing monad follows the approach of, e.g., the `store` and `persist`
packages.  It requires that all data be in a *strict* `ByteString`,
and uses simple pointer arithmetic internally to walk through its bytes.

This effectively works against #62 (streaming parsers) since it
needs to read all the input data before starting the parse.  However,
that issue has already existed since the beginning of this library for,
e.g., submessages; see that bug for more details.  So this change
doesn't appear to be a regression.  We also have freedom to later
try out different implementations without changing the API, since
`Parser` is opaque as of #294.

The implementation of Parser differs from `store` and `persist` by using
`ExceptT` to pass around errors internally, rather than exceptions (or
closures, as in `attoparsec`).  We may want to experiment with this later,
but in my initial experiments I didn't see a significant improvement
from those approaches.

Benchmark results (the "time" output from Criterion):

flat(602B)/decode/whnf:
   13.14 μs   (13.02 μs .. 13.29 μs)
=> 8.686 μs   (8.514 μs .. 8.873 μs)

nested(900B)/decode/whnf:
    26.35 μs   (25.85 μs .. 26.86 μs)
=>  14.01 μs   (13.86 μs .. 14.18 μs)

int32-packed(1003B)/decode/whnf:
   36.23 μs   (35.75 μs .. 36.69 μs)
=> 17.31 μs   (17.11 μs .. 17.50 μs)

int32-unpacked(2000B)/decode/whnf:
   65.18 μs   (64.19 μs .. 66.68 μs)
=> 19.35 μs   (19.13 μs .. 19.58 μs)

float-packed(4003B)/decode/whnf:
   78.61 μs   (77.53 μs .. 79.46 μs)
=> 19.56 μs   (19.40 μs .. 19.76 μs)

float-unpacked(5000B)/decode/whnf:
   108.9 μs   (107.8 μs .. 110.3 μs)
=> 22.29 μs   (22.00 μs .. 22.66 μs)

no-unused(10003B)/decode/whnf:
   571.7 μs   (560.0 μs .. 586.6 μs)
=> 356.5 μs   (349.0 μs .. 365.0 μs)

with-unused(10003B)/decode/whnf:
   786.6 μs   (697.8 μs .. 875.5 μs)
=> 368.3 μs   (361.8 μs .. 376.4 μs)
Copy link
Collaborator

@blackgnezdo blackgnezdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool!

I'd love for @augustss to also chime in on this one.

This improves the nested benchmark a bit:

benchmarking nested(900B)/decode/whnf
   14.32 μs   (14.08 μs .. 14.57 μs)
=> 11.66 μs   (11.36 μs .. 11.99 μs)

It didn't make a significant difference in the packed benchmark,
I think because the effects of using lists currently dominate everything else.
@judah
Copy link
Collaborator Author

judah commented Dec 22, 2018

Thank you @blackgnezdo for the detailed review!

-- @len@ bytes remaining. That is, once @len@ bytes have been
-- consumed, 'atEnd' will return 'True' and other actions
-- like 'getWord8' will act like there is no input remaining.
isolate :: Int -> Parser a -> Parser a
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we catch negative len?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch; fixed to fail the parse in that case.

-- It is only safe for @f@ to peek between its argument @p@ and
-- @p `plusPtr` (len - 1)@, inclusive.
withSized :: Int -> String -> (Ptr Word8 -> IO a) -> Parser a
withSized len message f = Parser $ \end pos ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we catch negative len?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only exposed function that passes a user-defined value is getBytes. It calls packCStringLen which does check for negative length; however, that function throws an exception in that case.

Fixed by adding a manual check inside getBytes.

Copy link
Collaborator

@blackgnezdo blackgnezdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love the newly added test coverage!

getBytes :: Int -> Parser B.ByteString
getBytes n = withSized n "getBytes: Unexpected end of input" $ \pos ->
getBytes n
| n < 0 = fail "getBytes: negative length"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is parsing 0 bytes a normal (useful?) thing to support? If so I'd document it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for example if you have a proto string field, then the empty string value may be encoded as the varint "0".

Updated the comment.

getBytes n = withSized n "getBytes: Unexpected end of input" $ \pos ->
getBytes n
| n < 0 = fail "getBytes: negative length"
| otherwise = withSized n "getBytes: Unexpected end of input" $ \pos ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably prefer to rephrase it with the correct case guarded with the assertion and fail otherwise. It will then become order independent and the expected behavior will go first.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

B.packCStringLen (castPtr pos, n)

-- | Helper function for reading bytes from the current position and
-- advancing the pointer.
--
-- It is only safe for @f@ to peek between its argument @p@ and
-- @p `plusPtr` (len - 1)@, inclusive.
--
-- This function is not safe to use with a negative length.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you quantify the impact of including the assertion branch into this function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the assertion into withSized. Happily, it turns out that GHC with -O will completely elide the assertion if the length is constant.

@judah judah merged commit b250a54 into master Jan 3, 2019
@judah judah deleted the encoding-parser branch January 3, 2019 00:41
avdv pushed a commit to avdv/proto-lens that referenced this pull request Aug 9, 2023
All decoding benchmarks show significant speedups after this change.
The biggest improvement is to decoding packed data which is 4-5x as fast
as before.  (See below for a full list of benchmark diffs.)

This parsing monad follows the approach of, e.g., the `store` and `persist`
packages.  It requires that all data be in a *strict* `ByteString`,
and uses simple pointer arithmetic internally to walk through its bytes.

This effectively works against google#62 (streaming parsers) since it
needs to read all the input data before starting the parse.  However,
that issue has already existed since the beginning of this library for,
e.g., submessages; see that bug for more details.  So this change
doesn't appear to be a regression.  We also have freedom to later
try out different implementations without changing the API, since
`Parser` is opaque as of google#294.

The implementation of Parser differs from `store` and `persist` by using
`ExceptT` to pass around errors internally, rather than exceptions (or
closures, as in `attoparsec`).  We may want to experiment with this later,
but in my initial experiments I didn't see a significant improvement
from those approaches.

Benchmark results (the "time" output from Criterion):

flat(602B)/decode/whnf:
   13.14 μs   (13.02 μs .. 13.29 μs)
=> 8.686 μs   (8.514 μs .. 8.873 μs)

nested(900B)/decode/whnf:
    26.35 μs   (25.85 μs .. 26.86 μs)
=> 11.66 μs   (11.36 μs .. 11.99 μs)

int32-packed(1003B)/decode/whnf:
   36.23 μs   (35.75 μs .. 36.69 μs)
=> 17.31 μs   (17.11 μs .. 17.50 μs)

int32-unpacked(2000B)/decode/whnf:
   65.18 μs   (64.19 μs .. 66.68 μs)
=> 19.35 μs   (19.13 μs .. 19.58 μs)

float-packed(4003B)/decode/whnf:
   78.61 μs   (77.53 μs .. 79.46 μs)
=> 19.56 μs   (19.40 μs .. 19.76 μs)

float-unpacked(5000B)/decode/whnf:
   108.9 μs   (107.8 μs .. 110.3 μs)
=> 22.29 μs   (22.00 μs .. 22.66 μs)

no-unused(10003B)/decode/whnf:
   571.7 μs   (560.0 μs .. 586.6 μs)
=> 356.5 μs   (349.0 μs .. 365.0 μs)

with-unused(10003B)/decode/whnf:
   786.6 μs   (697.8 μs .. 875.5 μs)
=> 368.3 μs   (361.8 μs .. 376.4 μs)

Also added isolate and used it for parsing messages and packed fields.

This improved the nested benchmark a bit compared to without it:

benchmarking nested(900B)/decode/whnf
   14.32 μs   (14.08 μs .. 14.57 μs)
=> 11.66 μs   (11.36 μs .. 11.99 μs)

It didn't make a significant difference in the packed benchmark,
I think because the effects of using lists currently dominate everything else.
ylecornec pushed a commit to ylecornec/proto-lens that referenced this pull request Feb 19, 2024
All decoding benchmarks show significant speedups after this change.
The biggest improvement is to decoding packed data which is 4-5x as fast
as before.  (See below for a full list of benchmark diffs.)

This parsing monad follows the approach of, e.g., the `store` and `persist`
packages.  It requires that all data be in a *strict* `ByteString`,
and uses simple pointer arithmetic internally to walk through its bytes.

This effectively works against google#62 (streaming parsers) since it
needs to read all the input data before starting the parse.  However,
that issue has already existed since the beginning of this library for,
e.g., submessages; see that bug for more details.  So this change
doesn't appear to be a regression.  We also have freedom to later
try out different implementations without changing the API, since
`Parser` is opaque as of google#294.

The implementation of Parser differs from `store` and `persist` by using
`ExceptT` to pass around errors internally, rather than exceptions (or
closures, as in `attoparsec`).  We may want to experiment with this later,
but in my initial experiments I didn't see a significant improvement
from those approaches.

Benchmark results (the "time" output from Criterion):

flat(602B)/decode/whnf:
   13.14 μs   (13.02 μs .. 13.29 μs)
=> 8.686 μs   (8.514 μs .. 8.873 μs)

nested(900B)/decode/whnf:
    26.35 μs   (25.85 μs .. 26.86 μs)
=> 11.66 μs   (11.36 μs .. 11.99 μs)

int32-packed(1003B)/decode/whnf:
   36.23 μs   (35.75 μs .. 36.69 μs)
=> 17.31 μs   (17.11 μs .. 17.50 μs)

int32-unpacked(2000B)/decode/whnf:
   65.18 μs   (64.19 μs .. 66.68 μs)
=> 19.35 μs   (19.13 μs .. 19.58 μs)

float-packed(4003B)/decode/whnf:
   78.61 μs   (77.53 μs .. 79.46 μs)
=> 19.56 μs   (19.40 μs .. 19.76 μs)

float-unpacked(5000B)/decode/whnf:
   108.9 μs   (107.8 μs .. 110.3 μs)
=> 22.29 μs   (22.00 μs .. 22.66 μs)

no-unused(10003B)/decode/whnf:
   571.7 μs   (560.0 μs .. 586.6 μs)
=> 356.5 μs   (349.0 μs .. 365.0 μs)

with-unused(10003B)/decode/whnf:
   786.6 μs   (697.8 μs .. 875.5 μs)
=> 368.3 μs   (361.8 μs .. 376.4 μs)

Also added isolate and used it for parsing messages and packed fields.

This improved the nested benchmark a bit compared to without it:

benchmarking nested(900B)/decode/whnf
   14.32 μs   (14.08 μs .. 14.57 μs)
=> 11.66 μs   (11.36 μs .. 11.99 μs)

It didn't make a significant difference in the packed benchmark,
I think because the effects of using lists currently dominate everything else.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants