Use a custom parsing monad instead of attoparsec. (google#298)
All decoding benchmarks show significant speedups after this change.
The biggest improvements are in decoding floats, which is now about 4x
as fast as before when packed and nearly 5x as fast when unpacked.
(See below for a full list of benchmark diffs.)

This parsing monad follows the approach of, e.g., the `store` and `persist`
packages.  It requires that all data be in a *strict* `ByteString`,
and uses simple pointer arithmetic internally to walk through its bytes.
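
To illustrate what "walking through the bytes with pointer arithmetic" means, here is a minimal sketch (not code from this commit) that decodes a single protobuf base-128 varint directly from a strict `ByteString`. The function name `decodeVarint` is hypothetical; the sketch relies only on `unsafeUseAsCStringLen` from `Data.ByteString.Unsafe` and on `peek`, `plusPtr`, and `minusPtr` from the standard `Foreign` modules.

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Bits ((.&.), (.|.), shiftL, testBit)
import Data.Word (Word8, Word64)
import Foreign.Ptr (Ptr, castPtr, plusPtr, minusPtr)
import Foreign.Storable (peek)
import System.IO.Unsafe (unsafePerformIO)

-- | Hypothetical helper: decode one varint from the start of a strict
-- ByteString, returning the value and the number of bytes consumed.
-- unsafePerformIO is acceptable here because we only read immutable bytes.
decodeVarint :: B.ByteString -> Maybe (Word64, Int)
decodeVarint bs = unsafePerformIO $
    unsafeUseAsCStringLen bs $ \(cstr, len) -> do
        let start = castPtr cstr :: Ptr Word8
            end = start `plusPtr` len :: Ptr Word8
            -- Advance one byte at a time, accumulating 7 payload bits per byte.
            go :: Ptr Word8 -> Word64 -> Int -> IO (Maybe (Word64, Int))
            go p acc shift
                | p >= end = return Nothing     -- ran out of input
                | shift > 63 = return Nothing   -- more than 10 bytes
                | otherwise = do
                    w <- peek p
                    let acc' = acc .|. (fromIntegral (w .&. 0x7F) `shiftL` shift)
                        next = p `plusPtr` 1 :: Ptr Word8
                    -- The high bit signals that another byte follows.
                    if testBit w 7
                        then go next acc' (shift + 7)
                        else return (Just (acc', next `minusPtr` start))
        go start 0 0
```

For example, `decodeVarint (B.pack [0xAC, 0x02])` gives `Just (300, 2)`. Because the whole input is already in memory, no buffering or refilling logic is needed; the loop only compares and increments pointers.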

This effectively works against google#62 (streaming parsers), since the
parser needs all of the input data before it can start.  However, that
limitation has existed since the beginning of this library, e.g., for
submessages; see that issue for more details.  So this change doesn't
appear to be a regression.  We are also free to try out different
implementations later without changing the API, since `Parser` is
opaque as of google#294.

The implementation of `Parser` differs from `store` and `persist` in
that it uses `ExceptT` to pass errors around internally, rather than
exceptions (or closures, as in `attoparsec`).  We may want to experiment
with those approaches later, but in my initial experiments neither gave
a significant improvement.
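
A minimal sketch of how a pointer-based parser might thread errors through `ExceptT`, assuming a representation (hypothetical, not taken from this commit) that passes the end-of-input pointer and the current position explicitly. `Parser`, `failP`, `getWord8`, and `runParser` below are illustrative definitions only, not proto-lens's actual `Data.ProtoLens.Encoding.Bytes` API.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)
import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Word (Word8)
import Foreign.Ptr (Ptr, castPtr, plusPtr)
import Foreign.Storable (peek)
import System.IO.Unsafe (unsafePerformIO)

-- | Hypothetical parser: given the end pointer and the current position,
-- return the new position and a result.  Failures are plain 'Left' values
-- carried by ExceptT rather than thrown exceptions.
newtype Parser a = Parser
    { unParser :: Ptr Word8 -> Ptr Word8 -> ExceptT String IO (Ptr Word8, a) }

instance Functor Parser where
    fmap f (Parser p) = Parser $ \end cur -> do
        (cur', x) <- p end cur
        return (cur', f x)

instance Applicative Parser where
    pure x = Parser $ \_ cur -> return (cur, x)
    Parser pf <*> Parser px = Parser $ \end cur -> do
        (cur', f) <- pf end cur
        (cur'', x) <- px end cur'
        return (cur'', f x)

instance Monad Parser where
    Parser p >>= k = Parser $ \end cur -> do
        (cur', x) <- p end cur
        unParser (k x) end cur'

-- | Abort the parse with an error message.
failP :: String -> Parser a
failP msg = Parser $ \_ _ -> throwE msg

-- | Read one byte and advance the position by a single pointer increment.
getWord8 :: Parser Word8
getWord8 = Parser $ \end cur ->
    if cur >= end
        then throwE "Unexpected end of input"
        else do
            w <- lift (peek cur)
            return (cur `plusPtr` 1, w)

-- | Run a parser over a strict ByteString that is entirely in memory.
runParser :: Parser a -> B.ByteString -> Either String a
runParser (Parser p) bs = unsafePerformIO $
    unsafeUseAsCStringLen bs $ \(cstr, len) -> do
        let start = castPtr cstr :: Ptr Word8
        result <- runExceptT (p (start `plusPtr` len) start)
        return (fmap snd result)
```

With this shape, an early failure such as running out of input simply short-circuits the remaining binds through `ExceptT`, so no exception handler is needed around `runParser`. Whether this beats exceptions or continuation-passing depends on the workload, which is the open question the paragraph above mentions.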

Benchmark results (the "time" output from Criterion; in each entry the first line is before this change and the line marked `=>` is after):

flat(602B)/decode/whnf:
   13.14 μs   (13.02 μs .. 13.29 μs)
=> 8.686 μs   (8.514 μs .. 8.873 μs)

nested(900B)/decode/whnf:
   26.35 μs   (25.85 μs .. 26.86 μs)
=> 11.66 μs   (11.36 μs .. 11.99 μs)

int32-packed(1003B)/decode/whnf:
   36.23 μs   (35.75 μs .. 36.69 μs)
=> 17.31 μs   (17.11 μs .. 17.50 μs)

int32-unpacked(2000B)/decode/whnf:
   65.18 μs   (64.19 μs .. 66.68 μs)
=> 19.35 μs   (19.13 μs .. 19.58 μs)

float-packed(4003B)/decode/whnf:
   78.61 μs   (77.53 μs .. 79.46 μs)
=> 19.56 μs   (19.40 μs .. 19.76 μs)

float-unpacked(5000B)/decode/whnf:
   108.9 μs   (107.8 μs .. 110.3 μs)
=> 22.29 μs   (22.00 μs .. 22.66 μs)

no-unused(10003B)/decode/whnf:
   571.7 μs   (560.0 μs .. 586.6 μs)
=> 356.5 μs   (349.0 μs .. 365.0 μs)

with-unused(10003B)/decode/whnf:
   786.6 μs   (697.8 μs .. 875.5 μs)
=> 368.3 μs   (361.8 μs .. 376.4 μs)
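
To make the speedup factors behind these numbers explicit, the following sketch computes each one as the mean time before divided by the mean time after, using only the Criterion means listed above. The names `results`, `speedups`, and `report` are illustrative.

```haskell
import Text.Printf (printf)

-- | (benchmark, mean time before, mean time after), in microseconds,
-- copied from the Criterion results above.
results :: [(String, Double, Double)]
results =
    [ ("flat(602B)",            13.14,  8.686)
    , ("nested(900B)",          26.35,  11.66)
    , ("int32-packed(1003B)",   36.23,  17.31)
    , ("int32-unpacked(2000B)", 65.18,  19.35)
    , ("float-packed(4003B)",   78.61,  19.56)
    , ("float-unpacked(5000B)", 108.9,  22.29)
    , ("no-unused(10003B)",     571.7,  356.5)
    , ("with-unused(10003B)",   786.6,  368.3)
    ]

-- | Speedup factor: old mean time divided by new mean time.
speedups :: [(String, Double)]
speedups = [(name, before / after) | (name, before, after) <- results]

-- | Print one line per benchmark with its speedup to two decimal places.
report :: IO ()
report = mapM_ (\(name, s) -> printf "%-24s %.2fx\n" name s) speedups
```

Running `report` shows, for example, roughly 4.0x for float-packed, 4.9x for float-unpacked, and 2.1x for int32-packed.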

Also added `isolate` and used it for parsing submessages and packed fields.
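
The idea behind `isolate` can be sketched as follows, assuming a hypothetical parser that keeps the end of the readable region in a reader and the current position in state. `isolate` temporarily narrows the end pointer to `len` bytes, so a packed field can be decoded in place by looping until `atEnd`, instead of first extracting the field's bytes and running a separate parser over them. All names here are illustrative; in particular, this sketch resumes right after the isolated region even if the inner parser left bytes unread, which may differ from the real `isolate`.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)
import Control.Monad.Trans.Reader (ReaderT, ask, local, runReaderT)
import Control.Monad.Trans.State.Strict (StateT, evalStateT, get, put)
import Data.Bits ((.&.), (.|.), shiftL, testBit)
import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Word (Word8, Word64)
import Foreign.Ptr (Ptr, castPtr, plusPtr)
import Foreign.Storable (peek)
import System.IO.Unsafe (unsafePerformIO)

-- | Hypothetical parser: the reader holds the end of the readable region,
-- the state holds the current position, and errors go through ExceptT.
newtype Parser a =
    Parser (ReaderT (Ptr Word8) (StateT (Ptr Word8) (ExceptT String IO)) a)
    deriving (Functor, Applicative, Monad)

failP :: String -> Parser a
failP = Parser . lift . lift . throwE

-- | True once the current position reaches the (possibly narrowed) end.
atEnd :: Parser Bool
atEnd = Parser $ do
    end <- ask
    cur <- lift get
    return (cur >= end)

getWord8 :: Parser Word8
getWord8 = do
    done <- atEnd
    if done
        then failP "Unexpected end of input"
        else Parser $ do
            cur <- lift get
            w <- lift (lift (lift (peek cur)))
            lift (put (cur `plusPtr` 1))
            return w

-- | Base-128 varint: 7 payload bits per byte, high bit means "more bytes".
getVarInt :: Parser Word64
getVarInt = go 0 0
  where
    go :: Word64 -> Int -> Parser Word64
    go acc shift
        | shift > 63 = failP "Varint too long"
        | otherwise = do
            w <- getWord8
            let acc' = acc .|. (fromIntegral (w .&. 0x7F) `shiftL` shift)
            if testBit w 7 then go acc' (shift + 7) else return acc'

-- | Run a parser as if only the next @len@ bytes remained, then resume
-- right after those bytes (an assumption of this sketch).
isolate :: Int -> Parser a -> Parser a
isolate len (Parser inner) = Parser $ do
    end <- ask
    cur <- lift get
    let end' = cur `plusPtr` len :: Ptr Word8
    if len < 0 || end' > end
        then lift (lift (throwE "isolate: length exceeds remaining input"))
        else do
            x <- local (const end') inner
            lift (put end')
            return x

-- | A packed repeated field: a varint byte length, then varints until the
-- isolated region is exhausted, decoded without copying any bytes.
packedVarInts :: Parser [Word64]
packedVarInts = do
    len <- getVarInt
    isolate (fromIntegral len) loop
  where
    loop :: Parser [Word64]
    loop = do
        done <- atEnd
        if done then return [] else (:) <$> getVarInt <*> loop

runParser :: Parser a -> B.ByteString -> Either String a
runParser (Parser p) bs = unsafePerformIO $
    unsafeUseAsCStringLen bs $ \(cstr, len) -> do
        let start = castPtr cstr :: Ptr Word8
        runExceptT (evalStateT (runReaderT p (start `plusPtr` len)) start)
```

For instance, on the bytes `[3, 0x01, 0xAC, 0x02, 0x07]`, `packedVarInts` reads the length 3, decodes `[1, 300]` inside the isolated region, and leaves the position at the trailing `0x07` with the original end pointer restored by `local`.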

This improved the nested benchmark by about 19% compared to the same
change without `isolate`:

nested(900B)/decode/whnf:
   14.32 μs   (14.08 μs .. 14.57 μs)
=> 11.66 μs   (11.36 μs .. 11.99 μs)

It didn't make a significant difference in the packed benchmarks,
I think because the cost of building lists currently dominates everything else.
judah authored Jan 3, 2019
1 parent 7bc47ea commit e7ad153
Showing 2 changed files with 19 additions and 9 deletions.
src/Data/ProtoLens/Compiler/Generate/Encoding.hs (7 changes: 1 addition, 6 deletions)
@@ -237,7 +237,6 @@ parseFieldCase loop x f = case plainFieldKind f of
     _ -> [valueCase]
   where
     y = "y"
-    bytes = "bytes"
     entry = "entry"
     info = plainFieldInfo f
     valueCase = pLitInt (fieldTag info) --> do'
@@ -258,11 +257,7 @@ parseFieldCase loop x f = case plainFieldKind f of
                 $ x
         ]
     packedCase = pLitInt (packedFieldTag info) --> do'
-        [ bytes <-- parseFieldType lengthy
-        , y <-- "Data.ProtoLens.Encoding.Bytes.runEither"
-            @@ ("Data.ProtoLens.Encoding.Bytes.runParser"
-                    @@ parsePackedField info
-                    @@ bytes)
+        [ y <-- isolatedLengthy (parsePackedField info)
         , stmt . loop . updateParseState (overField info ("Prelude.++" @@ y))
                 $ x
         ]
src/Data/ProtoLens/Compiler/Generate/FieldEncoding.hs (21 changes: 18 additions, 3 deletions)
@@ -11,6 +11,7 @@ module Data.ProtoLens.Compiler.Generate.FieldEncoding
     , fieldEncoding
     , lengthy
     , groupEnd
+    , isolatedLengthy
     ) where

 import Data.Word (Word8)
@@ -181,10 +182,24 @@ stringField = partialField "Data.Text.Encoding.encodeUtf8" decodeUtf8P lengthy

 -- | A protobuf message type.
 message :: FieldEncoding
-message = partialField
+message = lengthy
+    { buildFieldType = "Prelude.." @@
+        buildFieldType lengthy @@
         "Data.ProtoLens.encodeMessage"
-    (\m -> "Data.ProtoLens.decodeMessage" @@ m)
-    lengthy
+    , parseFieldType = isolatedLengthy "Data.ProtoLens.parseMessage"
+    }

+-- | Takes a @Parser a@, reads a varint and then runs the parser
+-- isolated to the given length.
+isolatedLengthy :: Exp -> Exp
+isolatedLengthy parser = do'
+    [ len <-- getVarInt'
+    , stmt $ "Data.ProtoLens.Encoding.Bytes.isolate"
+            @@ (fromIntegral' @@ len)
+            @@ parser
+    ]
+  where
+    len = "len"
+
 -- | Some functions that are used in multiple places in the generated code.
 getVarInt', putVarInt', fromIntegral' :: Exp