change fast-parse to cat-parse #834

timzaak · 2021-04-06T13:50:33Z

Ref #650. A lot of the code comes from lvitaly.
I ran the benchmarks on my desktop computer:

OS: openSUSE Leap 15.2 x86_64
Kernel: 5.3.18-lp152.66-preempt
CPU: Intel i7-6700 (8) @ 4.000GHz
Memory: 27920MiB / 32023MiB

The cats-parse results:

[info] Benchmark                                                        Mode  Cnt     Score     Error  Units
[info] GraphQLBenchmarks.introspectCaliban                             thrpt    5   174.243 ±  60.431  ops/s
[info] GraphQLBenchmarks.introspectSangria                             thrpt    5   408.381 ± 159.162  ops/s
[info] GraphQLBenchmarks.simpleCaliban                                 thrpt    5   655.142 ±  61.412  ops/s
[info] GraphQLBenchmarks.simpleSangria                                 thrpt    5  8336.505 ± 122.964  ops/s
[info] execution.NestedZQueryBenchmark.deepBatchedQuery100             thrpt    5   254.165 ±   8.627  ops/s
[info] execution.NestedZQueryBenchmark.deepBatchedQuery1000            thrpt    5    62.174 ±   0.519  ops/s
[info] execution.NestedZQueryBenchmark.deepBatchedQuery10000           thrpt    5     6.959 ±   0.104  ops/s
[info] execution.NestedZQueryBenchmark.deepParallelQuery100            thrpt    5   164.686 ±   7.417  ops/s
[info] execution.NestedZQueryBenchmark.deepParallelQuery1000           thrpt    5    24.425 ±   1.389  ops/s
[info] execution.NestedZQueryBenchmark.deepParallelQuery10000          thrpt    5     2.755 ±   0.111  ops/s
[info] execution.NestedZQueryBenchmark.deepSequentialQuery100          thrpt    5   255.859 ±   4.674  ops/s
[info] execution.NestedZQueryBenchmark.deepSequentialQuery1000         thrpt    5    61.544 ±   1.779  ops/s
[info] execution.NestedZQueryBenchmark.deepSequentialQuery10000        thrpt    5     7.213 ±   0.142  ops/s
[info] execution.NestedZQueryBenchmark.multifieldBatchedQuery100       thrpt    5   437.501 ±  23.922  ops/s
[info] execution.NestedZQueryBenchmark.multifieldBatchedQuery1000      thrpt    5    90.925 ±   2.230  ops/s
[info] execution.NestedZQueryBenchmark.multifieldBatchedQuery10000     thrpt    5    10.271 ±   0.217  ops/s
[info] execution.NestedZQueryBenchmark.multifieldParallelQuery100      thrpt    5   189.302 ±   6.047  ops/s
[info] execution.NestedZQueryBenchmark.multifieldParallelQuery1000     thrpt    5    24.527 ±   0.309  ops/s
[info] execution.NestedZQueryBenchmark.multifieldParallelQuery10000    thrpt    5     2.432 ±   0.146  ops/s
[info] execution.NestedZQueryBenchmark.multifieldSequentialQuery100    thrpt    5   439.015 ±  20.834  ops/s
[info] execution.NestedZQueryBenchmark.multifieldSequentialQuery1000   thrpt    5    90.751 ±   3.903  ops/s
[info] execution.NestedZQueryBenchmark.multifieldSequentialQuery10000  thrpt    5    10.216 ±   0.265  ops/s
[info] execution.NestedZQueryBenchmark.simpleBatchedQuery100           thrpt    5   578.245 ±  45.990  ops/s
[info] execution.NestedZQueryBenchmark.simpleBatchedQuery1000          thrpt    5   224.753 ±   3.169  ops/s
[info] execution.NestedZQueryBenchmark.simpleBatchedQuery10000         thrpt    5    28.038 ±   1.275  ops/s
[info] execution.NestedZQueryBenchmark.simpleParallelQuery100          thrpt    5   362.207 ±  17.743  ops/s
[info] execution.NestedZQueryBenchmark.simpleParallelQuery1000         thrpt    5    64.523 ±   0.731  ops/s
[info] execution.NestedZQueryBenchmark.simpleParallelQuery10000        thrpt    5     6.465 ±   0.065  ops/s
[info] execution.NestedZQueryBenchmark.simpleSequentialQuery100        thrpt    5   579.062 ±  24.800  ops/s
[info] execution.NestedZQueryBenchmark.simpleSequentialQuery1000       thrpt    5   222.029 ±   3.291  ops/s
[info] execution.NestedZQueryBenchmark.simpleSequentialQuery10000      thrpt    5    27.687 ±   1.718  ops/s

The fastparse results:

[info] Benchmark                                                        Mode  Cnt      Score     Error  Units
[info] GraphQLBenchmarks.introspectCaliban                             thrpt    5   1700.357 ±  84.346  ops/s
[info] GraphQLBenchmarks.introspectSangria                             thrpt    5    424.921 ±  77.148  ops/s
[info] GraphQLBenchmarks.simpleCaliban                                 thrpt    5  34580.933 ± 816.342  ops/s
[info] GraphQLBenchmarks.simpleSangria                                 thrpt    5   7894.296 ± 780.517  ops/s
[info] execution.NestedZQueryBenchmark.deepBatchedQuery100             thrpt    5    794.987 ±  13.794  ops/s
[info] execution.NestedZQueryBenchmark.deepBatchedQuery1000            thrpt    5     78.817 ±   1.557  ops/s
[info] execution.NestedZQueryBenchmark.deepBatchedQuery10000           thrpt    5      7.680 ±   0.172  ops/s
[info] execution.NestedZQueryBenchmark.deepParallelQuery100            thrpt    5    278.831 ±   9.919  ops/s
[info] execution.NestedZQueryBenchmark.deepParallelQuery1000           thrpt    5     27.474 ±   1.136  ops/s
[info] execution.NestedZQueryBenchmark.deepParallelQuery10000          thrpt    5      2.849 ±   0.103  ops/s
[info] execution.NestedZQueryBenchmark.deepSequentialQuery100          thrpt    5    765.211 ±  17.195  ops/s
[info] execution.NestedZQueryBenchmark.deepSequentialQuery1000         thrpt    5     76.385 ±   1.431  ops/s
[info] execution.NestedZQueryBenchmark.deepSequentialQuery10000        thrpt    5      7.587 ±   0.167  ops/s
[info] execution.NestedZQueryBenchmark.multifieldBatchedQuery100       thrpt    5   1335.653 ±  38.236  ops/s
[info] execution.NestedZQueryBenchmark.multifieldBatchedQuery1000      thrpt    5    108.383 ±  19.814  ops/s
[info] execution.NestedZQueryBenchmark.multifieldBatchedQuery10000     thrpt    5     10.193 ±   1.240  ops/s
[info] execution.NestedZQueryBenchmark.multifieldParallelQuery100      thrpt    5    271.745 ±   6.259  ops/s
[info] execution.NestedZQueryBenchmark.multifieldParallelQuery1000     thrpt    5     25.601 ±   1.110  ops/s
[info] execution.NestedZQueryBenchmark.multifieldParallelQuery10000    thrpt    5      2.390 ±   0.087  ops/s
[info] execution.NestedZQueryBenchmark.multifieldSequentialQuery100    thrpt    5   1320.421 ±  54.977  ops/s
[info] execution.NestedZQueryBenchmark.multifieldSequentialQuery1000   thrpt    5    111.177 ±   5.237  ops/s
[info] execution.NestedZQueryBenchmark.multifieldSequentialQuery10000  thrpt    5     10.735 ±   0.556  ops/s
[info] execution.NestedZQueryBenchmark.simpleBatchedQuery100           thrpt    5   3661.123 ± 192.645  ops/s
[info] execution.NestedZQueryBenchmark.simpleBatchedQuery1000          thrpt    5    363.128 ±  32.954  ops/s
[info] execution.NestedZQueryBenchmark.simpleBatchedQuery10000         thrpt    5     29.850 ±   1.307  ops/s
[info] execution.NestedZQueryBenchmark.simpleParallelQuery100          thrpt    5    863.198 ±  67.605  ops/s
[info] execution.NestedZQueryBenchmark.simpleParallelQuery1000         thrpt    5     72.265 ±   2.801  ops/s
[info] execution.NestedZQueryBenchmark.simpleParallelQuery10000        thrpt    5      6.933 ±   0.190  ops/s
[info] execution.NestedZQueryBenchmark.simpleSequentialQuery100        thrpt    5   3705.457 ±  46.360  ops/s
[info] execution.NestedZQueryBenchmark.simpleSequentialQuery1000       thrpt    5    349.199 ±  32.415  ops/s
[info] execution.NestedZQueryBenchmark.simpleSequentialQuery10000      thrpt    5     29.640 ±   1.500  ops/s

I don't know if it's fast enough to replace fastparse.

ghostdogpr · 2021-04-06T14:06:43Z

Ouch, it's very slow 😢
The ones to look at are introspectCaliban and simpleCaliban:

simpleCaliban is a simple query: it is about 50x slower
introspectCaliban is a complex query: it is about 10x slower

In both cases it would make caliban much slower than sangria.
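The slowdown factors above can be checked directly against the two JMH tables. A small sketch that divides the fastparse throughput by the cats-parse throughput for each Caliban benchmark, and compares Sangria with Caliban in the cats-parse run (the object and method names are only for illustration):

```scala
object SlowdownSketch {
  // Throughput in ops/s, copied from the cats-parse run above.
  val catsParseRun: Map[String, Double] =
    Map("simpleCaliban" -> 655.142, "introspectCaliban" -> 174.243, "simpleSangria" -> 8336.505)

  // Throughput in ops/s, copied from the fastparse run above.
  val fastparseRun: Map[String, Double] =
    Map("simpleCaliban" -> 34580.933, "introspectCaliban" -> 1700.357)

  // How many times slower the cats-parse run is than the fastparse run for one benchmark.
  def slowdown(bench: String): Double = fastparseRun(bench) / catsParseRun(bench)

  // How many times faster Sangria is than Caliban on the simple query, within the cats-parse run.
  def sangriaAdvantage: Double = catsParseRun("simpleSangria") / catsParseRun("simpleCaliban")

  def main(args: Array[String]): Unit = {
    println(f"simpleCaliban: ${slowdown("simpleCaliban")}%.1fx slower")
    println(f"introspectCaliban: ${slowdown("introspectCaliban")}%.1fx slower")
    println(f"simpleSangria vs simpleCaliban: ${sangriaAdvantage}%.1fx faster")
  }
}
```

This gives roughly 52.8x and 9.8x, in line with the 50x and 10x estimates, and shows Sangria about 12.7x faster than Caliban on the simple query in the same run.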

Now, do you think the parsing code could be optimized, or is the slowness entirely caused by cats-parse? If I get some time this weekend I could do some profiling to see what makes it so slow.

PS: run fmt in sbt to format the code (that's the reason the CI fails).

timzaak · 2021-04-06T15:29:41Z

I just changed the defs to vals and removed P.defer0, and it improves a lot:

val document: Parser0[ParsedDocument] =
    (P.start *> whitespaceWithComment *> definition.repSep0(whitespaceWithComment) <* whitespaceWithComment <* P.end).map(seq => ParsedDocument(seq))

[info] # Warmup Iteration   1: 404.573 ops/s
[info] # Warmup Iteration   2: 809.628 ops/s
[info] # Warmup Iteration   3: 839.770 ops/s
[info] # Warmup Iteration   4: 871.053 ops/s
[info] # Warmup Iteration   5: 899.415 ops/s
[info] Iteration   1: 903.723 ops/s
[info] Iteration   2: 914.723 ops/s
[info] Iteration   3: 918.853 ops/s
[info] Iteration   4: 921.035 ops/s
[info] Iteration   5: 919.765 ops/s
[info] Result "caliban.GraphQLBenchmarks.introspectCaliban":
[info]   915.620 ±(99.9%) 27.181 ops/s [Average]
[info]   (min, avg, max) = (903.723, 915.620, 921.035), stdev = 7.059
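The gain from turning `def`s into `val`s most likely comes from not rebuilding parsers: a `def` constructs a fresh combinator tree every time it is referenced, while a `val` constructs it once. A minimal sketch with a hypothetical, hand-rolled parser type (not cats-parse's API) that counts allocations to make the difference visible:

```scala
object DefVsValSketch {
  // Number of toy parser objects allocated so far.
  var allocations = 0

  // Hypothetical toy parser: consumes a prefix and returns the rest of the input, or None on failure.
  final class Parser(val run: String => Option[String]) {
    allocations += 1
    // Sequencing: run this parser, then `that` on whatever input is left.
    def ~(that: Parser): Parser = new Parser(in => run(in).flatMap(that.run))
  }

  // Matches a literal string.
  def lit(s: String): Parser =
    new Parser(in => if (in.startsWith(s)) Some(in.drop(s.length)) else None)

  // `def`: every reference builds a new tree (3 literals + 2 sequencing nodes).
  def fieldDef: Parser = lit("{") ~ lit("name") ~ lit("}")

  // `val`: the same tree is built once, when the object is initialized.
  val fieldVal: Parser = lit("{") ~ lit("name") ~ lit("}")

  // Parses `input` n times through `p` and returns how many parsers were allocated meanwhile.
  def allocationsFor(p: => Parser, n: Int, input: String): Int = {
    val before = allocations
    (1 to n).foreach(_ => p.run(input))
    allocations - before
  }
}
```

With `fieldDef`, 1000 parses allocate 5000 parser objects; with `fieldVal`, none. Recursion is left out on purpose: in a real grammar, rules that refer to each other still need `lazy val` plus some deferral (cats-parse provides `Parser.defer` for this), so removing `P.defer0` is likely only safe where no recursive reference passes through it.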

I will change it tomorrow.

timzaak · 2021-04-07T16:09:55Z

[info] Benchmark                                                        Mode  Cnt     Score    Error  Units
[info] GraphQLBenchmarks.introspectCaliban                             thrpt    5   276.946 ±  4.698  ops/s
[info] GraphQLBenchmarks.introspectSangria                             thrpt    5   444.055 ± 33.928  ops/s
[info] GraphQLBenchmarks.simpleCaliban                                 thrpt    5   351.205 ± 11.015  ops/s
[info] GraphQLBenchmarks.simpleSangria                                 thrpt    5  8509.736 ± 51.700  ops/s
[info] execution.NestedZQueryBenchmark.deepBatchedQuery100             thrpt    5   245.568 ±  1.445  ops/s
[info] execution.NestedZQueryBenchmark.deepBatchedQuery1000            thrpt    5    63.020 ±  1.697  ops/s
[info] execution.NestedZQueryBenchmark.deepBatchedQuery10000           thrpt    5     7.492 ±  0.063  ops/s
[info] execution.NestedZQueryBenchmark.deepParallelQuery100            thrpt    5   161.071 ±  4.537  ops/s
[info] execution.NestedZQueryBenchmark.deepParallelQuery1000           thrpt    5    25.943 ±  0.110  ops/s
[info] execution.NestedZQueryBenchmark.deepParallelQuery10000          thrpt    5     2.680 ±  0.065  ops/s
[info] execution.NestedZQueryBenchmark.deepSequentialQuery100          thrpt    5   248.203 ±  3.531  ops/s
[info] execution.NestedZQueryBenchmark.deepSequentialQuery1000         thrpt    5    64.193 ±  0.905  ops/s
[info] execution.NestedZQueryBenchmark.deepSequentialQuery10000        thrpt    5     7.358 ±  0.039  ops/s
[info] execution.NestedZQueryBenchmark.multifieldBatchedQuery100       thrpt    5   285.356 ±  5.961  ops/s
[info] execution.NestedZQueryBenchmark.multifieldBatchedQuery1000      thrpt    5    86.301 ±  2.463  ops/s
[info] execution.NestedZQueryBenchmark.multifieldBatchedQuery10000     thrpt    5    10.682 ±  0.597  ops/s
[info] execution.NestedZQueryBenchmark.multifieldParallelQuery100      thrpt    5   151.799 ±  3.160  ops/s
[info] execution.NestedZQueryBenchmark.multifieldParallelQuery1000     thrpt    5    24.025 ±  0.237  ops/s
[info] execution.NestedZQueryBenchmark.multifieldParallelQuery10000    thrpt    5     2.435 ±  0.099  ops/s
[info] execution.NestedZQueryBenchmark.multifieldSequentialQuery100    thrpt    5   282.982 ±  9.090  ops/s
[info] execution.NestedZQueryBenchmark.multifieldSequentialQuery1000   thrpt    5    86.496 ±  1.913  ops/s
[info] execution.NestedZQueryBenchmark.multifieldSequentialQuery10000  thrpt    5    10.685 ±  0.328  ops/s
[info] execution.NestedZQueryBenchmark.simpleBatchedQuery100           thrpt    5   328.706 ± 14.948  ops/s
[info] execution.NestedZQueryBenchmark.simpleBatchedQuery1000          thrpt    5   181.841 ±  4.186  ops/s
[info] execution.NestedZQueryBenchmark.simpleBatchedQuery10000         thrpt    5    26.440 ±  0.953  ops/s
[info] execution.NestedZQueryBenchmark.simpleParallelQuery100          thrpt    5   243.525 ±  8.455  ops/s
[info] execution.NestedZQueryBenchmark.simpleParallelQuery1000         thrpt    5    61.012 ±  2.067  ops/s
[info] execution.NestedZQueryBenchmark.simpleParallelQuery10000        thrpt    5     6.909 ±  0.080  ops/s
[info] execution.NestedZQueryBenchmark.simpleSequentialQuery100        thrpt    5   330.774 ± 16.041  ops/s
[info] execution.NestedZQueryBenchmark.simpleSequentialQuery1000       thrpt    5   179.764 ±  2.028  ops/s
[info] execution.NestedZQueryBenchmark.simpleSequentialQuery10000      thrpt    5    27.805 ±  0.956  ops/s

The performance did not change much.

I can't reproduce last night's benchmark result today with the same code, and I don't know why.

timzaak · 2021-04-09T02:23:24Z

@ghostdogpr This may be the final version. I don't know if there are other ways to improve performance.

ghostdogpr · 2021-04-09T03:13:37Z

I'll dig into cats-parse and see if the profiler shows something obvious to improve, maybe this weekend or the next.

ghostdogpr · 2021-04-10T06:22:35Z

@timzaak I did some profiling of your code. I found 2 things:

Changing all the defs into vals and lazy vals improves performance by 3x. To make it even better, we could remove the description: Option[String] parameter (in the *TypeDefinition parsers) and parse the description inside those parsers, so that they can be vals as well.
Apart from that, most of the time seems to be spent generating hashcodes. I opened typelevel/cats-parse#198 ("CPU time spent in hashcode computation") to see whether it's something cats-parse can avoid or whether it comes from our usage of it. If we can solve that one, performance should be pretty good!
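The suggestion to parse the description inside the *TypeDefinition parsers can be illustrated with a toy combinator type. Everything here is hypothetical and simplified: `P` is not cats-parse's API, and `ObjectTypeDef`, `description`, and `name` are stand-ins for the real GraphQL rules. When the description is passed in as an argument, the parser has to be a `def` and is rebuilt on every call; when it is parsed as an optional prefix, the whole rule can be a `val` built once.

```scala
object DescriptionSketch {
  // Toy parser: on success returns the parsed value and the remaining input.
  final case class P[A](run: String => Option[(A, String)]) {
    def map[B](f: A => B): P[B] = P(in => run(in).map { case (a, rest) => (f(a), rest) })

    // Sequencing: run this parser, then `that` on the remaining input, keeping both results.
    def ~[B](that: P[B]): P[(A, B)] =
      P(in => run(in).flatMap { case (a, rest) => that.run(rest).map { case (b, rest2) => ((a, b), rest2) } })

    // Optional: succeed with None, consuming nothing, if this parser fails.
    def opt: P[Option[A]] = P(in => run(in) match {
      case Some((a, rest)) => Some((Some(a), rest))
      case None            => Some((None, in))
    })
  }

  // Hypothetical AST node, loosely modeled on a GraphQL object type definition.
  final case class ObjectTypeDef(description: Option[String], name: String)

  def lit(s: String): P[Unit] =
    P(in => if (in.startsWith(s)) Some(((), in.drop(s.length))) else None)

  // Toy description: a double-quoted string followed by optional spaces.
  val description: P[String] = P { in =>
    if (in.startsWith("\"")) {
      val end = in.indexOf('"', 1)
      if (end > 0) Some((in.substring(1, end), in.substring(end + 1).dropWhile(_ == ' '))) else None
    } else None
  }

  // Toy name: a non-empty run of letters.
  val name: P[String] = P { in =>
    val n = in.takeWhile(_.isLetter)
    if (n.nonEmpty) Some((n, in.drop(n.length))) else None
  }

  // Before: the description is parsed elsewhere and passed in, so a new parser is built per call.
  def objectTypeDefWith(desc: Option[String]): P[ObjectTypeDef] =
    (lit("type ") ~ name).map { case (_, n) => ObjectTypeDef(desc, n) }

  // After: the description is an optional prefix, so the whole rule is a val built once.
  val objectTypeDef: P[ObjectTypeDef] =
    (description.opt ~ lit("type ") ~ name).map { case ((d, _), n) => ObjectTypeDef(d, n) }
}
```

Because `objectTypeDef` refers to `description` and `name`, those are declared first; in a larger grammar where declaration order is hard to control, `lazy val` avoids reading a `val` before it is initialized.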

Btw, I will review the correctness in more detail once the performance issue is solved.

timzaak · 2021-04-10T13:15:59Z

@ghostdogpr thanks for profiling it. I will watch typelevel/cats-parse#198.
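One generic way hashing can dominate a profile, offered as an assumption and not as the confirmed cause of typelevel/cats-parse#198: Scala case classes compute `hashCode` structurally and do not cache it, so hashing a deep tree of case-class nodes visits every node on every call. The `Node`, `Leaf`, and `Cat` types below are hypothetical; the counter in `Leaf` only makes the work visible.

```scala
object HashSketch {
  // Counts how many times a leaf's hashCode has been computed.
  var leafHashes = 0

  sealed trait Node

  // Leaf overrides hashCode only to count calls; the value is still derived from its field.
  final case class Leaf(token: String) extends Node {
    override def hashCode(): Int = { leafHashes += 1; token.hashCode }
  }

  // Cat keeps the compiler-generated structural hashCode, which hashes both children.
  final case class Cat(left: Node, right: Node) extends Node

  // Builds a left-nested chain with n + 1 leaves, similar in shape to a long sequence of combinators.
  def chain(n: Int): Node =
    (1 to n).foldLeft(Leaf("start"): Node)((acc, i) => Cat(acc, Leaf(i.toString)))

  // Hashes the tree once and returns how many leaves were visited.
  def leafVisits(node: Node): Int = {
    val before = leafHashes
    node.hashCode()
    leafHashes - before
  }
}
```

Hashing a 1000-step chain visits all 1001 leaves, and doing it again on the same tree visits them all again, since nothing is cached.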

timzaak · 2021-04-14T01:31:59Z

wait until cats-parse 0.3.3 release

ghostdogpr · 2021-04-18T13:39:19Z

@timzaak can you rebase your PR onto this branch? #847

This is the branch I created for Scala 3 support. The code that is specific to Scala 2 is in the scala-2 folder while the code specific to Scala 3 is in the scala-3 folder. I already moved Fastparse to the scala-2 side, so you can add your part in the scala-3 one, and they will both coexist. The downside of that is that we won't be able to run tests yet because the derivation is not working, but since tests are green now it should be fine.
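For reference, one common way to make an sbt build pick up version-specific `scala-2` and `scala-3` source folders is sketched below. This is an assumption about how such a layout can be wired, not a copy of the branch's build; depending on the sbt version, these folders may already be included by default.

```scala
// build.sbt sketch: add src/main/scala-2 or src/main/scala-3 depending on the Scala version.
Compile / unmanagedSourceDirectories ++= {
  val base = (Compile / sourceDirectory).value // src/main of the current project
  CrossVersion.partialVersion(scalaVersion.value) match {
    case Some((2, _)) => Seq(base / "scala-2")
    case Some((3, _)) => Seq(base / "scala-3")
    case _            => Seq.empty
  }
}
```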

timzaak · 2021-04-19T10:59:32Z

@ghostdogpr I will do it later

ghostdogpr and others added 8 commits April 14, 2021 22:11

ec9c58b Initial setup
f685e71 fmt
a141979 Fix CI
81d8905 Move stuff
4963988 Simplify
0c160f4 Fix CI
159cced Upgrade Scala 3 to RC2
382fbc7 gqldoc macro for Scala 3 (#849)
ae29d0d bak cat-parse

timzaak closed this Apr 19, 2021

timzaak mentioned this pull request Apr 19, 2021: cats-parse scala3 #850 (Merged)
