
change fast-parse to cat-parse #834

Closed
wants to merge 9 commits into from

@timzaak (Contributor) commented Apr 6, 2021

Ref #650. Much of the code comes from lvitaly.
I ran the benchmarks on my desktop computer:

OS: openSUSE Leap 15.2 x86_64
Kernel: 5.3.18-lp152.66-preempt
CPU: Intel i7-6700 (8) @ 4.000GHz
Memory: 27920MiB / 32023MiB

The cats-parse results:

[info] Benchmark                                                        Mode  Cnt     Score     Error  Units
[info] GraphQLBenchmarks.introspectCaliban                             thrpt    5   174.243 ±  60.431  ops/s
[info] GraphQLBenchmarks.introspectSangria                             thrpt    5   408.381 ± 159.162  ops/s
[info] GraphQLBenchmarks.simpleCaliban                                 thrpt    5   655.142 ±  61.412  ops/s
[info] GraphQLBenchmarks.simpleSangria                                 thrpt    5  8336.505 ± 122.964  ops/s
[info] execution.NestedZQueryBenchmark.deepBatchedQuery100             thrpt    5   254.165 ±   8.627  ops/s
[info] execution.NestedZQueryBenchmark.deepBatchedQuery1000            thrpt    5    62.174 ±   0.519  ops/s
[info] execution.NestedZQueryBenchmark.deepBatchedQuery10000           thrpt    5     6.959 ±   0.104  ops/s
[info] execution.NestedZQueryBenchmark.deepParallelQuery100            thrpt    5   164.686 ±   7.417  ops/s
[info] execution.NestedZQueryBenchmark.deepParallelQuery1000           thrpt    5    24.425 ±   1.389  ops/s
[info] execution.NestedZQueryBenchmark.deepParallelQuery10000          thrpt    5     2.755 ±   0.111  ops/s
[info] execution.NestedZQueryBenchmark.deepSequentialQuery100          thrpt    5   255.859 ±   4.674  ops/s
[info] execution.NestedZQueryBenchmark.deepSequentialQuery1000         thrpt    5    61.544 ±   1.779  ops/s
[info] execution.NestedZQueryBenchmark.deepSequentialQuery10000        thrpt    5     7.213 ±   0.142  ops/s
[info] execution.NestedZQueryBenchmark.multifieldBatchedQuery100       thrpt    5   437.501 ±  23.922  ops/s
[info] execution.NestedZQueryBenchmark.multifieldBatchedQuery1000      thrpt    5    90.925 ±   2.230  ops/s
[info] execution.NestedZQueryBenchmark.multifieldBatchedQuery10000     thrpt    5    10.271 ±   0.217  ops/s
[info] execution.NestedZQueryBenchmark.multifieldParallelQuery100      thrpt    5   189.302 ±   6.047  ops/s
[info] execution.NestedZQueryBenchmark.multifieldParallelQuery1000     thrpt    5    24.527 ±   0.309  ops/s
[info] execution.NestedZQueryBenchmark.multifieldParallelQuery10000    thrpt    5     2.432 ±   0.146  ops/s
[info] execution.NestedZQueryBenchmark.multifieldSequentialQuery100    thrpt    5   439.015 ±  20.834  ops/s
[info] execution.NestedZQueryBenchmark.multifieldSequentialQuery1000   thrpt    5    90.751 ±   3.903  ops/s
[info] execution.NestedZQueryBenchmark.multifieldSequentialQuery10000  thrpt    5    10.216 ±   0.265  ops/s
[info] execution.NestedZQueryBenchmark.simpleBatchedQuery100           thrpt    5   578.245 ±  45.990  ops/s
[info] execution.NestedZQueryBenchmark.simpleBatchedQuery1000          thrpt    5   224.753 ±   3.169  ops/s
[info] execution.NestedZQueryBenchmark.simpleBatchedQuery10000         thrpt    5    28.038 ±   1.275  ops/s
[info] execution.NestedZQueryBenchmark.simpleParallelQuery100          thrpt    5   362.207 ±  17.743  ops/s
[info] execution.NestedZQueryBenchmark.simpleParallelQuery1000         thrpt    5    64.523 ±   0.731  ops/s
[info] execution.NestedZQueryBenchmark.simpleParallelQuery10000        thrpt    5     6.465 ±   0.065  ops/s
[info] execution.NestedZQueryBenchmark.simpleSequentialQuery100        thrpt    5   579.062 ±  24.800  ops/s
[info] execution.NestedZQueryBenchmark.simpleSequentialQuery1000       thrpt    5   222.029 ±   3.291  ops/s
[info] execution.NestedZQueryBenchmark.simpleSequentialQuery10000      thrpt    5    27.687 ±   1.718  ops/s

The fastparse results:

[info] Benchmark                                                        Mode  Cnt      Score     Error  Units
[info] GraphQLBenchmarks.introspectCaliban                             thrpt    5   1700.357 ±  84.346  ops/s
[info] GraphQLBenchmarks.introspectSangria                             thrpt    5    424.921 ±  77.148  ops/s
[info] GraphQLBenchmarks.simpleCaliban                                 thrpt    5  34580.933 ± 816.342  ops/s
[info] GraphQLBenchmarks.simpleSangria                                 thrpt    5   7894.296 ± 780.517  ops/s
[info] execution.NestedZQueryBenchmark.deepBatchedQuery100             thrpt    5    794.987 ±  13.794  ops/s
[info] execution.NestedZQueryBenchmark.deepBatchedQuery1000            thrpt    5     78.817 ±   1.557  ops/s
[info] execution.NestedZQueryBenchmark.deepBatchedQuery10000           thrpt    5      7.680 ±   0.172  ops/s
[info] execution.NestedZQueryBenchmark.deepParallelQuery100            thrpt    5    278.831 ±   9.919  ops/s
[info] execution.NestedZQueryBenchmark.deepParallelQuery1000           thrpt    5     27.474 ±   1.136  ops/s
[info] execution.NestedZQueryBenchmark.deepParallelQuery10000          thrpt    5      2.849 ±   0.103  ops/s
[info] execution.NestedZQueryBenchmark.deepSequentialQuery100          thrpt    5    765.211 ±  17.195  ops/s
[info] execution.NestedZQueryBenchmark.deepSequentialQuery1000         thrpt    5     76.385 ±   1.431  ops/s
[info] execution.NestedZQueryBenchmark.deepSequentialQuery10000        thrpt    5      7.587 ±   0.167  ops/s
[info] execution.NestedZQueryBenchmark.multifieldBatchedQuery100       thrpt    5   1335.653 ±  38.236  ops/s
[info] execution.NestedZQueryBenchmark.multifieldBatchedQuery1000      thrpt    5    108.383 ±  19.814  ops/s
[info] execution.NestedZQueryBenchmark.multifieldBatchedQuery10000     thrpt    5     10.193 ±   1.240  ops/s
[info] execution.NestedZQueryBenchmark.multifieldParallelQuery100      thrpt    5    271.745 ±   6.259  ops/s
[info] execution.NestedZQueryBenchmark.multifieldParallelQuery1000     thrpt    5     25.601 ±   1.110  ops/s
[info] execution.NestedZQueryBenchmark.multifieldParallelQuery10000    thrpt    5      2.390 ±   0.087  ops/s
[info] execution.NestedZQueryBenchmark.multifieldSequentialQuery100    thrpt    5   1320.421 ±  54.977  ops/s
[info] execution.NestedZQueryBenchmark.multifieldSequentialQuery1000   thrpt    5    111.177 ±   5.237  ops/s
[info] execution.NestedZQueryBenchmark.multifieldSequentialQuery10000  thrpt    5     10.735 ±   0.556  ops/s
[info] execution.NestedZQueryBenchmark.simpleBatchedQuery100           thrpt    5   3661.123 ± 192.645  ops/s
[info] execution.NestedZQueryBenchmark.simpleBatchedQuery1000          thrpt    5    363.128 ±  32.954  ops/s
[info] execution.NestedZQueryBenchmark.simpleBatchedQuery10000         thrpt    5     29.850 ±   1.307  ops/s
[info] execution.NestedZQueryBenchmark.simpleParallelQuery100          thrpt    5    863.198 ±  67.605  ops/s
[info] execution.NestedZQueryBenchmark.simpleParallelQuery1000         thrpt    5     72.265 ±   2.801  ops/s
[info] execution.NestedZQueryBenchmark.simpleParallelQuery10000        thrpt    5      6.933 ±   0.190  ops/s
[info] execution.NestedZQueryBenchmark.simpleSequentialQuery100        thrpt    5   3705.457 ±  46.360  ops/s
[info] execution.NestedZQueryBenchmark.simpleSequentialQuery1000       thrpt    5    349.199 ±  32.415  ops/s
[info] execution.NestedZQueryBenchmark.simpleSequentialQuery10000      thrpt    5     29.640 ±   1.500  ops/s

I don't know whether it's fast enough to replace fastparse.

@ghostdogpr (Owner) commented Apr 6, 2021

Ouch, it's very slow 😢
The ones to look at are introspectCaliban and simpleCaliban:

  • simpleCaliban is a simple query: performance is 50x slower
  • introspectCaliban is a complex query: performance is 10x slower

In both cases, it would make Caliban much slower than Sangria.
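The slowdown factors above can be recomputed directly from the two benchmark tables. The sketch below copies the simpleCaliban and introspectCaliban throughputs from the cats-parse and fastparse runs, plus the Sangria figures from the cats-parse run; the Slowdown object and its field names are illustrative only and not part of Caliban.

```scala
object Slowdown {
  // Throughput in ops/s, copied from the cats-parse and fastparse runs above.
  val fastparse: Map[String, Double] =
    Map("simpleCaliban" -> 34580.933, "introspectCaliban" -> 1700.357)
  val catsParse: Map[String, Double] =
    Map("simpleCaliban" -> 655.142, "introspectCaliban" -> 174.243)
  // Sangria figures from the cats-parse run, keyed by the matching Caliban benchmark.
  val sangria: Map[String, Double] =
    Map("simpleCaliban" -> 8336.505, "introspectCaliban" -> 408.381)

  // How many times slower Caliban gets when switching from fastparse to cats-parse.
  val vsFastparse: Map[String, Double] =
    fastparse.map { case (name, ops) => name -> ops / catsParse(name) }

  // How many times faster Sangria is than Caliban running on cats-parse.
  val sangriaLead: Map[String, Double] =
    sangria.map { case (name, ops) => name -> ops / catsParse(name) }

  def report(): Unit =
    for (name <- fastparse.keys) {
      println(f"$name: ${vsFastparse(name)}%.1fx slower than fastparse, ${sangriaLead(name)}%.1fx slower than Sangria")
    }
}
```

Calling Slowdown.report() gives roughly 53x and 10x against fastparse, matching the estimates above, and roughly 12.7x and 2.3x behind Sangria.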

Now, do you think the parsing code could be optimized, or is the slowness entirely caused by cats-parse? If I get some time this weekend I could do some profiling to see what makes it so slow.

PS: run fmt in sbt to format the code (that's the reason the CI fails).

@timzaak (Contributor, Author) commented Apr 6, 2021

I just changed the defs to vals and removed P.defer0, and it improved performance a lot:

import cats.parse.{Parser => P, Parser0}
val document: Parser0[ParsedDocument] =
  (P.start *> whitespaceWithComment *> definition.repSep0(whitespaceWithComment) <* whitespaceWithComment <* P.end)
    .map(seq => ParsedDocument(seq))
[info] # Warmup Iteration   1: 404.573 ops/s
[info] # Warmup Iteration   2: 809.628 ops/s
[info] # Warmup Iteration   3: 839.770 ops/s
[info] # Warmup Iteration   4: 871.053 ops/s
[info] # Warmup Iteration   5: 899.415 ops/s
[info] Iteration   1: 903.723 ops/s
[info] Iteration   2: 914.723 ops/s
[info] Iteration   3: 918.853 ops/s
[info] Iteration   4: 921.035 ops/s
[info] Iteration   5: 919.765 ops/s
[info] Result "caliban.GraphQLBenchmarks.introspectCaliban":
[info]   915.620 ±(99.9%) 27.181 ops/s [Average]
[info]   (min, avg, max) = (903.723, 915.620, 921.035), stdev = 7.059

I will change it tomorrow.
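Why turning def into val helps can be shown without cats-parse at all: a def evaluates its body, and so builds a new parser value, on every reference, while a val builds it once. The sketch below is a simplified illustration under that assumption. Parser, Build, DefStyle, ValStyle and Demo are hypothetical stand-ins written for this example, not cats-parse's API, and the counter only counts constructions; it does not model how expensive each construction is inside cats-parse.

```scala
// Hypothetical stand-in for a parser value; this is NOT cats-parse's Parser type.
final case class Parser[A](run: (String, Int) => Option[(A, Int)])

object Build {
  // Counts how many parser values have been constructed so far.
  var count = 0

  // Builds a parser that matches the single character `c`.
  def char(c: Char): Parser[Char] = {
    count += 1
    Parser((s, i) => if (i < s.length && s.charAt(i) == c) Some((c, i + 1)) else None)
  }
}

object DefStyle {
  // A def builds a brand-new parser every time it is referenced.
  def openBrace: Parser[Char] = Build.char('{')
}

object ValStyle {
  // A val builds the parser once, when ValStyle is first initialized.
  val openBrace: Parser[Char] = Build.char('{')
}

object Demo {
  // Runs `parse` 1000 times and returns how many parsers were built meanwhile.
  def buildsFor1000Parses(parse: () => Option[(Char, Int)]): Int = {
    Build.count = 0
    (1 to 1000).foreach(_ => parse())
    Build.count
  }

  val defBuilds: Int = buildsFor1000Parses(() => DefStyle.openBrace.run("{", 0))
  val valBuilds: Int = buildsFor1000Parses(() => ValStyle.openBrace.run("{", 0))
}
```

Demo.defBuilds comes out as 1000 and Demo.valBuilds as 1. Where parsers refer to each other cyclically, the thread uses lazy val rather than val so that initialization order does not matter.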

@timzaak (Contributor, Author) commented Apr 7, 2021

[info] Benchmark                                                        Mode  Cnt     Score    Error  Units
[info] GraphQLBenchmarks.introspectCaliban                             thrpt    5   276.946 ±  4.698  ops/s
[info] GraphQLBenchmarks.introspectSangria                             thrpt    5   444.055 ± 33.928  ops/s
[info] GraphQLBenchmarks.simpleCaliban                                 thrpt    5   351.205 ± 11.015  ops/s
[info] GraphQLBenchmarks.simpleSangria                                 thrpt    5  8509.736 ± 51.700  ops/s
[info] execution.NestedZQueryBenchmark.deepBatchedQuery100             thrpt    5   245.568 ±  1.445  ops/s
[info] execution.NestedZQueryBenchmark.deepBatchedQuery1000            thrpt    5    63.020 ±  1.697  ops/s
[info] execution.NestedZQueryBenchmark.deepBatchedQuery10000           thrpt    5     7.492 ±  0.063  ops/s
[info] execution.NestedZQueryBenchmark.deepParallelQuery100            thrpt    5   161.071 ±  4.537  ops/s
[info] execution.NestedZQueryBenchmark.deepParallelQuery1000           thrpt    5    25.943 ±  0.110  ops/s
[info] execution.NestedZQueryBenchmark.deepParallelQuery10000          thrpt    5     2.680 ±  0.065  ops/s
[info] execution.NestedZQueryBenchmark.deepSequentialQuery100          thrpt    5   248.203 ±  3.531  ops/s
[info] execution.NestedZQueryBenchmark.deepSequentialQuery1000         thrpt    5    64.193 ±  0.905  ops/s
[info] execution.NestedZQueryBenchmark.deepSequentialQuery10000        thrpt    5     7.358 ±  0.039  ops/s
[info] execution.NestedZQueryBenchmark.multifieldBatchedQuery100       thrpt    5   285.356 ±  5.961  ops/s
[info] execution.NestedZQueryBenchmark.multifieldBatchedQuery1000      thrpt    5    86.301 ±  2.463  ops/s
[info] execution.NestedZQueryBenchmark.multifieldBatchedQuery10000     thrpt    5    10.682 ±  0.597  ops/s
[info] execution.NestedZQueryBenchmark.multifieldParallelQuery100      thrpt    5   151.799 ±  3.160  ops/s
[info] execution.NestedZQueryBenchmark.multifieldParallelQuery1000     thrpt    5    24.025 ±  0.237  ops/s
[info] execution.NestedZQueryBenchmark.multifieldParallelQuery10000    thrpt    5     2.435 ±  0.099  ops/s
[info] execution.NestedZQueryBenchmark.multifieldSequentialQuery100    thrpt    5   282.982 ±  9.090  ops/s
[info] execution.NestedZQueryBenchmark.multifieldSequentialQuery1000   thrpt    5    86.496 ±  1.913  ops/s
[info] execution.NestedZQueryBenchmark.multifieldSequentialQuery10000  thrpt    5    10.685 ±  0.328  ops/s
[info] execution.NestedZQueryBenchmark.simpleBatchedQuery100           thrpt    5   328.706 ± 14.948  ops/s
[info] execution.NestedZQueryBenchmark.simpleBatchedQuery1000          thrpt    5   181.841 ±  4.186  ops/s
[info] execution.NestedZQueryBenchmark.simpleBatchedQuery10000         thrpt    5    26.440 ±  0.953  ops/s
[info] execution.NestedZQueryBenchmark.simpleParallelQuery100          thrpt    5   243.525 ±  8.455  ops/s
[info] execution.NestedZQueryBenchmark.simpleParallelQuery1000         thrpt    5    61.012 ±  2.067  ops/s
[info] execution.NestedZQueryBenchmark.simpleParallelQuery10000        thrpt    5     6.909 ±  0.080  ops/s
[info] execution.NestedZQueryBenchmark.simpleSequentialQuery100        thrpt    5   330.774 ± 16.041  ops/s
[info] execution.NestedZQueryBenchmark.simpleSequentialQuery1000       thrpt    5   179.764 ±  2.028  ops/s
[info] execution.NestedZQueryBenchmark.simpleSequentialQuery10000      thrpt    5    27.805 ±  0.956  ops/s

The performance did not change much.

I can't reproduce last night's benchmark result today with the same code, and I don't know why.

@timzaak (Contributor, Author) commented Apr 9, 2021

@ghostdogpr This may be the final version. I don't know whether there are other ways to improve performance.

@ghostdogpr (Owner)

I'll dig into cats-parse and see if the profiler shows something obvious to improve, maybe this weekend or the next.

@ghostdogpr (Owner)

@timzaak I did some profiling of your code. I found 2 things:

  • Changing all the defs into vals and lazy vals improves performance by about 3x. To do even better, remove the description: Option[String] parameter (in the *TypeDefinition parsers) and parse the description inside those parsers, so that they can be vals as well.
  • Apart from that, most of the time seems to be spent computing hash codes. I opened typelevel/cats-parse#198 ("CPU time spent in hashcode computation") to see whether cats-parse can avoid it or whether it comes from our usage of the library. If we can solve that one, performance should be pretty good!
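The first suggestion can be illustrated with a toy combinator type. This is a sketch only: P, token, name, quoted, ObjectTypeDefinition, WithParam and DescriptionInside are hypothetical simplifications written for this example, not cats-parse's API or Caliban's actual parsers. WithParam receives an already-parsed description, so it has to be a def and is rebuilt on every call; DescriptionInside parses the optional description itself, so it has no parameters and can be a val.

```scala
// A tiny stand-in for a parser combinator library; this is NOT cats-parse's API.
final case class P[A](run: (String, Int) => Option[(A, Int)]) {
  def map[B](f: A => B): P[B] =
    P((s, i) => run(s, i).map { case (a, j) => (f(a), j) })

  // Runs this parser, then `that`, and pairs the two results.
  def ~[B](that: P[B]): P[(A, B)] =
    P((s, i) => run(s, i).flatMap { case (a, j) => that.run(s, j).map { case (b, k) => ((a, b), k) } })

  // Never fails: yields None without consuming input when this parser fails.
  def opt: P[Option[A]] =
    P((s, i) => run(s, i) match {
      case Some((a, j)) => Some((Some(a), j))
      case None         => Some((None, i))
    })

  // Succeeds only if the whole input is consumed.
  def parseAll(s: String): Option[A] =
    run(s, 0).collect { case (a, j) if j == s.length => a }
}

object P {
  // Skips whitespace, then matches `lit` exactly.
  def token(lit: String): P[Unit] = P { (s, i0) =>
    val i = skipWs(s, i0)
    if (s.startsWith(lit, i)) Some(((), i + lit.length)) else None
  }

  // Skips whitespace, then reads one or more letters.
  val name: P[String] = P { (s, i0) =>
    val i = skipWs(s, i0)
    val j = s.indexWhere(c => !c.isLetter, i) match { case -1 => s.length; case k => k }
    if (j > i) Some((s.substring(i, j), j)) else None
  }

  // Skips whitespace, then reads a double-quoted string (no escape handling).
  val quoted: P[String] = P { (s, i0) =>
    val i = skipWs(s, i0)
    if (i < s.length && s.charAt(i) == '"') {
      val end = s.indexOf('"', i + 1)
      if (end >= 0) Some((s.substring(i + 1, end), end + 1)) else None
    } else None
  }

  private def skipWs(s: String, i: Int): Int = {
    var k = i
    while (k < s.length && s.charAt(k).isWhitespace) k += 1
    k
  }
}

// Hypothetical, simplified AST node.
final case class ObjectTypeDefinition(description: Option[String], name: String)

object WithParam {
  // The caller parses the description and passes it in, so this must be a def
  // and a new parser is built each time it is called.
  def objectTypeDefinition(description: Option[String]): P[ObjectTypeDefinition] =
    (P.token("type") ~ P.name).map { case (_, n) => ObjectTypeDefinition(description, n) }
}

object DescriptionInside {
  // The optional description is parsed here, so the parser has no parameters
  // and can be a val that is built exactly once.
  val objectTypeDefinition: P[ObjectTypeDefinition] =
    (P.quoted.opt ~ P.token("type") ~ P.name).map { case ((d, _), n) => ObjectTypeDefinition(d, n) }
}
```

Both versions accept the same definitions; the difference is only in how often the parser value is constructed.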

By the way, I will review the correctness in more detail once the performance issue is solved.

@timzaak (Contributor, Author) commented Apr 10, 2021

@ghostdogpr Thanks for the profiling. I will watch typelevel/cats-parse#198.

@timzaak (Contributor, Author) commented Apr 14, 2021

Waiting for the cats-parse 0.3.3 release.

@ghostdogpr (Owner)

@timzaak Can you rebase your PR onto this branch? #847

This is the branch I created for Scala 3 support. Code specific to Scala 2 is in the scala-2 folder, and code specific to Scala 3 is in the scala-3 folder. I have already moved Fastparse to the scala-2 side, so you can add your part in the scala-3 one and both will coexist. The downside is that we won't be able to run the tests yet because derivation is not working, but since the tests are green now it should be fine.

@timzaak (Contributor, Author) commented Apr 19, 2021

@ghostdogpr I will do it later

@timzaak closed this on Apr 19, 2021
@timzaak mentioned this pull request on Apr 19, 2021

3 participants