
Deriving performance, take two #153

Closed
tfausak opened this issue May 17, 2020 · 5 comments

tfausak commented May 17, 2020

I have written about the performance of deriving various type classes in Haskell before.

Reflecting on that, I think I did some things that weren't really worth it:

  • I benchmarked too many versions of GHC. I mean this both in terms of major/minor versions (7.0.4 through 8.2.1) and patch versions (7.0.1, 7.0.2, 7.0.3, and 7.0.4). Just doing the latest major/minor is probably fine (8.10.1 at time of writing) and there's no reason to benchmark out of date patch versions (like 8.8.1 or 8.8.2).

  • I focused too much on type classes from base that were available in all the tested versions of GHC. In particular I didn't benchmark Generic. I think people are probably more interested in type classes they're likely to find in the wild, like FromJSON and ToJSON from aeson.

  • I didn't compare derived performance to hand-written performance. At the time writing a code generator seemed too difficult, but in retrospect I don't know why I thought that. Also this is what people probably want to know: Is generating the code slow, or is compiling the generated code slow?

  • Related to the above, I didn't look at generic deriving versus Template Haskell. (There are other derivation options like newtype, but I don't think that's worth investigating.)

  • I didn't look at various optimization levels. People normally do development builds with -O0 and production builds with -O1 or -O2. It would be interesting to see how performance compares between them.

  • I didn't look at splitting modules. In particular I put everything into one huge module, rather than one module per type (which I think is more common).

  • Related to the above, I didn't look at using multiple jobs. Most machines have multiple cores. Compiling with -jN for various Ns could have a big impact on timing.

  • I didn't actually answer this question: Is it something related to type classes that's slow, or would the equivalent functions be slow too? For example, instead of providing an Eq instance, what if I wrote eq :: SomeType -> SomeType -> Bool? (A sketch of that comparison follows this list.)

  • Big picture, I didn't really provide a takeaway. What do I think people should do with this information? I've put all this effort into benchmarking, I should provide some suggestions.
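
For the class-versus-plain-function question, the comparison would look roughly like this. This is a minimal sketch; SomeType and its fields are invented stand-ins, not types from the benchmark.

```haskell
module EqVsFunction where

-- SomeType is an invented stand-in, not a type from the benchmark.
data SomeType = SomeType Int String

-- Class-based version: an Eq instance.
instance Eq SomeType where
  SomeType a b == SomeType c d = a == c && b == d

-- Class-free version: an equivalent top-level function.
eq :: SomeType -> SomeType -> Bool
eq (SomeType a b) (SomeType c d) = a == c && b == d
```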

I'm thinking about all this now because at $WORK we typically write JSON instances by hand, but we're considering changing that. I'm curious how it will affect compile times.

To that end, I've been working on a new benchmark for this. It generates code in various configurations and then benchmarks how long it takes to compile. I'm trying to focus on GHC 8.10.1 and JSON. The things I'm comparing are manual-vs-generic-vs-template instances, single-vs-multiple modules, optimization levels, and parallelism.

https://gist.github.com/tfausak/a5cae9e41e5ccd0b0a5b4e49f1e2104d
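
For concreteness, the three instance styles being compared look roughly like this. This is a sketch assuming aeson; the Person types and field names are invented for illustration and are not the ones the generator produces.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TemplateHaskell #-}

module InstanceStyles where

import Data.Aeson (FromJSON (..), ToJSON (..), object, withObject, (.:), (.=))
import Data.Aeson.TH (defaultOptions, deriveJSON)
import GHC.Generics (Generic)

-- Generic: derive Generic and lean on aeson's default method implementations.
data GenericPerson = GenericPerson { gName :: String } deriving (Generic)
instance ToJSON GenericPerson
instance FromJSON GenericPerson

-- Template Haskell: have aeson generate the instances at compile time.
data ThPerson = ThPerson { tName :: String }
$(deriveJSON defaultOptions ''ThPerson)

-- Manual: write the instances by hand.
data ManualPerson = ManualPerson { mName :: String }
instance ToJSON ManualPerson where
  toJSON p = object ["mName" .= mName p]
instance FromJSON ManualPerson where
  parseJSON = withObject "ManualPerson" $ \ o -> ManualPerson <$> o .: "mName"
```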

tfausak added the idea label May 17, 2020

tfausak commented May 17, 2020

Some hare-brained ideas that might be fun to chase down:

  • Is it faster to do something like this? instance C T where m = f; f = .... That is, write the method body as a top-level function, then simply alias the method to it. (A sketch follows this list.)

  • Does blank space play a role at all in performance? I doubt it, since running with -fno-code is super fast, but who knows. Maybe using curly braces and semicolons is faster than layout.

  • Are operators slower than prefix functions? Instead of x <> y, what about (<>) x y?

  • For equivalent expressions, are chained operators better than lists? For example x <> y <> z or mconcat [x, y, z]?

  • This one might be harder to test, but what about orphan instances? For some type T, you could put the type in T.Type, and (say) its FromJSON instance in T.FromJSON, then reexport both from the module T.

    • Even if this sped things up, it would be amazingly annoying to work with.

I don't think any of these will have an impact, but they should be easy enough to test with the code generator.
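
To make a few of these concrete, here is a rough sketch of the method-alias, braces-vs-layout, and operator-vs-prefix variations. The Thing type and its field are invented for illustration, assuming aeson for the instance.

```haskell
{-# LANGUAGE OverloadedStrings #-}

module AliasSketch where

import Data.Aeson (ToJSON (..), Value, object, (.=))

data Thing = Thing { thingName :: String }

-- The real implementation lives in a top-level function...
thingToJSON :: Thing -> Value
thingToJSON t = object ["name" .= thingName t]

-- ...and the instance method is just an alias for it.
instance ToJSON Thing where
  toJSON = thingToJSON

-- Explicit braces and semicolons instead of layout.
braced :: Int
braced = let { a = 1; b = 2 } in a + b

-- Operator form vs prefix form of the same expression.
chained :: [Int]
chained = [1] <> [2] <> [3]

prefixed :: [Int]
prefixed = (<>) [1] ((<>) [2] [3])

-- Chained operators vs a single mconcat.
concatenated :: [Int]
concatenated = mconcat [[1], [2], [3]]
```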


tfausak commented May 18, 2020

I meant to include a link to this excellent post: https://www.parsonsmatt.org/2019/11/27/keeping_compilation_fast.html

Looking at the performance of -j1, -j2, -j4, and -j8 for this workload on my desktop (Ryzen 5 2400G - 4 cores / 8 threads), -j4 is by far the best when there are multiple files. -j8 is sometimes a tiny bit faster, sometimes slower. -j2 takes about 75% of the runtime of -j1, and -j4 about 50%.

For the cases where everything is in a single file, -j1 through -j8 all perform about the same: sometimes slightly faster, sometimes slightly slower. Using +RTS -Nx -RTS doesn't seem to speed things up either.

Potential takeaway: Set -j to the number of cores your CPU has.


tfausak commented May 18, 2020

Regardless of the implementation (manual, generic, TH) and optimization (0, 1, 2), multiple files are slower with -j1 and faster in all other cases.

That alone would be a compelling reason to prefer multiple files. But you also have to consider recompilation when you edit a file, in which case multiple files easily win. The single file obviously has no choice but to recompile everything. With multiple files, only the one that changed and (maybe) the ones that depend on it have to be recompiled.
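
As a point of reference, the one-module-per-type layout looks roughly like this. Item is an invented name, not one of the benchmark's types, and the instances assume aeson's generic defaults.

```haskell
{-# LANGUAGE DeriveGeneric #-}

-- Sketch of the one-module-per-type layout: this file would be src/Item.hs,
-- holding just one type and its JSON instances. Editing it only forces
-- recompilation of Item and the modules that import it.
module Item (Item (..)) where

import Data.Aeson (FromJSON, ToJSON)
import GHC.Generics (Generic)

data Item = Item { itemName :: String, itemCount :: Int }
  deriving (Generic)

instance ToJSON Item
instance FromJSON Item

-- In the single-module layout, every type's declarations and instances live
-- together in one big module, so touching any of them recompiles the lot.
```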


tfausak commented May 18, 2020

In terms of different implementation options, manual is faster than TH, which is faster than generics. Using -O1 -j4 with multiple files, you've got:

  • manual: 17 seconds
  • TH: 21 seconds (+24%)
  • generics: 25 seconds (+47%)

Across different configurations, the ratios change but the idea stays the same. Generics are slow. Template Haskell is faster. Writing instances by hand is the fastest.

That being said, two caveats:

  • TH forces recompilation whenever something above it changes, even if it shouldn't. For example, if you touch Frame.hs, the Frame, Content, and Replay modules will be recompiled. By comparison, with manual and generic instances, only Frame is recompiled.

  • For ToJSON, manual instances implement both toJSON and toEncoding. Together with FromJSON's parseJSON, that's a lot of places where you have to write the same code and keep it all in agreement (see the sketch below).
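
The kind of repetition that second caveat is about looks roughly like this. Point is an invented type, not one from the benchmark; the same field names have to be kept in sync by hand across all three methods.

```haskell
{-# LANGUAGE OverloadedStrings #-}

module ManualTriple where

import Data.Aeson
  (FromJSON (..), ToJSON (..), object, pairs, withObject, (.:), (.=))

data Point = Point { pointX :: Double, pointY :: Double }

-- The field names and ordering are repeated in toJSON, toEncoding, and
-- parseJSON, and nothing checks that they agree.
instance ToJSON Point where
  toJSON p = object ["x" .= pointX p, "y" .= pointY p]
  toEncoding p = pairs ("x" .= pointX p <> "y" .= pointY p)

instance FromJSON Point where
  parseJSON = withObject "Point" $ \ o ->
    Point <$> o .: "x" <*> o .: "y"
```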

I'm not sure what the takeaway is. Even though generics are slow, maybe they're the least bad? Or maybe it would be nice to have a GHC source plugin that can write instances for you, similar to TH but without forcing recompilation.


tfausak commented May 20, 2020

tfausak closed this as completed May 20, 2020