Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Surprising accessor performance for ARec #150

Open
Philonous opened this issue May 9, 2021 · 5 comments
Open

Surprising accessor performance for ARec #150

Philonous opened this issue May 9, 2021 · 5 comments

Comments

@Philonous
Copy link
Contributor

Philonous commented May 9, 2021

The accessors benchmark suggests that ARec field access should be only slightly worse than Rec field access for fields in the front and stay constant for fields with higher indices while Rec fields become increasingly more expensive to access. For example, on my machine, access to Rec takes 6.1ns for index 0 up to 12.4ns for index 15, for an average of ~9.2ns. On the other hand, access time ARec fields is constant with ~7.5 ns. (See included figure)

This would suggest that for access patterns that exercise all fields equally, ARec should outperform Rec. However, I added a new benchmark simulating random read access where I retrieve all elements once and calculate their sum: Link to gist

Here, ARec fares significantly worse than Rec with 68 vs 55 ns!

image

I find this discrepancy surprising. Is there perhaps a problem in the way the benchmark is written?

@acowley
Copy link
Contributor

acowley commented May 9, 2021

Wow that is surprising! I’ll take a look as soon as I get a chance.

@Philonous
Copy link
Contributor Author

Philonous commented May 10, 2021

The core that GHC produces for the Rec case looks like this:

$wsumRec
  = \ (w :: Rec ElField Fields) ->
      case w of { :& @ r1 @ rs1 co1 x1 xs ->
      case x1 `cast` <Co:4> of { Field @ s @ t co $dKnownSymbol ds ->
      case ds `cast` <Co:4> of { I# x ->
      case xs `cast` <Co:8> of { :& @ r2 @ rs2 co2 x2 xs1 ->
      case x2 `cast` <Co:4> of
      { Field @ s1 @ t1 co3 $dKnownSymbol1 ds1 ->
      case ds1 `cast` <Co:4> of { I# y ->
      case xs1 `cast` <Co:8> of { :& @ r3 @ rs3 co4 x3 xs2 ->
      case x3 `cast` <Co:4> of
      { Field @ s2 @ t2 co5 $dKnownSymbol2 ds2 ->
      case ds2 `cast` <Co:4> of { I# y1 ->
      case xs2 `cast` <Co:8> of { :& @ r4 @ rs4 co6 x4 xs3 ->
      case x4 `cast` <Co:4> of
[...]

Full gist

So GHC merges multiple pattern matches, making the access pattern linear rather than quadratic. Quite a clever optimization. I'm still surprised that ARec fares so badly.

@acowley
Copy link
Contributor

acowley commented May 10, 2021

Yes, the instance unrolling and inlining was an original key feature of the performance story. I don’t recall the exact motivations of the ARec variant, but I imagine it was some combination of:

  • If a vinyl accessor isn’t totally inlined then performance is poor. The inlining is a big compile-time cost, so we try to have different ways of balancing things.
  • At times we’ve limited the unrolling to some maximum depth, and for very wide records a user might want to avoid it as the type checking is I think exponential. ARec should work for very wide records without as much type checker work, but we don’t have a compilation time test to actually verify that.

All that said, I’m surprised by your result, too. In particular how much worse the times are than the existing benchmark. I’m looking forward to digging in to it.

@Philonous
Copy link
Contributor Author

Ah, yes, I was aware of the instance optimizations, the one that surprised me was that GHC re-uses pattern matches, so instead of repeatedly matching on the outermost :& it effectively turned my function into a fold, which I thought was clever.

The reason I'm mentioning is that it means that my function doesn't actually measure the "average case" for accessing random fields; instead, it just pattern matches all the way to the last field and uses all the values it finds on the way.

That's great news if we have an access pattern where we want to look at multiple fields, we effectively only have to pay for the deepest one. But it invalidates my benchmark.

Philonous added a commit to Philonous/Vinyl that referenced this issue May 16, 2021
Philonous added a commit to Philonous/Vinyl that referenced this issue May 16, 2021
@Philonous
Copy link
Contributor Author

Philonous commented May 16, 2021

I've tried turning ElField into a newtype, which helped performance a lot:

(All values in ns)

  newtype ADT GADT
creating/vinyl record 13.82 13.97 25.55
creating/Old style ARec with toARec 137.9 142.9 152.5
creating/Old style ARec with toARecFast 53.38 53.68 67.87
creating/New style ARec with toARec 66.1 61.83 77.83
creating/New style ARec with toARecFast 15.38 15.4 28.97
       
sums/haskell record 14 14 14
sums/vinyl Rec 45.09 53.9 53.4
sums/vinyl ARec 39.04 71.87 68.53
sums/vinyl ARec Old 38.85 69.07 70.23
       
Haskell Record      
#0 5.564 5.178 5.178
#4 5.615 5.127 5.202
#8 5.615 5.18 5.13
#15 5.802 5.175 5.4
Rec      
#0 5.79 5.85 5.886
#4 7.035 7.06 6.835
#8 8.398 8.181 8.401
#15 11.85 12.05 11.7
Arec Old      
#0 7.534 8.913 8.82
#4 7.548 8.91 8.797
#8 7.222 8.838 8.726
#15 7.185 9.437 8.92
Arec New      
#0 7.248 9.609 8.585
#4 6.989 9.608 8.714
#8 7.267 9.633 8.33
#15 6.917 9.437 8.634

The three variants look like this:

Newtype

-- Morally: newtype ElField (s, t) = Field t
-- But GHC doesn't allow that
newtype ElField (t :: (Symbol, *)) = Field (Snd t)

ADT

data ElField (field :: (Symbol, *)) where
  Field :: !t -> ElField '(s,t)

GADT

data ElField (field :: (Symbol, *)) where
  Field :: KnownSymbol s => !t -> ElField '(s,t)

These are the only changes I made to the code.

It looks like removing the KnownSymbol gives us most of the benefit for creating records whereas going from data to newtype saves the most during lookup (presumably because we don't need to pattern match on Field). Surprisingly it seems to help ARec more than Rec, turning the performance around for the sum-benchmark and shifting the amortization point from around 8th field to around 4th.

Philonous added a commit to Philonous/Vinyl that referenced this issue May 19, 2021
* Move from Array to SmallArray# to avoid intermediate list during construction
* Use class instead of recursive function in toARec for improved inlining
* Add (&:), arnil and arec presudo-constructors
Philonous added a commit to Philonous/Vinyl that referenced this issue May 19, 2021
* Saves a 1 word per field
* Improves accessor performance significantly both for Rec as well as ARec
Philonous added a commit to Philonous/Vinyl that referenced this issue May 19, 2021
* Move from Array to SmallArray# to avoid intermediate list during construction
* Use class instead of recursive function in toARec for improved inlining
* Add (&:), arnil and arec presudo-constructors
Philonous added a commit to Philonous/Vinyl that referenced this issue May 19, 2021
* Saves a 1 word per field
* Improves accessor performance significantly both for Rec as well as ARec
Philonous added a commit to Philonous/Vinyl that referenced this issue May 19, 2021
* Move from Array to SmallArray# to avoid intermediate list during construction
* Use class instead of recursive function in toARec for improved inlining
* Add (&:), arnil and arec presudo-constructors
Philonous added a commit to Philonous/Vinyl that referenced this issue May 19, 2021
* Saves a 1 word per field
* Improves accessor performance significantly both for Rec as well as ARec
Philonous added a commit to Philonous/Vinyl that referenced this issue May 26, 2021
* Move from Array to SmallArray# to avoid intermediate list during construction
* Use class instead of recursive function in toARec for improved inlining
Philonous added a commit to Philonous/Vinyl that referenced this issue May 26, 2021
* Saves a 1 word per field
* Improves accessor performance significantly both for Rec as well as ARec
Philonous added a commit to Philonous/Vinyl that referenced this issue May 26, 2021
* Move from Array to SmallArray# to avoid intermediate list during construction
* Use class instead of recursive function in toARec for improved inlining
Philonous added a commit to Philonous/Vinyl that referenced this issue May 26, 2021
* Saves a 1 word per field
* Improves accessor performance significantly both for Rec as well as ARec
Philonous added a commit to Philonous/Vinyl that referenced this issue May 27, 2021
* Move from Array to SmallArray# to avoid intermediate list during construction
* Use class instead of recursive function in toARec for improved inlining
Philonous added a commit to Philonous/Vinyl that referenced this issue May 27, 2021
* Saves a 1 word per field
* Improves accessor performance significantly both for Rec as well as ARec
Philonous added a commit to Philonous/Vinyl that referenced this issue May 27, 2021
* Move from Array to SmallArray# to avoid intermediate list during construction
* Use class instead of recursive function in toARec for improved inlining
Philonous added a commit to Philonous/Vinyl that referenced this issue May 27, 2021
* Saves a 1 word per field
* Improves accessor performance significantly both for Rec as well as ARec
acowley pushed a commit that referenced this issue Jun 21, 2021
* Move from Array to SmallArray# to avoid intermediate list during construction
* Use class instead of recursive function in toARec for improved inlining
acowley pushed a commit that referenced this issue Jun 21, 2021
* Saves a 1 word per field
* Improves accessor performance significantly both for Rec as well as ARec
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants