Removing the dependency on Microsoft.IO.RecyclableStream and improving serializer performance #349

Merged

Contributor

@stebet stebet commented Dec 20, 2024

Main changes:

  • Remove the dependency on Microsoft.IO.RecyclableMemoryStream and replace it with implementations based on IBufferWriter<byte> and ArrayPool<byte>.
  • Use AggressiveInlining where appropriate.
  • Use better-optimized methods when serializers support them (for example, overloads that accept an IBufferWriter<byte>).
  • Make the (De)SerializeAsync methods call the non-async versions: serialization never does any I/O, so async is pure overhead.
  • Use Stream/Array pooling where appropriate.
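
To make the first and fourth points concrete, here is a minimal sketch of the general technique, not the code from this PR. `PooledBufferWriter` and `SketchJsonSerializer` are hypothetical names introduced only for illustration. It assumes System.Text.Json as the serializer and uses `ArrayPool<byte>.Shared` purely for brevity.

```csharp
using System;
using System.Buffers;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not the PR's actual type): an IBufferWriter<byte>
// whose backing array is rented from an ArrayPool<byte>.
public sealed class PooledBufferWriter : IBufferWriter<byte>, IDisposable
{
    private readonly ArrayPool<byte> _pool;
    private byte[] _buffer;
    private int _written;

    public PooledBufferWriter(ArrayPool<byte> pool, int initialCapacity = 256)
    {
        _pool = pool;
        _buffer = pool.Rent(Math.Max(initialCapacity, 1));
    }

    // The bytes written so far (valid only until Dispose).
    public ReadOnlySpan<byte> WrittenSpan => _buffer.AsSpan(0, _written);

    public void Advance(int count)
    {
        if (count < 0 || _written + count > _buffer.Length)
            throw new ArgumentOutOfRangeException(nameof(count));
        _written += count;
    }

    public Memory<byte> GetMemory(int sizeHint = 0)
    {
        EnsureCapacity(sizeHint);
        return _buffer.AsMemory(_written);
    }

    public Span<byte> GetSpan(int sizeHint = 0)
    {
        EnsureCapacity(sizeHint);
        return _buffer.AsSpan(_written);
    }

    private void EnsureCapacity(int sizeHint)
    {
        if (sizeHint < 1) sizeHint = 1;
        if (_buffer.Length - _written >= sizeHint) return;

        // Grow by at least doubling, copy what was written, return the old array.
        int newSize = Math.Max(_buffer.Length * 2, _written + sizeHint);
        byte[] bigger = _pool.Rent(newSize);
        _buffer.AsSpan(0, _written).CopyTo(bigger);
        _pool.Return(_buffer);
        _buffer = bigger;
    }

    public void Dispose()
    {
        if (_buffer.Length == 0) return; // already disposed
        _pool.Return(_buffer);
        _buffer = Array.Empty<byte>();
        _written = 0;
    }
}

// Hypothetical serializer wrapper showing the sync/async relationship.
public static class SketchJsonSerializer
{
    public static byte[] Serialize<T>(T value)
    {
        using var buffer = new PooledBufferWriter(ArrayPool<byte>.Shared);
        using (var writer = new Utf8JsonWriter(buffer))
        {
            JsonSerializer.Serialize(writer, value);
        }
        // Copy out only the bytes actually written; the rented array goes back to the pool.
        return buffer.WrittenSpan.ToArray();
    }

    // No I/O happens, so the async version simply wraps the synchronous result.
    public static ValueTask<byte[]> SerializeAsync<T>(T value, CancellationToken token = default)
        => new ValueTask<byte[]>(Serialize(value));
}
```

The only allocation that survives the call is the final `byte[]` payload; intermediate buffers are rented and returned.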

Runner Information


BenchmarkDotNet v0.14.0, Windows 11 (10.0.26100.2605)
Unknown processor
.NET SDK 9.0.101
 [Host] : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Toolchain=InProcessEmitToolchain 

CysharpMemoryPack

| Method | Mean | Error | StdDev | P95 | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| Serialize - Before | 105.56 μs | 2.094 μs | 2.936 μs | 109.40 μs | 30.27 | 30.27 | 30.27 | 94.79 KB |
| Serialize - After | 104.10 μs | 2.077 μs | 4.602 μs | 109.93 μs | 30.27 | 30.27 | 30.27 | 94.93 KB |
| Deserialize - Before | 59.60 μs | 1.184 μs | 3.377 μs | 63.59 μs | 21.67 | 7.02 | - | 265.68 KB |
| Deserialize - After | 53.55 μs | 1.064 μs | 2.590 μs | 57.84 μs | 21.67 | 7.02 | - | 265.68 KB |
| SerializeAsync - Before | 96.91 μs | 0.188 μs | 0.166 μs | 97.14 μs | 30.27 | 30.27 | 30.27 | 94.79 KB |
| SerializeAsync - After | 92.67 μs | 1.757 μs | 1.644 μs | 93.85 μs | 30.27 | 30.27 | 30.27 | 94.93 KB |
| DeserializeAsync - Before | 52.47 μs | 1.024 μs | 1.219 μs | 53.37 μs | 21.67 | 7.02 | - | 265.68 KB |
| DeserializeAsync - After | 50.08 μs | 0.178 μs | 0.139 μs | 50.26 μs | 21.67 | 7.02 | - | 265.68 KB |

NeueccMessagePack

| Method | Mean | Error | StdDev | P95 | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| Serialize - Before | 134.03 μs | 2.584 μs | 3.449 μs | 141.49 μs | 5.62 | - | - | 71.59 KB |
| Serialize - After | 130.03 μs | 2.558 μs | 3.829 μs | 134.84 μs | 5.62 | 0.73 | - | 71.62 KB |
| Deserialize - Before | 228.95 μs | 3.776 μs | 3.347 μs | 231.19 μs | 21.48 | 6.84 | - | 265.68 KB |
| Deserialize - After | 210.79 μs | 4.098 μs | 4.208 μs | 213.72 μs | 21.48 | 6.84 | - | 265.68 KB |
| SerializeAsync - Before | 129.47 μs | 1.940 μs | 1.906 μs | 132.96 μs | 5.62 | - | - | 71.38 KB |
| SerializeAsync - After | 132.82 μs | 2.454 μs | 2.296 μs | 134.70 μs | 5.62 | 0.98 | - | 71.41 KB |
| DeserializeAsync - Before | 212.83 μs | 3.359 μs | 4.248 μs | 223.17 μs | 21.48 | 6.84 | - | 265.68 KB |
| DeserializeAsync - After | 220.15 μs | 4.187 μs | 6.394 μs | 228.99 μs | 21.48 | 6.84 | - | 265.68 KB |

NewtonsoftJson

| Method | Mean | Error | StdDev | P95 | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| Serialize - Before | 1,196.11 μs | 8.743 μs | 8.178 μs | 1,206.85 μs | 132.81 | 132.81 | 132.82 | 1058.87 KB |
| Serialize - After | 982.74 μs | 8.041 μs | 6.715 μs | 991.35 μs | 44.92 | 44.92 | 44.92 | 472.39 KB |
| Deserialize - Before | 1,714.11 μs | 17.194 μs | 14.358 μs | 1,730.95 μs | 89.84 | 89.84 | 89.84 | 956.01 KB |
| Deserialize - After | 1,565.65 μs | 30.581 μs | 36.404 μs | 1,593.10 μs | 52.73 | 17.58 | - | 664.50 KB |
| SerializeAsync - Before | 1,170.13 μs | 15.676 μs | 14.663 μs | 1,194.89 μs | 132.81 | 132.82 | 132.81 | 1058.84 KB |
| SerializeAsync - After | 1,098.00 μs | 21.695 μs | 33.776 μs | 1,150.39 μs | 44.92 | 44.92 | 44.92 | 472.36 KB |
| DeserializeAsync - Before | 1,873.96 μs | 27.709 μs | 25.919 μs | 1,889.38 μs | 89.84 | 89.84 | 89.84 | 956.37 KB |
| DeserializeAsync - After | 1,503.74 μs | 28.815 μs | 29.591 μs | 1,546.75 μs | 52.73 | 17.58 | - | 664.73 KB |

ProtoBufNet

| Method | Mean | Error | StdDev | P95 | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| Serialize - Before | 272.91 μs | 5.197 μs | 5.104 μs | 283.40 μs | 5.86 | - | - | 78.15 KB |
| Serialize - After | 257.61 μs | 1.529 μs | 1.193 μs | 258.70 μs | 5.86 | - | - | 77.94 KB |
| Deserialize - Before | 415.35 μs | 2.054 μs | 1.604 μs | 417.54 μs | 21.48 | 6.84 | - | 265.99 KB |
| Deserialize - After | 374.51 μs | 4.100 μs | 3.201 μs | 379.65 μs | 21.48 | 6.84 | - | 265.71 KB |
| SerializeAsync - Before | 261.92 μs | 3.084 μs | 3.300 μs | 268.50 μs | 5.86 | - | - | 78.13 KB |
| SerializeAsync - After | 264.19 μs | 1.083 μs | 0.960 μs | 265.26 μs | 5.86 | - | - | 77.95 KB |
| DeserializeAsync - Before | 391.25 μs | 6.817 μs | 6.043 μs | 400.76 μs | 21.48 | 6.84 | - | 265.99 KB |
| DeserializeAsync - After | 371.55 μs | 1.868 μs | 1.458 μs | 373.18 μs | 21.48 | 6.84 | - | 265.71 KB |

ServiceStackJson

| Method | Mean | Error | StdDev | P95 | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| Serialize - Before | 951.66 μs | 8.324 μs | 6.499 μs | 960.35 μs | 93.75 | 46.88 | 46.88 | 918.06 KB |
| Serialize - After | 1,110.08 μs | 21.414 μs | 22.913 μs | 1,129.70 μs | 93.75 | 46.88 | 46.88 | 919.08 KB |
| Deserialize - Before | 3,155.60 μs | 16.833 μs | 14.922 μs | 3,172.18 μs | 39.06 | 11.72 | - | 517.84 KB |
| Deserialize - After | 3,072.51 μs | 60.461 μs | 82.760 μs | 3,198.14 μs | 39.06 | 11.72 | - | 516.22 KB |
| SerializeAsync - Before | 1,261.15 μs | 23.705 μs | 21.014 μs | 1,286.96 μs | 103.52 | 41.02 | 41.02 | 918.45 KB |
| SerializeAsync - After | 1,039.95 μs | 5.062 μs | 4.735 μs | 1,045.45 μs | 93.75 | 46.88 | 46.88 | 915.11 KB |
| DeserializeAsync - Before | 3,021.69 μs | 13.353 μs | 11.837 μs | 3,039.19 μs | 39.06 | 11.72 | - | 517.84 KB |
| DeserializeAsync - After | 3,378.36 μs | 43.456 μs | 40.649 μs | 3,398.41 μs | 39.06 | 11.72 | - | 516.22 KB |

SystemTextJson

| Method | Mean | Error | StdDev | P95 | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| Serialize - Before | 401.76 μs | 1.727 μs | 1.615 μs | 403.71 μs | 41.02 | 41.02 | 41.02 | 146.87 KB |
| Serialize - After | 391.87 μs | 3.665 μs | 3.428 μs | 396.84 μs | 39.55 | 39.55 | 39.55 | 146.90 KB |
| Deserialize - Before | 793.84 μs | 6.847 μs | 5.717 μs | 799.79 μs | 31.25 | 9.77 | - | 391.67 KB |
| Deserialize - After | 782.21 μs | 14.931 μs | 14.665 μs | 801.80 μs | 31.25 | 9.77 | - | 391.67 KB |
| SerializeAsync - Before | 377.48 μs | 0.936 μs | 0.830 μs | 378.88 μs | 42.48 | 42.48 | 42.48 | 147.40 KB |
| SerializeAsync - After | 352.10 μs | 0.950 μs | 0.889 μs | 353.30 μs | 39.55 | 39.55 | 39.55 | 146.89 KB |
| DeserializeAsync - Before | 1,117.59 μs | 5.602 μs | 4.966 μs | 1,121.62 μs | 31.25 | 9.77 | - | 391.98 KB |
| DeserializeAsync - After | 823.95 μs | 16.018 μs | 22.455 μs | 843.89 μs | 31.25 | 9.77 | - | 391.67 KB |

@jodydonetti jodydonetti self-assigned this Dec 20, 2024
@jodydonetti jodydonetti added the enhancement New feature or request label Dec 20, 2024
@jodydonetti jodydonetti added this to the v2.0.0 milestone Dec 20, 2024
@jodydonetti
Collaborator

Hi @stebet , this looks glorious 😍

I'll look into it carefully this evening or tomorrow and get back to you.

Thanks!

@stebet
Contributor Author

stebet commented Dec 20, 2024

I'm looking at a few test issues; I'll let you know as soon as I've figured them out :)

stebet and others added 3 commits December 20, 2024 15:54
Co-authored-by: Jody Donetti <[email protected]>
Co-authored-by: Stefán Jökull Sigurðarson <[email protected]>
@jodydonetti
Collaborator

jodydonetti commented Dec 21, 2024

Hi @stebet , I ran the benchmarks on my machine, and the results seem to be consistent.

Here are the main results.

CysharpMemoryPack (Serialize)

| What | Mean | Error | StdDev | P95 | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| Before | 97.09 us | 1.927 us | 1.892 us | 99.62 us | 30.2734 | 30.2734 | 30.2734 | 94.79 KB |
| After | 67.01 us | 0.638 us | 0.597 us | 67.79 us | 30.2734 | 30.2734 | 30.2734 | 94.93 KB |
| After+Shared | 109.27 us | 4.862 us | 14.337 us | 119.33 us | 30.2734 | 30.2734 | 30.2734 | 94.93 KB |

CysharpMemoryPack (Deserialize)

| What | Mean | Error | StdDev | P95 | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| Before | 49.72 us | 0.965 us | 1.073 us | 51.49 us | 21.6675 | 7.0190 | - | 265.68 KB |
| After | 34.43 us | 0.411 us | 0.385 us | 34.94 us | 21.6675 | 7.0190 | - | 265.68 KB |
| After+Shared | 34.51 us | 0.588 us | 0.550 us | 35.25 us | 21.6675 | 7.0190 | - | 265.68 KB |

NeueccMessagePack (Serialize)

| What | Mean | Error | StdDev | P95 | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| Before | 122.96 us | 1.921 us | 1.703 us | 125.93 us | 5.6152 | - | - | 71.59 KB |
| After | 78.74 us | 1.396 us | 1.306 us | 80.67 us | 5.7373 | - | - | 71.62 KB |
| After+Shared | 81.04 us | 0.867 us | 0.811 us | 82.07 us | 5.7373 | - | - | 71.62 KB |

NeueccMessagePack (Deserialize)

| What | Mean | Error | StdDev | P95 | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| Before | 189.92 us | 3.218 us | 3.010 us | 194.60 us | 21.4844 | 6.8359 | - | 265.68 KB |
| After | 150.30 us | 1.558 us | 1.381 us | 152.66 us | 21.4844 | 6.8359 | - | 265.68 KB |
| After+Shared | 149.99 us | 1.415 us | 1.255 us | 151.69 us | 21.4844 | 6.8359 | - | 265.68 KB |

SystemTextJson (Serialize)

| What | Mean | Error | StdDev | P95 | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| Before | 383.09 us | 5.302 us | 4.959 us | 390.19 us | 45.4102 | 45.4102 | 45.4102 | 146.61 KB |
| Before+Recyclable | 331.87 us | 3.728 us | 3.305 us | 336.54 us | 40.0391 | 40.0391 | 40.0391 | 146.85 KB |
| After | 250.10 us | 4.506 us | 4.215 us | 257.07 us | 41.5039 | 41.5039 | 41.5039 | 146.91 KB |
| After+Shared | 288.82 us | 1.824 us | 1.523 us | 290.86 us | 45.4102 | 45.4102 | 45.4102 | 146.61 KB |

SystemTextJson (Deserialize)

| What | Mean | Error | StdDev | P95 | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| Before | 644.87 us | 12.249 us | 11.457 us | 662.22 us | 31.2500 | 9.7656 | - | 391.67 KB |
| Before+Recyclable | 634.01 us | 2.830 us | 2.363 us | 638.22 us | 31.2500 | 9.7656 | - | 391.67 KB |
| After | 511.61 us | 3.509 us | 3.283 us | 516.57 us | 31.2500 | 9.7656 | - | 391.67 KB |
| After+Shared | 530.56 us | 6.669 us | 6.238 us | 536.86 us | 31.2500 | 9.7656 | - | 391.67 KB |

As you can see, for System.Text.Json I also added a variant that uses the RecyclableMemoryStream (Before+Recyclable).

I also added a variant that uses ArrayPool<byte>.Shared instead of ArrayPool<byte>.Create() (After+Shared) for (theoretically) higher memory reuse, but the results do not show that, and it actually seems slightly worse.

I'm thinking that we may see advantages from the Shared pool in real applications with a lot of allocations (so, more use of the shared pool), but I'm not sure.

Based on these results, I'd say we just go with Create().
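
For readers unfamiliar with the two options being compared, here is a minimal sketch of how each pool is obtained and used. `PoolChoiceSketch` and the size limits passed to `Create` are illustrative assumptions, not values taken from the PR or the benchmarks.

```csharp
using System;
using System.Buffers;

public static class PoolChoiceSketch
{
    // Process-wide pool, shared with every other caller in the application.
    public static readonly ArrayPool<byte> Shared = ArrayPool<byte>.Shared;

    // Dedicated pool owned by the serializer; both limits are illustrative assumptions.
    public static readonly ArrayPool<byte> Dedicated =
        ArrayPool<byte>.Create(maxArrayLength: 1024 * 1024, maxArraysPerBucket: 16);

    // Rents a buffer, copies the payload into it, and returns the buffer to the pool.
    // Returns the length of the rented array, which may exceed the requested size.
    public static int RoundTrip(ArrayPool<byte> pool, ReadOnlySpan<byte> payload)
    {
        byte[] rented = pool.Rent(payload.Length);
        try
        {
            payload.CopyTo(rented);
            // ... work on rented.AsSpan(0, payload.Length) here ...
            return rented.Length;
        }
        finally
        {
            pool.Return(rented);
        }
    }
}
```

Both pools expose the same `Rent`/`Return` API, so switching between them is a one-line change; only the sharing and sizing behavior differs.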

Thanks for this PR!

@jodydonetti
Collaborator

jodydonetti commented Dec 21, 2024

Also, I need to decide what to do with the RecyclableMemoryStream dependency.

On one hand, I can just remove it and everything will be slimmer, but then I'd also have to remove the ctor overload that takes a parameter of that type.
Usually I would mark the ctor as [Obsolete] and just forward the call to the other (now automatically optimized) ctor, but to keep it I need to keep the package reference; otherwise the type itself is no longer there and it won't compile.
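
A minimal sketch of the [Obsolete]-and-forward pattern described above. `LegacyStreamManager` is a hypothetical stand-in for the type that comes from the external package, and `ObsoleteCtorSketch` is not a real FusionCache type.

```csharp
using System;

// Hypothetical stand-in for a type that lives in an external package.
// The obsolete overload below compiles only while this type is still referenced.
public sealed class LegacyStreamManager { }

public sealed class ObsoleteCtorSketch
{
    public bool UsesPooledBuffers { get; }

    // The preferred ctor: no external dependency needed.
    public ObsoleteCtorSketch()
    {
        UsesPooledBuffers = true;
    }

    // Kept only for source compatibility: the argument is ignored and the call is forwarded.
    [Obsolete("The stream manager is no longer used; call the parameterless constructor instead.")]
    public ObsoleteCtorSketch(LegacyStreamManager streamManager) : this()
    {
    }
}
```

Callers of the old overload get a compiler warning with a specific message instead of an unexplained compile error, at the cost of keeping the package reference.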

On the other hand we are talking about V2, which is a major version bump already with some breaking changes, so there's that.

In the end, what I don't like much is the developer experience of upgrading and getting compile-time errors without any specific error message.
Mmmh, doubts, thoughts.

Pinging @viniciusvarzea, who proposed the RecyclableMemoryStream some versions ago: I'd like to see what they think of it.

Any suggestion?

@jodydonetti jodydonetti merged commit c6e7c1c into ZiggyCreatures:release/v2_0_0 Dec 22, 2024
1 check failed
@jodydonetti
Collaborator

Hi @stebet , I just released preview-4, which will most probably be the last preview before going GA with FusionCache V2 🥳

If you can play with it and let me know how it goes, that would be great, thanks!

@sabbadino

From this link: https://medium.com/@epeshk/the-big-performance-difference-between-arraypools-in-net-b25c9fc5e31d

"..... So, even staying within the standard pool implementations, for small arrays needed for a short time it is preferable to use the scalable ArrayPool.Shared, and for large arrays a pool created through ArrayPool.Create(..., ...) as more roomy and economical in terms of no separation by threads."

@jodydonetti
Collaborator

jodydonetti commented Jan 5, 2025

Hi @sabbadino

Thanks for the link, that's interesting!
Something I'm thinking about adding is the ability to pass a specific array pool via the ctor's options and, if none is passed, to fall back to the shared one.

I'll play with this approach and see how it goes.
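
A rough sketch of what that could look like, under the assumption of an options object with a nullable pool property. `PoolAwareSerializerOptions`, `PoolAwareSerializer` and the `Pool` property are hypothetical names, not FusionCache's actual API.

```csharp
#nullable enable
using System.Buffers;

// Hypothetical options type; the names are illustrative only.
public sealed class PoolAwareSerializerOptions
{
    // Null means "no preference": the serializer falls back to the shared pool.
    public ArrayPool<byte>? Pool { get; set; }
}

public sealed class PoolAwareSerializer
{
    public ArrayPool<byte> Pool { get; }

    public PoolAwareSerializer(PoolAwareSerializerOptions? options = null)
    {
        // Use the caller's pool when provided, otherwise ArrayPool<byte>.Shared.
        Pool = options?.Pool ?? ArrayPool<byte>.Shared;
    }
}
```

With this shape, small short-lived payloads can stay on the shared pool by default, while an application that serializes large values can opt in to a pool created with `ArrayPool<byte>.Create(...)`.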

jodydonetti added a commit that referenced this pull request Jan 19, 2025
* Add support for raw clear of inner memory cache
* Minor NRT stuff
* Use collection expressions
* Make tagging fully working
* Better tagging params for Set method
* Refactoring and cleanup
* Add sync support for tagging
* Add support for GetOrSet without factory + refactoring + better inline comments
* Stop casting to FusionCacheBuilder and upgrade IFusionCacheBuilder instead.
* Fix with eager refresh (sync version)
* Add entry options to RemoveByTag() and Clear()
* Add tags to logging
* Package update
* Add new IncludeTagsInLogs option to... well, I mean...
* Add specific skip memory/distributed read/write options
* Add support for specific option to skip memory/distributed read/write + better tags entry options support
* Better detection + logging when a new Clear() timestamp is detected
* Stop suppressing serialization exceptions. Deserialization exceptions instead will keep being suppressable.
* Better default value for TagsMemoryCacheDurationOverride option
* Better handling of background factory soft fail (eg: ctx.Fail())
* Better tests
* More stable Expire + renamed RemoveByTag[Async] to ExpireByTag[Async] to be more clear about the outcome
* Add error = true to all [Obsolete] usage
* Add tagging support to observability
* Change wire format version
* Change DataMember names (saves space)
* Better Clear (both expire and remove) + added DisableTagging option
* Benchmarks stuff
* Perf boost
* Fix for tagging with eager refresh + tests
* Add support for Microsoft HybridCache + tests
* Ensure the same builder always returns the same instance + tests
* Add immutable types support for AutoClone + add SkipAutoCloneForImmutableObjects option + tests
* Better tests
* FusionHybridCache tests
* Comments
* Xml comments
* Change Dependency to Microsoft.Extensions.Caching.Abstractions (#341)
* Adjust HybridCache dependency (#344)
* Fix typo
* Update various Microsoft.Extensions.Caching deps to v9.0.0
* Merging Serializer Benchmarks improvements to the v2 branch (#347)
* Improving the Serializer benchmarks (#343)
* Improving the Serializer benchmark by adding all the serializers and cleaning up the code and config for it. Also added tests for serializing arrays to check for memory pressure on buffers.
* Fixing Job config
* Minor usings cleanup
* Removing the dependency on Microsoft.IO.RecyclableStream and improving serializer performance (#349)
* Removing the Microsoft.IO.RecyclableMemory dependency and using ArrayPools instead
* Minor cleanup
* Cleanup
* Add MissingCacheKeyPrefixWarningLogLevel option
* More option duplication tests
* Add AllowStaleOnReadOnly entry option
* Add DangerZone stuff
* Benchmarks stuff
* Remove reactors for good (replaced a long time ago by memory lockers)
* Better metadata: move LogicalExpiration from metadata to entry and make it a timestamp (long)
* Better metadata: switch LastModified from DateTimeOffset to timestamp (long utc ticks)
* Better metadata: add Priority
* Better metadata priority handling
* Organize tests better
* Priority tests
* Faster tests
* Ensure correct timestamp in backplane messages
* Make the cross-node Clear timestamp directly precise
* Skip distributed cache write and backplane when Tagging expiration timestamp is 0 (zero)
* Align sync/async executeCascadeAction usage
* Better serialization tests
* Better byte array tests
* Better metadata surrogate (protobuf-net) member order
* Make buffer stuff sealed (small perf boost) + xml docs
* Better errors/exceptions for traces (observability)
* Better Expire on memory cache
* Better TagsDefaultEntryOptions
* Minor tag expiration changes
* Fix for extra closing curly brace
* Better status updates for traces
* Minor perf boost
* Better tagging
* Add support for RemoveByTag with multiple tags
* Cross the 1000 tests mark!
* Unresolved tension https://xkcd.com/859/
* Minor internal stuff (ToLogString)
* Docs
* FusionCache v2. Yes, really.

---------

Co-authored-by: Jody Donetti <[email protected]>
Co-authored-by: Stefán Jökull Sigurðarson <[email protected]>
Co-authored-by: Paul Welter <[email protected]>
Co-authored-by: Brian Dukes <[email protected]>