Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize the allocation and speed of ActivitySet/GetCustomProperty #41840

Merged

Conversation

tarekgh
Copy link
Member

@tarekgh tarekgh commented Sep 3, 2020

Fixes #39591

It has been reported the allocation when using Activity.SetCustomProperty is too much. This change enhance the allocation and also show enhancement in the speed perf too.

Before Change

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Using1Property 278.0 ns 9.67 ns 28.52 ns 0.2160 - - 904 B
Using3Property 502.1 ns 16.27 ns 47.72 ns 0.2384 - - 1000 B
Using5Property 749.5 ns 33.28 ns 97.07 ns 0.2613 - - 1096 B
Using10Property 1,235.8 ns 46.23 ns 133.37 ns 0.3185 - - 1336 B
Using20Property 3,116.9 ns 92.66 ns 271.76 ns 0.8545 - - 3576 B

After Change

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Using1Property 182.6 ns 4.79 ns 14.04 ns 0.0937 - - 392 B
Using3Property 356.9 ns 10.38 ns 30.12 ns 0.0935 - - 392 B
Using5Property 627.5 ns 17.66 ns 51.23 ns 0.1526 - - 640 B
Using10Property 1,258.8 ns 43.85 ns 127.21 ns 0.2785 - - 1168 B
Using20Property 2,419.1 ns 78.58 ns 219.05 ns 0.5379 - - 2256 B

@ghost
Copy link

ghost commented Sep 3, 2020

Tagging subscribers to this area: @tarekgh, @safern, @krwq
See info in area-owners.md if you want to be subscribed.

@tarekgh tarekgh requested a review from noahfalk September 3, 2020 23:20
@tarekgh
Copy link
Member Author

tarekgh commented Sep 3, 2020

CC @cijothomas @reyang

@ghost
Copy link

ghost commented Sep 3, 2020

Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti
See info in area-owners.md if you want to be subscribed.

@macrogreg
Copy link

Thanks for doing this, @tarekgh .
This does not go quite as far as previously suggested in #39690 , but it goes a good step in that direction. Glad to see this improvement!

@cijothomas
Copy link
Contributor

@CodeBlanch FYI.

@tarekgh tarekgh self-assigned this Sep 4, 2020
@CodeBlanch
Copy link
Contributor

Thanks for doing this @tarekgh. Interesting results! Looks like Dictionary<,> uses less memory than ConcurrentDictionary<,>, that seems reasonable. The speed-up, that's surprising. Explicitly taking a lock is faster than the lockless algorithm? 🤯

@tarekgh
Copy link
Member Author

tarekgh commented Sep 4, 2020

that's surprising. Explicitly taking a lock is faster than the lockless algorithm?

Usually the lockless will be faster in the scenarios when more than one thread competing on executing the atomic block. In Activity this is not regular case and the lock will be cheap especially we are using the internal dictionary instance as the lock.

@noahfalk
Copy link
Member

noahfalk commented Sep 5, 2020

Explicitly taking a lock is faster than the lockless algorithm? 🤯

ConcurrentDictionary isn't lockless, the locks are just managed for you : ) https://source.dot.net/#System.Collections.Concurrent/System/Collections/Concurrent/ConcurrentDictionary.cs,916

I'd guess that if many threads reading/writing simultaneously then ConcurrentDictionary's multiple lock strategy is faster, but if the common case is only 1 reader/writer then the multi-lock overhead doesn't payoff. Activity is in the latter category. Fwiw I am a little surprised the BCL team didn't implement it lock-free but there is probably a good reason for their choice I am unaware of.

Copy link
Member

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tarekgh tarekgh merged commit b6e607d into dotnet:master Sep 5, 2020
@tarekgh tarekgh deleted the OptimizeAvtivityWithCustomProperties branch September 5, 2020 00:19
@tarekgh
Copy link
Member Author

tarekgh commented Sep 5, 2020

/backport to release/5.0-rc2

@github-actions
Copy link
Contributor

github-actions bot commented Sep 5, 2020

Started backporting to release/5.0-rc2: https://github.com/dotnet/runtime/actions/runs/240220003

@stephentoub
Copy link
Member

stephentoub commented Sep 5, 2020

ConcurrentDictionary isn't lockless, the locks are just managed for you : )

It uses striped locking for writes, but reads are lock-free. I expect if you were to measure the throughput of this new version just repeatedly reading custom properties (rather than measuring also writing them), you'd find the new version is slower (with less memory).

@noahfalk
Copy link
Member

noahfalk commented Sep 5, 2020

I expect if you were to measure the throughput of this new version just repeatedly reading custom properties

Thanks Stephen! I'd expect the typical usage to be a write:read ratio that is relatively even, 1:1 or 1:2. I would consider cases that read many times to be our non-perf conscious consumers in this particular scenario so optimizing for fewer reads and lower memory usage makes sense.

For my own curiousity, do you know what the tradeoffs were between striped locks and a lock-free write? I searched around in the source a little bit but I couldn't find any documented rationale on the implementation choice.

@stephentoub
Copy link
Member

For my own curiousity, do you know what the tradeoffs were between striped locks and a lock-free write?

How would you implement the lock-free write?

@noahfalk
Copy link
Member

noahfalk commented Sep 5, 2020

How would you implement the lock-free write?

I have never gone into the details but my limited understanding suggested implementations did exist if we wanted them? http://www.cse.chalmers.se/~tsigas/papers/Lock-Free_Dictionary.pdf
Whether the implementations were any good and/or applicable to particular requirements we had I have no idea. I was hoping someone else might have already done the research : )

I do know one of the big challenges with lock-free algorithms is that they often require lock-free memory reclamation. For languages without GCs I could imagine that being a huge pain but for .NET that appeared to be less of an issue.

@tarekgh
Copy link
Member Author

tarekgh commented Sep 5, 2020

https://github.com/VSadov/NonBlocking

@danmoseley
Copy link
Member

That’s interesting I wonder what advantages ConcurrentDictionary has if any over @VSadov one above.

@am11
Copy link
Member

am11 commented Sep 5, 2020

Thanks for sharing that link. Slides from the YouTube video in readme are available here https://web.stanford.edu/class/ee380/Abstracts/070221_LockFreeHash.pdf.

@stephentoub
Copy link
Member

implementations did exist if we wanted them?

If we want to overhaul the entire implementation, sure. I'm not aware of a good way to implement lock-free writes with this implementation.

@VSadov and I discussed his implementation several years back. I have zero attachment to the current implementation; my main concern is ensuring that semantics and correctness are maintained while not regressing perf, especially for reads. If a new implementation can maintain all that and improve throughput and reduce memory allocation/consumption, great. I previously rewrote ConcurrentQueue for .NET Core 2.0 following similar principles.

@noahfalk
Copy link
Member

noahfalk commented Sep 9, 2020

Thanks all! From what I see the only downside yet presented for the non-blocking option is that it would need some work to make it the new default + inherent risk that a new implementation hasn't been validated in a shipping product. To me that sounds quite promising and would just need to be prioritized against other BCL perf improvements we'd spend our time on. The scenario in this issue is probably far too narrow to be the sole justification, but across .NET broadly its not hard too imagine that many scenarios would all benefit a bit from a faster concurrent dictionary. If I spot other instances that would benefit I'll give a heads up.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve perf of Activity Get/SetCustomProperty API for auto-instrumentation (codeless attach) APM scenarios.
9 participants