Optimize the allocation and speed of ActivitySet/GetCustomProperty #41840
@CodeBlanch FYI.
Thanks for doing this @tarekgh. Interesting results! Looks like Dictionary<,> uses less memory than ConcurrentDictionary<,>, that seems reasonable. The speed-up, that's surprising. Explicitly taking a lock is faster than the lockless algorithm? 🤯
Lock-free code is usually faster when more than one thread competes to execute the atomic block. For Activity that is not the common case, and the lock will be cheap, especially since we use the internal dictionary instance itself as the lock.
ConcurrentDictionary isn't lockless, the locks are just managed for you : ) https://source.dot.net/#System.Collections.Concurrent/System/Collections/Concurrent/ConcurrentDictionary.cs,916 I'd guess that if many threads are reading/writing simultaneously, then ConcurrentDictionary's multiple-lock strategy is faster, but if the common case is only one reader/writer, then the multi-lock overhead doesn't pay off. Activity is in the latter category. FWIW, I am a little surprised the BCL team didn't implement it lock-free, but there is probably a good reason for their choice that I am unaware of.
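The approach described in the comments above (a plain Dictionary<,> guarded by a lock on the dictionary instance itself) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the `PropertyBag` type and its `Set`/`Get` members are hypothetical names, and the lazy creation through `Interlocked.CompareExchange` is one plausible way to avoid allocating anything until the first write.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the property storage; not the actual Activity code.
public sealed class PropertyBag
{
    // Stays null until the first write, so instances that never set a property allocate nothing.
    private Dictionary<string, object>? _properties;

    public void Set(string name, object? value)
    {
        // If two threads race to create the dictionary, only one instance is published;
        // CompareExchange returns the previous value (null for the winner).
        Dictionary<string, object> properties =
            _properties
            ?? Interlocked.CompareExchange(ref _properties, new Dictionary<string, object>(), null)
            ?? _properties!;

        // The dictionary instance doubles as the lock object, so no separate lock is allocated.
        lock (properties)
        {
            if (value == null)
                properties.Remove(name);   // assumption: a null value removes the entry
            else
                properties[name] = value;
        }
    }

    public object? Get(string name)
    {
        Dictionary<string, object>? properties = _properties;
        if (properties == null)
            return null;                   // nothing was ever written

        lock (properties)
        {
            return properties.TryGetValue(name, out object? value) ? value : null;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var bag = new PropertyBag();
        bag.Set("tenant", "contoso");
        Console.WriteLine(bag.Get("tenant"));         // prints "contoso"
        bag.Set("tenant", null);
        Console.WriteLine(bag.Get("tenant") is null); // prints "True"
    }
}
```

With a single reader/writer the lock is uncontended, which is the case the comments above argue is typical for Activity.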
LGTM
/backport to release/5.0-rc2
Started backporting to release/5.0-rc2: https://github.com/dotnet/runtime/actions/runs/240220003
It uses striped locking for writes, but reads are lock-free. I expect if you were to measure the throughput of this new version just repeatedly reading custom properties (rather than measuring also writing them), you'd find the new version is slower (with less memory).
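One rough way to check the read-only expectation above is to time repeated lookups against both containers. The sketch below is only an assumption-laden illustration: the type and method names are hypothetical, the iteration count is arbitrary, and a Stopwatch loop is far cruder than a proper benchmarking harness, so treat any numbers it prints as indicative at best.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical single-threaded, read-only comparison; not taken from the PR.
public static class ReadOnlyComparison
{
    private const int Iterations = 10_000_000; // arbitrary choice

    // Reads go through ConcurrentDictionary's lock-free read path.
    public static long TimeConcurrentReads(ConcurrentDictionary<string, object> map, string key)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
            map.TryGetValue(key, out _);
        return stopwatch.ElapsedMilliseconds;
    }

    // Every read takes an (uncontended) lock on the dictionary instance.
    public static long TimeLockedReads(Dictionary<string, object> map, string key)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            lock (map)
            {
                map.TryGetValue(key, out _);
            }
        }
        return stopwatch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        var concurrent = new ConcurrentDictionary<string, object>();
        concurrent["key"] = new object();
        var plain = new Dictionary<string, object> { ["key"] = new object() };

        Console.WriteLine($"ConcurrentDictionary reads: {TimeConcurrentReads(concurrent, "key")} ms");
        Console.WriteLine($"Locked Dictionary reads:    {TimeLockedReads(plain, "key")} ms");
    }
}
```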
Thanks Stephen! I'd expect the typical usage to be a write:read ratio that is relatively even, 1:1 or 1:2. I would consider cases that read many times to be our non-perf-conscious consumers in this particular scenario, so optimizing for fewer reads and lower memory usage makes sense. For my own curiosity, do you know what the tradeoffs were between striped locks and a lock-free write? I searched around in the source a little bit, but I couldn't find any documented rationale for the implementation choice.
How would you implement the lock-free write?
I have never gone into the details, but my limited understanding suggested implementations did exist if we wanted them? http://www.cse.chalmers.se/~tsigas/papers/Lock-Free_Dictionary.pdf I do know one of the big challenges with lock-free algorithms is that they often require lock-free memory reclamation. For languages without GCs I could imagine that being a huge pain, but for .NET that appeared to be less of an issue.
That’s interesting. I wonder what advantages ConcurrentDictionary has, if any, over @VSadov's one above.
Thanks for sharing that link. Slides from the YouTube video in readme are available here https://web.stanford.edu/class/ee380/Abstracts/070221_LockFreeHash.pdf.
If we want to overhaul the entire implementation, sure. I'm not aware of a good way to implement lock-free writes with this implementation. @VSadov and I discussed his implementation several years back. I have zero attachment to the current implementation; my main concern is ensuring that semantics and correctness are maintained while not regressing perf, especially for reads. If a new implementation can maintain all that and improve throughput and reduce memory allocation/consumption, great. I previously rewrote ConcurrentQueue for .NET Core 2.0 following similar principles.
Thanks all! From what I see, the only downside yet presented for the non-blocking option is that it would need some work to make it the new default, plus the inherent risk that a new implementation hasn't been validated in a shipping product. To me that sounds quite promising and would just need to be prioritized against other BCL perf improvements we'd spend our time on. The scenario in this issue is probably far too narrow to be the sole justification, but across .NET broadly it's not hard to imagine that many scenarios would all benefit a bit from a faster concurrent dictionary. If I spot other instances that would benefit, I'll give a heads up.
Fixes #39591
It has been reported that Activity.SetCustomProperty allocates too much. This change reduces the allocation and also improves speed.
Before Change
After Change
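For readers unfamiliar with the API being optimized, a minimal usage sketch of `Activity.SetCustomProperty` and `Activity.GetCustomProperty` follows. The activity name and property key are arbitrary placeholders chosen for illustration, and the stored value is a plain `object` purely for brevity.

```csharp
using System;
using System.Diagnostics;

public static class CustomPropertyDemo
{
    public static void Main()
    {
        // "demo-operation" and "request-context" are placeholder names for this sketch.
        var activity = new Activity("demo-operation");

        // Attach an arbitrary object to the activity under a string key.
        activity.SetCustomProperty("request-context", new object());
        Console.WriteLine(activity.GetCustomProperty("request-context") != null); // prints "True"

        // Passing null removes any value previously stored under that key.
        activity.SetCustomProperty("request-context", null);
        Console.WriteLine(activity.GetCustomProperty("request-context") == null); // prints "True"
    }
}
```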