Make ConditionalWeakTable reads lock-free #8507
Conversation
Shoot, I just realized my second commit doesn't correctly handle removes. Will fix.
Force-pushed from 54cf0c6 to af98557
I fixed the issue with removed handles not being freed.
I addressed @mjp41's observation about freeing the handles for collected objects. But I'm not sure how to address @jkotas' concern about resurrection. Is there some special runtime finalization-related support we can use to help in this case? Otherwise, the whole approach from corert seems flawed.

In looking over the code, the aim was to make reads not only lock-free but also interlocked-free (at least in managed code), and the latter is the key part. If reads are interlocked-free, then while we're resizing or removing we can't know whether some other thread has already grabbed a dependent handle but not yet used it, which means we can't free it then, which means we need to rely on finalization; but in the finalizer we may similarly be racing with this object getting resurrected and another thread accessing the dependent handles.

If we were to use interlocked operations, we could probably ref count every handle access; per read, it would likely cost about the same as (or maybe even slightly more than) a lock, but without serializing parallel access. Thoughts?
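A minimal sketch of the ref-counting idea floated above, with hypothetical names; the real container also tracks buckets, entries, and dependency handles, none of which is modeled here. Each read pays two interlocked operations, roughly the cost of an uncontended lock, but readers never block one another:

```csharp
using System;
using System.Threading;

// Hypothetical sketch of "ref count every handle access": readers bracket each
// lookup with an interlocked increment/decrement, and a resize/remove waits for
// the count to drain to zero before freeing any dependency handle.
internal sealed class RefCountedReads
{
    private int _activeReaders;

    public bool TryRead(Func<bool> lockFreeLookup)
    {
        Interlocked.Increment(ref _activeReaders);
        try
        {
            return lockFreeLookup(); // would consult the entry/handle tables
        }
        finally
        {
            Interlocked.Decrement(ref _activeReaders);
        }
    }

    // Called by a mutator before freeing handles that a reader might hold.
    public void WaitForReadersToDrain()
    {
        SpinWait spinner = default;
        while (Volatile.Read(ref _activeReaders) != 0)
        {
            spinner.SpinOnce();
        }
    }
}
```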
Would the following work? The Containers have a field, `_finalizationCount`. At the start of the finalizer, add something like:

```csharp
// Need to take the parent lock, so that the _parent._container assignment does not race.
lock (_parent._lock)
{
    if (_oldKeepAlive == null
        || _finalizationCount == 0)
    {
        // This is the currently active container, and the first time it has been finalised.
        //
        // Unlink from _parent to make this object unreachable, and unresurrectable.
        _parent._container = null;

        // Resurrect this object.
        GC.ReRegisterForFinalize(this);
        _finalizationCount = 1;
        return; // A
    }
}
```

At A, the container has been resurrected, but is unreachable in the heap, so only current readers can be accessing it. They will be keeping it live in the GC, but once they no longer have it as a root, it will be fully unreachable, and cannot be reaccessed or resurrected, so the second time finalization is called, there won't be a race.
Thanks, @mjp41. I hadn't considered re-registration, but that makes sense as a way to fight against resurrection. I'm also not too concerned about the costs here, as in my experience the majority of use for a ConditionalWeakTable is in very long-lived scenarios, e.g. a statically rooted CWT. Question on your example, though... I assume you meant `_oldKeepAlive == null && _finalizationCount == 0` rather than `_oldKeepAlive == null || _finalizationCount == 0`? We shouldn't need to play this trick with old containers (which would have `_oldKeepAlive != null`), since they're no longer referenced from the heap anyway.
@stephentoub yes, you are correct. I meant `&&`.
I am a little unsure whether all your code for `_oldKeepAlive` is necessary in Resizing/Constructing, as all the objects are live, so do they really need such a hand-over of ownership? Is this to protect against the thread going away mid-execution?
By "all your code", you mean the two lines that set _oldKeepAlive to a singleton and then set it to null? This was part of my trying to protect against things like thread aborts, but I already removed part of that solution, since thread aborts aren't possible in coreclr/corert, so I guess I can remove those two lines as well. (I also need to fix up Clear given your finalization suggestion.) |
Actually, I think we're both wrong, and unfortunately I don't think we can factor `_oldKeepAlive` into this logic. Consider this usage:
That unfortunately means the logic would need to be more like:

```csharp
...
if (!_finalized)
{
    _finalized = true;
    lock (_parent._lock)
    {
        if (_parent._container == this)
            _parent._container = null;
    }
    GC.ReRegisterForFinalize(this);
    return;
}
...
```

and every container will end up surviving multiple rounds of GC. Maybe given typical usage that's not bad?
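For context, a hedged sketch of what the rest of such a finalizer could do on the second pass, once the guard above has detached and re-registered the container. This is not necessarily the PR's exact code: `GCHandle` stands in for the runtime's dependency handles, and the `Entry` type and `HashCode == -1` removed-entry marker are assumptions patterned on the discussion:

```csharp
using System;
using System.Runtime.InteropServices;

internal sealed class Container
{
    internal struct Entry
    {
        internal int HashCode;    // assume -1 marks an entry removed from this container
        internal GCHandle depHnd; // stand-in for the runtime's dependency handle
    }

    private Entry[] _entries = Array.Empty<Entry>();
    private volatile object _oldKeepAlive; // set (in the omitted Resize) once handles are transferred
    private bool _finalized;

    ~Container()
    {
        if (!_finalized)
        {
            // First pass (as in the comment above): detach from the parent,
            // re-register for finalization, and return without freeing anything.
            _finalized = true;
            GC.ReRegisterForFinalize(this);
            return;
        }

        // Second pass: the container is now guaranteed unreachable, so no reader
        // can race with the freeing. Free only handles this container still owns:
        // everything if ownership was never transferred, otherwise just the
        // entries removed from this container, which a newer container never
        // took over.
        Entry[] entries = _entries;
        _entries = null;
        foreach (Entry e in entries)
        {
            if ((_oldKeepAlive == null || e.HashCode == -1) && e.depHnd.IsAllocated)
            {
                e.depHnd.Free();
            }
        }
    }
}
```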
Force-pushed from 81519d7 to 3b8a1dd
I've updated with this change, but I'm going to also try out the reference-counting approach and see how it fares in comparison.
... the ref counting approach I tried is much worse... not quite as bad as a lock on every operation, but pretty close.
It is a pretty standard pattern in this kind of lock-free scheme around unmanaged resources. Java has the PhantomReference primitive for this exact purpose, which in theory allows the runtime to make it a bit more efficient. We are simulating it with what's available. We have other places with similar patterns that require two GC ticks for cleanup, e.g. here:
Thanks, @jkotas. In that case, I don't have further changes planned; please let me know if you have any other feedback, are ok with the port + fixes, etc. |
LGTM
The CoreRT ConditionalWeakTable was modified to support lock-free reads. This ports the implementation back to coreclr.
Force-pushed from 3b8a1dd to 1689861
The CoreRT implementation of ConditionalWeakTable that was ported back to CoreCLR uses a special scheme to make reads lock-free. When the container needs to grow, it allocates new arrays and duplicates all of the dependency handles from the previous array, rather than just copying them. This avoids issues stemming from a thread getting a dependency handle in an operation on the container, then having that handle destroyed, and then trying to use it; the handle won't be destroyed as long as the container is referenced.

However, this also carries a significant cost: every time the container grows, it allocates another N dependency handles, where N is the current size of the container. For example, with an initial size of 8, if 64 objects are added to the container, it'll allocate 8 dependency handles, then another 16, then another 32, and then another 64, resulting in significantly more handles than in the old implementation, which would only allocate 64 handles total.

This commit fixes that by changing the scheme slightly. A container still frees its handles in its finalizer. However, rather than duplicating all of the handles, the responsibility for freeing them is transferred from one container to the next. Then, to avoid issues where, for example, the second container is released while the first is still in use, a reference is maintained from the first to the second, so that the second can't be finalized while the first is still in use.

The commit also fixes a race condition between resurrection and finalization, whereby dependency handles could be used while or after they're being freed by the finalizer. It's addressed by only freeing handles in a second finalization, after clearing out state in the first finalization to guarantee no possible usage during the second.
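A simplified, hypothetical model of the hand-off described above; the real container also manages hash buckets and the entries themselves, and `_oldKeepAlive` is the only field name taken from the discussion:

```csharp
using System;

// Sketch: on growth, the new container reuses the old container's dependency
// handles instead of duplicating them, so growing to 64 entries allocates 64
// handles total rather than 8 + 16 + 32 + 64.
internal sealed class Container
{
    private readonly IntPtr[] _handles;    // stand-ins for dependency handles
    private volatile object _oldKeepAlive; // replacement container, once one exists

    internal Container(IntPtr[] handles) => _handles = handles;

    internal Container Grow()
    {
        var bigger = new IntPtr[_handles.Length * 2];
        Array.Copy(_handles, bigger, _handles.Length);
        var next = new Container(bigger);

        // Hand off ownership: once this is set, this container's (omitted)
        // finalizer must not free the transferred handles, and the reference
        // keeps the new container from being finalized while this one is
        // still in use by a reader.
        _oldKeepAlive = next;
        return next;
    }
}
```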
Force-pushed from 1689861 to 03525a4
@stephentoub I agree we were both wrong :-) LGTM |
@stephentoub I expect to have them approved by today, but you can go ahead and merge and then I'll merge my changes with yours :) |
Ok. Thanks, @safern. |
@jkotas, I assume the fixes from here should be ported back to corert? |
Yes, it would be nice to port them. |
Make ConditionalWeakTable reads lock-free

Commit migrated from dotnet/coreclr@7adcc81
The ConditionalWeakTable implementation in corert supports lock-free reads, whereas the current implementation in coreclr locks on every operation. The first commit in this PR ports the corert implementation back to coreclr, to provide lock-free reads. In a simple benchmark that repeatedly and serially reads items from the table, this improves throughput by ~25%. But in a simple benchmark that repeatedly and in parallel reads items from the table, this improves throughput significantly, changing something that was entirely serialized to something that runs in parallel, e.g. on my quad-core, throughput improved by ~5x.
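A hypothetical micro-benchmark in the spirit of the ones described above, not the author's harness; sizes and iteration counts are arbitrary:

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

internal static class CwtReadBenchmark
{
    private static void Main()
    {
        var table = new ConditionalWeakTable<object, object>();
        var keys = new object[1024];
        for (int i = 0; i < keys.Length; i++)
        {
            keys[i] = new object();
            table.Add(keys[i], new object());
        }

        // Parallel readers: with per-operation locking these serialize; with
        // lock-free reads they scale with core count.
        var sw = Stopwatch.StartNew();
        Parallel.For(0, Environment.ProcessorCount, worker =>
        {
            for (int i = 0; i < 10_000_000; i++)
            {
                table.TryGetValue(keys[i % keys.Length], out object value);
            }
        });
        Console.WriteLine(sw.Elapsed);
        GC.KeepAlive(keys); // keep keys (and thus entries) alive during timing
    }
}
```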
However, the implementation did make one situation significantly worse: adding lots and lots of objects to the table resulted in a massive slowdown, due to how dependency handles were managed to enable the lock-freedom. For example, in a test that added 5M object pairs to the table, before the change it took ~4s, and after the change anywhere from ~15s to ~30s. The second commit addresses this by slightly changing how dependency handles are managed, specifically avoiding duplicating them. This brings addition speed back to approximately what it was before, while maintaining the lock-freedom and read-speed improvements.
I also added more tests in dotnet/corefx#14279, and they pass before and after this change.
(I will wait to merge this until after #8490 goes in and incorporate those changes.)
Fixes #3206
cc: @jkotas, @mjp41, @Maoni0