This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Make ConditionalWeakTable reads lock-free #8507

Merged
stephentoub merged 2 commits into dotnet:master on Dec 9, 2016

Conversation

@stephentoub
Member

stephentoub commented Dec 7, 2016

The ConditionalWeakTable implementation in corert supports lock-free reads, whereas the current implementation in coreclr locks on every operation. The first commit in this PR ports the corert implementation back to coreclr, to provide lock-free reads. In a simple benchmark that repeatedly and serially reads items from the table, this improves throughput by ~25%. But in a simple benchmark that repeatedly and in parallel reads items from the table, this improves throughput significantly, changing something that was entirely serialized to something that runs in parallel, e.g. on my quad-core, throughput improved by ~5x.
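
For reference, a rough sketch of the kind of micro-benchmark described above (not the exact harness behind these numbers; the key count and iteration count are arbitrary):

using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

class CwtReadBenchmark
{
    static void Main()
    {
        var table = new ConditionalWeakTable<object, object>();
        var keys = new object[1000];
        for (int i = 0; i < keys.Length; i++)
        {
            keys[i] = new object();
            table.Add(keys[i], new object());
        }

        const int Iterations = 10_000_000;

        // Serial reads: every lookup happens on a single thread.
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            table.TryGetValue(keys[i % keys.Length], out _);
        }
        Console.WriteLine($"Serial:   {sw.Elapsed}");

        // Parallel reads: with a lock per operation these serialize;
        // with lock-free reads they scale with the core count.
        sw.Restart();
        Parallel.For(0, Iterations, i =>
        {
            table.TryGetValue(keys[i % keys.Length], out _);
        });
        Console.WriteLine($"Parallel: {sw.Elapsed}");

        GC.KeepAlive(keys);
    }
}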

However, the implementation did make one situation significantly worse. Adding lots and lots of objects to the table resulted in a massive slowdown, due to how dependency handles were managed to enable the lock-freedom. For example, in a test that added 5M object pairs to the table, before the change it took ~4s and after the change anywhere from ~15s to ~30s. The second commit addresses this, by changing the scheme slightly with how dependency handles are managed, and specifically avoiding duplicating them. This brings addition speed back to being approximately the same as it was before, while maintaining the lock-freedom and read speed improvements.

I also added more tests in dotnet/corefx#14279, and they pass before and after this change.

(I will wait to merge this until after #8490 goes in, and will then incorporate those changes.)

Fixes #3206
cc: @jkotas, @mjp41, @Maoni0

@stephentoub
Member Author

Shoot, I just realized my second commit doesn't correctly handle removes. Will fix.

@stephentoub
Member Author

I fixed the issue with removed handles not being freed.

@stephentoub
Member Author

I addressed @mjp41's observation about freeing the handles for collected objects.

But I'm not sure how to address @jkotas' concern about resurrection. Is there some special runtime finalization-related support we can use to help in this case? Otherwise, the whole approach from corert seems flawed. In looking over the code, the aim was to make reads not only lock-free but also interlocked-free (at least in managed code), and the latter is the key part. If reads are interlocked-free, we can't know, while we're resizing or removing, whether some other thread has already grabbed a dependent handle but not yet used it, which means we can't free the handle at that point and instead need to rely on finalization; but in the finalizer we may similarly be racing with this object getting resurrected and another thread accessing the dependent handles. If we were to use interlocked operations, we could probably ref count every handle access; the read cost would likely be the same as with a lock (or maybe even slightly higher), but without serializing parallel access.
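
For illustration, that ref-counting approach would look roughly like the sketch below (names such as _activeReaders, Retire, and FreeHandlesOnce are made up; this is not what the PR implements). Every read pays for at least two interlocked operations, which is why its read cost would be close to that of a lock even though readers no longer serialize:

using System.Threading;

internal sealed class Container
{
    private int _activeReaders;   // threads currently allowed to touch the dependency handles
    private int _retired;         // 1 once the container has been replaced or removed
    private int _handlesFreed;    // 1 once the handles have actually been freed

    public bool TryGetValue(object key, out object value)
    {
        value = null;
        Interlocked.Increment(ref _activeReaders);            // full fence
        try
        {
            if (Volatile.Read(ref _retired) != 0)
            {
                return false;                                  // caller would retry on the new container
            }
            // ... find the key's entry and dereference its dependency handle here ...
            return false;
        }
        finally
        {
            // The last reader out of a retired container frees the handles.
            if (Interlocked.Decrement(ref _activeReaders) == 0 && Volatile.Read(ref _retired) != 0)
            {
                FreeHandlesOnce();
            }
        }
    }

    // Called (under the table lock) when the container is replaced by a resize or removed.
    public void Retire()
    {
        Interlocked.Exchange(ref _retired, 1);                 // full fence
        if (Volatile.Read(ref _activeReaders) == 0)
        {
            FreeHandlesOnce();
        }
    }

    private void FreeHandlesOnce()
    {
        // Whichever thread gets here last frees the handles exactly once.
        if (Interlocked.Exchange(ref _handlesFreed, 1) == 0)
        {
            // ... free the dependency handles here ...
        }
    }
}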

Thoughts?

@mjp41
Member

mjp41 commented Dec 8, 2016

Would the following work?

  • The Containers have a field, _parent, pointing to the parent ConditionalWeakTable.
  • The Containers have a field, _finalizationCount, counting the number of times the finalizer has been called.

At the start of the finalizer add something like

// Need to take the parent lock, so that the _parent._container assignment does not race.
lock (_parent._lock)
{
    if (_oldKeepAlive == null
        || _finalizationCount == 0)
    {
        // This is the currently active container, and the first time it has been finalised.

        // Unlink from _parent to make this object unreachable, and unresurrectable.
        _parent._container = null;

        // Resurrect this object.
        GC.ReRegisterForFinalize(this);
        _finalizationCount = 1;
        return;   // A
    }
}

At A, the container has been resurrected but is unreachable from the heap, so only current readers can be accessing it. They will keep it live in the GC, but once they no longer reference it, it becomes fully unreachable and cannot be re-accessed or resurrected, so the second time finalization runs there won't be a race.

@stephentoub
Member Author

Thanks, @mjp41. I hadn't considered re-registration, but that makes sense as a way to fight against resurrection. I'm also not too concerned about the costs here, as in my experience the majority of use for a ConditionalWeakTable is in very long-lived scenarios, e.g. a statically rooted CWT.

Question on your example, though... I assume you meant:

(_oldKeepAlive == null && _finalizationCount == 0)

rather than:

(_oldKeepAlive == null || _finalizationCount == 0)

? We shouldn't need to play this trick with old containers (which would have _oldKeepAlive != null), since they're no longer referenced from the heap anyway.

@mjp41
Member

mjp41 commented Dec 8, 2016

@stephentoub yes, you are correct. I meant &&.

@mjp41
Member

mjp41 commented Dec 8, 2016

I am a little unsure whether all your code for _oldKeepAlive is necessary in Resizing/Constructing, as all the objects are live, so do they really need such a hand-over of ownership? Is this to protect against the thread going away mid-execution?

@stephentoub
Member Author

I am a little unsure if all your code for _oldKeepAlive is necessary

By "all your code", you mean the two lines that set _oldKeepAlive to a singleton and then set it to null? This was part of my trying to protect against things like thread aborts, but I already removed part of that solution, since thread aborts aren't possible in coreclr/corert, so I guess I can remove those two lines as well. (I also need to fix up Clear given your finalization suggestion.)

@stephentoub
Member Author

stephentoub commented Dec 8, 2016

yes, you are correct

Actually, I think we're both wrong, and unfortunately I don't think we can factor _oldKeepAlive into this logic. Consider this usage:

  • CWT gets dropped and then resurrected
  • Just before the active container from that CWT is finalized, two threads start to access it, one adding and one reading.
  • The one reading grabs a dependency handle but doesn't dereference it yet.
  • The one adding finds that it needs to grow because the current container is full, so it takes the lock, creates a new container, and sets this container's _oldKeepAlive to non-null.
  • Then this container's finalizer continues running, takes the lock, and sees _oldKeepAlive as non-null, at which point if we're factoring in _oldKeepAlive and skipping this logic in that case, we'll proceed to free the dependency handles, even though a reader has access to one of them.

That unfortunately means the logic would need to be more like:

...
if (!_finalized)
{
    _finalized = true;
    lock (_parent._lock)
    {
        if (_parent._container == this)
            _parent._container = null;
    }
    GC.ReRegisterForFinalize(this);
    return;
}
...

and every container will end up surviving multiple rounds of GC. Maybe given typical usage that's not bad?

@stephentoub
Member Author

I've updated with this change, but I'm going to also try out the reference counting approach and see how it fares in comparison.

@stephentoub
Member Author

but I'm going to also try out the reference counting approach and see how it fares in comparison

... the ref counting approach I tried is much worse... not quite as bad as a lock on every operation, but pretty close.

@jkotas
Member

jkotas commented Dec 8, 2016

every container will end up surviving multiple rounds of GC

It is a pretty standard pattern in this kind of lock-free scheme around unmanaged resources. Java has the PhantomReference primitive for this exact purpose, which in theory allows the runtime to make it a bit more efficient. We are simulating it with what's available. We have other places with similar patterns that require two GC ticks for cleanup, e.g. here:

@stephentoub
Member Author

Thanks, @jkotas. In that case, I don't have further changes planned; please let me know if you have any other feedback, are ok with the port + fixes, etc.

@jkotas
Member

jkotas commented Dec 9, 2016

LGTM

The CoreRT ConditionalWeakTable was modified to support lock-free reads.  This ports the implementation back to coreclr.

The CoreRT implementation of ConditionalWeakTable that was ported back to CoreCLR uses a special scheme to make reads lock-free.  When the container needs to grow, it allocates new arrays and duplicates all of the dependency handles from the previous array, rather than just copying them.  This avoids issues stemming from a thread getting a dependency handle in an operation on the container, then having that handle destroyed, and then trying to use it; the handle won't be destroyed as long as the container is referenced.

However, this also leads to a significant cost in a certain situation.  Every time the container grows, it allocates another N dependency handles where N is the current size of the container.  So, for example, with an initial size of 8, if 64 objects are added to the container, it'll allocate 8 dependency handles, then another 16, then another 32, and then another 64, resulting in significantly more handles than in the old implementation, which would only allocate 64 handles total.
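
As a rough illustration of those allocation counts (plain arithmetic only; the class and variable names are made up):

using System;

class HandleGrowthCost
{
    static void Main()
    {
        const int InitialSize = 8, Entries = 64;

        // Duplication scheme: each container generation of size S ends up allocating S handles
        // (duplicates of the previous container's handles plus handles for new entries).
        int withDuplication = 0;
        for (int size = InitialSize; size <= Entries; size *= 2)
        {
            withDuplication += size;
        }

        // Old (locking) implementation and the hand-off scheme: one handle per entry ever added.
        int withoutDuplication = Entries;

        Console.WriteLine($"duplicating on grow:  {withDuplication} handles");   // 120
        Console.WriteLine($"one handle per entry: {withoutDuplication} handles"); // 64
    }
}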

This commit fixes that by changing the scheme slightly.  A container still frees its handles in its finalizer.  However, rather than duplicating all handles, the responsibility for freeing them is transferred from one container to the next.  Then, to avoid issues where, for example, the second container is released while the first is still in use, a reference is maintained from the first to the second, so that the second can't be finalized while the first is still in use.

The commit also fixes a race condition with resurrection and finalization, whereby dependency handles could be used while, or after, they're being freed by the finalizer. It's addressed by freeing handles only in a second finalization, after the first finalization has cleared out state to guarantee that nothing can use the handles during the second.
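
To make the scheme concrete, here is a minimal sketch, assuming the public DependentHandle type (added in later .NET releases) as a stand-in for the runtime-internal handles; apart from _oldKeepAlive, the field and method names are illustrative, not the actual coreclr code:

using System;
using System.Runtime;

internal sealed class Container
{
    private DependentHandle[] _handles = new DependentHandle[8];
    private volatile Container _oldKeepAlive; // old container keeps the new handle owner alive
    private bool _ownsHandles = true;         // freeing responsibility; handed off on resize
    private bool _finalizedOnce;              // first finalization only re-registers

    // Called under the table lock when the container is full.
    public Container Resize()
    {
        var newContainer = new Container();

        // Reuse the existing handles rather than duplicating them; only the array grows.
        var larger = new DependentHandle[_handles.Length * 2];
        Array.Copy(_handles, larger, _handles.Length);
        newContainer._handles = larger;

        // Hand the freeing responsibility to the new container, and keep the new
        // container alive for as long as this (old) one may still be in use by a reader.
        _ownsHandles = false;
        _oldKeepAlive = newContainer;
        return newContainer;
    }

    ~Container()
    {
        if (!_ownsHandles)
        {
            return; // an old container: its handles now belong to a newer container
        }

        if (!_finalizedOnce)
        {
            // First pass: in the real fix, the parent table's reference to this container is
            // also cleared here (under the table's lock); finalization is then re-registered
            // so the handles are freed only once no reader can still be holding the container.
            _finalizedOnce = true;
            GC.ReRegisterForFinalize(this);
            return;
        }

        // Second pass: the container is unreachable and cannot be resurrected, so the
        // handles can be freed safely.
        for (int i = 0; i < _handles.Length; i++)
        {
            if (_handles[i].IsAllocated)
            {
                _handles[i].Dispose();
            }
        }
    }
}
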
@mjp41
Member

mjp41 commented Dec 9, 2016

@stephentoub I agree we were both wrong :-)

LGTM

@stephentoub
Member Author

Thanks, @mjp41 and @jkotas.

@safern, will your ConditionalWeakTable change go in soon? If so, I'll wait for that and incorporate your changes into this. If not, I could go ahead with this and then you could merge.

@safern
Member

safern commented Dec 9, 2016

@stephentoub I expect to have them approved by today, but you can go ahead and merge and then I'll merge my changes with yours :)

@stephentoub
Member Author

Ok. Thanks, @safern.

stephentoub merged commit 7adcc81 into dotnet:master on Dec 9, 2016
@stephentoub
Member Author

@jkotas, I assume the fixes from here should be ported back to corert?

@jkotas
Member

jkotas commented Dec 9, 2016

Yes, it would be nice to port them.

jkotas mentioned this pull request on Dec 29, 2016
karelz modified the milestone: 2.0.0 on Aug 28, 2017
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Make ConditionalWeakTable reads lock-free

Commit migrated from dotnet/coreclr@7adcc81