Reference counting changes #1951
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@              Coverage Diff               @@
##               main     #1951        +/-   ##
===========================================
- Coverage     81.92%    68.20%    -13.72%
===========================================
  Files            95        93         -2
  Lines         24066     20454      -3612
  Branches       3206      3047       -159
===========================================
- Hits          19715     13950      -5765
+ Misses         4280      3965       -315
- Partials         71      2539      +2468
Force-pushed from a48f777 to 1afe044.
Force-pushed from e3a3059 to 58c9028.
Operations that might be performed during teardown, such as reaping, waiting, closing, and freeing, should only be done if the aio has been properly initialized. This matters in certain simple cases where inline aio objects are used and initialization of the outer object can fail before the enclosed aio is initialized.
Once a context has begun closing, further attempts to close it return NNG_ECLOSED. What was I thinking, ever doing anything else?
This uses simple reference counters for now, which should be simpler and, hopefully, more reliable.
This is a major change, but it should eliminate some of the use-after-free bugs we have seen during shutdown. It should also be faster, since we no longer need to take locks as often.
This updates the pipe to use contiguous storage for both the transport data and the pipe protocol data. It updates sockfd to use this and eliminates the need for the sockfd transport to do its own asynchronous reaping, thereby hopefully closing a shutdown race. The other transports will get the same treatment shortly. Also fixes a valgrind complaint about uninitialized data in the socket test.
This avoids certain kinds of challenging deadlocks during finalization, but it does require users of the optimized nni_aio_init function to explicitly call nni_aio_stop before calling nni_aio_fini. As a minor benefit, this should reduce the number of mutex lock/unlock pairs for very short-lived objects (such as rapidly recycled contexts).
If an error occurs, it is reported to the application. No external factor can force us to spin waiting for memory, because this path is not reachable from the network.
We should probably come back and make this more explicit with a separate endpoint stop() function, which can block and call nni_aio_stop. For now, this gets us over the hump.
The attempt to use nni_task_abort() was completely misguided. In fact, this function isn't needed at all; it is a relic of a design that predates the nni_aio_begin / nni_aio_schedule split. Additionally, nni_aio_abort needed a fix to prevent a hang if it was called between the calls to nni_aio_prep and nni_aio_schedule. (Essentially, a canceled operation should fail to schedule.)
This also includes a few fixes for the sockfd transport.
Needs to be redone. Later.
This converts the main part of NNG to use reference-counting atomics efficiently, instead of some hacky lock-based approaches. It should be both safer and faster!