Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reference counting changes #1951

Closed
wants to merge 23 commits into from
Closed

Reference counting changes #1951

wants to merge 23 commits into from

Conversation

gdamore
Copy link
Contributor

@gdamore gdamore commented Nov 29, 2024

This converts the main part of NNG to use reference counting atomics efficiently, instead of some other hacky approaches using locks. It should be safer, and faster both!

Copy link

codecov bot commented Nov 29, 2024

Codecov Report

Attention: Patch coverage is 83.24873% with 66 lines in your changes missing coverage. Please review.

Project coverage is 68.20%. Comparing base (9c0b9d6) to head (e5f6632).
Report is 119 commits behind head on main.

Files with missing lines Patch % Lines
src/core/pipe.c 74.24% 13 Missing and 4 partials ⚠️
src/nng.c 65.62% 6 Missing and 5 partials ⚠️
src/core/aio.c 68.75% 5 Missing and 5 partials ⚠️
src/sp/transport/socket/sockfd.c 71.87% 2 Missing and 7 partials ⚠️
src/core/socket.c 90.27% 2 Missing and 5 partials ⚠️
src/core/dialer.c 89.65% 0 Missing and 3 partials ⚠️
src/supplemental/websocket/websocket.c 93.18% 1 Missing and 2 partials ⚠️
src/core/listener.c 92.85% 0 Missing and 2 partials ⚠️
src/platform/posix/posix_tcpdial.c 75.00% 0 Missing and 1 partial ⚠️
src/sp/protocol/pubsub0/sub.c 0.00% 0 Missing and 1 partial ⚠️
... and 2 more

❗ There is a different number of reports uploaded between BASE (9c0b9d6) and HEAD (e5f6632). Click for more details.

HEAD has 3 uploads less than BASE
Flag BASE (9c0b9d6) HEAD (e5f6632)
4 1
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1951       +/-   ##
===========================================
- Coverage   81.92%   68.20%   -13.72%     
===========================================
  Files          95       93        -2     
  Lines       24066    20454     -3612     
  Branches     3206     3047      -159     
===========================================
- Hits        19715    13950     -5765     
+ Misses       4280     3965      -315     
- Partials       71     2539     +2468     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gdamore gdamore force-pushed the refcount branch 10 times, most recently from a48f777 to 1afe044 Compare December 7, 2024 16:15
@gdamore gdamore force-pushed the refcount branch 2 times, most recently from e3a3059 to 58c9028 Compare December 7, 2024 21:38
Operations that might be performed during teardown, such as reaping,
waiting, closing, freeing, should only be done if the aio has properly
been initialized.  This is important for certain simple cases where
inline aio objects are used, and initialization of an outer object can
fail before the enclosed aio is initialized.
Once a context has started the process of close, further attempts
to close it will return NNG_ECLOSED.  What was I thinking to ever
do anything else?
This uses simple reference counters for now that should be simpler,
and hopefully more reliable.
This is a major change, but it should eliminate some of the problems
we have seen with use-after-free bugs in shutdown.  It should also
be faster as we don't need to use locks as much.
This updates the pipe to use contiguous data for the transport data
as well as the pipe protocol data.  It updates sockfd to use this, and
eliminates the need for the sockfd transport to do its own asynchronous
reaping, thereby hopefully closing a shutdown race.

The other transports will shortly get the same treatment.

Also fixed valgrind complaint about uninitialized data in the socket test.
This avoids certain kinds of challenging deadlocks during finalization,
but it does require users of the optimized nni_aio_init function to
explicitly call nni_aio_stop before doing nni_aio_fini.

As a minor benefit, this should reduce the number of mutex entry/exit
blocks for very short lived objects (such as rapidly recycling contexts).
If an error occurs, the application gets to know about it.  There
cannot be external factors that cause us to spin for memory, since
this is not accessible via the network.
We should probably come back and make this more explicit with a separate
endpoint stop() function, which can be blocking and call nni_aio_stop.
For now this gets us over the hump.
The attempt to use nni_task_abort() was completely misguided.
In fact this function isn't needed, and is a relic of a design that
predates the nni_aio_begin / nni_aio_schedule split.

Additionally, nni_aio_abort needed a fix to prevent a hang if
it was called between the calls to nni_aio_prep and nni_aio_schedule.
(Essentially a canceled operation should fail in scheduling.)
Also, includes a few fixes for the sockfd transport.
@gdamore
Copy link
Contributor Author

gdamore commented Dec 27, 2024

Needs to be redone. Later.

@gdamore gdamore closed this Dec 27, 2024
@gdamore gdamore deleted the refcount branch December 27, 2024 02:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant