[FIXED] microservice cleanup (flapping MicroServiceStops... tests) #816
- Removed the global hash of services; it was the most immediate source of the deadlock.
- Added the list of microservices to `natsConnection`. **NOTE** increased memory allocation for all connections, need a better way.
- Fixed the flow of replacing an existing endpoint in a service.
- Got rid of `ep->name` since it was synonymous with `ep->cfg->Name`.
- Adjusted the tests, some minor fixes.
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## main #816 +/- ##
==========================================
+ Coverage 68.71% 70.50% +1.79%
==========================================
Files 39 47 +8
Lines 15207 15366 +159
Branches 3143 3177 +34
==========================================
+ Hits 10449 10834 +385
+ Misses 1700 1460 -240
- Partials 3058 3072 +14
src/micro.c
Outdated
// Wrap the connection callbacks before we subscribe to anything.
MICRO_CALL(err, _wrap_connection_event_callbacks(m));
MICRO_CALL(err, micro_ErrorFromStatus(
    natsOptions_setMicroCallbacks(m->nc->opts, _on_connection_closed, _on_error)));
So, for instance, even if the `nc` connection is valid at the time we enter `micro_AddService`, but happens to be closed during execution of this function, it will leave things in a bad state. If a service is part of a connection, I think it should be handled by the connection, even if the `micro_X` calls are just a proxy to internal `natsConn_X` calls. Have you thought about it that way?
I did. In fact, I was VERY inclined to make them "1st class citizens" in `natsConnection`, but not for this PR... This all really came out of trying to fix a single race condition on that global services table I had before: wrong locking order in one place.
OT: In light of the `orbit` movement, I've been contemplating adding simple "internal use" extensibility to both Connection and Subscription, for data (like a hashmap) and callbacks (say, with a `bool lock` parameter). My initial microservices implementation was naive; it is still marked experimental, and I would really love to externalize and refactor it to `orbit` before it is final.
I moved all "attach to connection" functionality into the func, and added a `natsConn_Lock` around it, plus connection validation. Thanks for pointing it out. After the service is added to the connection, it should unwind "normally" if it's terminated.
numEndpoints = m->numEndpoints;
_unlock_service(m);

if ((refs == 0) && (numEndpoints == 0))
Question: would it be symptomatic of a bug if we have `refs == 0` and `numEndpoints != 0`?
No, because of the order of execution. As I understand it (and can prove, I think), `On(Connection)Closed` may be invoked in the midst of the subscriptions calling `On(Sub)Complete` from their own threads. greyhair ++
Similar for error callbacks; maybe it was just one of the two, not 100% sure now.
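To illustrate why `refs == 0` with `numEndpoints != 0` can be a legitimate intermediate state, here is a small sketch under assumptions. The names `rel_service`, `rel_release`, and `rel_endpoint_done` are hypothetical, and the endpoint-completion path is an assumed counterpart to the snippet under review, not code taken from the PR.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a service; NOT the real microService struct. */
typedef struct {
    pthread_mutex_t mu;
    int             refs;
    int             numEndpoints;
} rel_service;

static void rel_init(rel_service *m, int refs, int numEndpoints)
{
    pthread_mutex_init(&m->mu, NULL);
    m->refs         = refs;
    m->numEndpoints = numEndpoints;
}

/* Drops one reference. Both counters are snapshotted under the lock and
 * tested after unlocking, mirroring the shape of the reviewed snippet.
 * Returns true only when the caller is the one that should free. */
static bool rel_release(rel_service *m)
{
    int refs, numEndpoints;

    pthread_mutex_lock(&m->mu);
    refs         = --m->refs;
    numEndpoints = m->numEndpoints;
    pthread_mutex_unlock(&m->mu);

    return (refs == 0) && (numEndpoints == 0);
}

/* Assumed counterpart: called when one endpoint's subscription completes,
 * possibly after the last reference is already gone. */
static bool rel_endpoint_done(rel_service *m)
{
    int refs, numEndpoints;

    pthread_mutex_lock(&m->mu);
    numEndpoints = --m->numEndpoints;
    refs         = m->refs;
    pthread_mutex_unlock(&m->mu);

    return (refs == 0) && (numEndpoints == 0);
}
```

In this sketch, whichever path observes both counters at zero performs the cleanup, so the order in which the close callback and the per-subscription completions arrive does not matter.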
src/micro.c
Outdated
}
natsMutex_Unlock(nc->servicesMu);
The lock was acquired only under `if (s == NATS_OK)` after setting the callbacks, but here you unlock unconditionally. You should move the `if` statement above (where you bump the service's ref count) under the `if` statement where you acquire the service lock. This unlock would fall under that previous `if` statement too.
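A sketch of the suggested shape, assuming hypothetical types: the lock, the ref-count bump, and the unlock all sit under the same status check, so the mutex is never unlocked on a path that did not lock it. `pair_conn`, `pair_service`, and `pair_register` are illustrative names only, and a pthread mutex stands in for `natsMutex`.

```c
#include <pthread.h>
#include <stddef.h>

typedef enum { PAIR_OK = 0, PAIR_ERR } pair_status;

/* Hypothetical stand-ins; NOT the real nats.c structures. */
typedef struct {
    pthread_mutex_t servicesMu;
    int             numServices;
} pair_conn;

typedef struct {
    int refs;
} pair_service;

static void pair_conn_init(pair_conn *nc)
{
    pthread_mutex_init(&nc->servicesMu, NULL);
    nc->numServices = 0;
}

/* `setup` represents the result of the earlier step (setting callbacks). */
static pair_status pair_register(pair_conn *nc, pair_service *svc,
                                 pair_status setup)
{
    pair_status s = setup;

    if (s == PAIR_OK)
    {
        pthread_mutex_lock(&nc->servicesMu);
        svc->refs++;        /* bump the ref count only while holding the lock */
        nc->numServices++;
        pthread_mutex_unlock(&nc->servicesMu); /* paired with the lock above */
    }

    /* On failure nothing was locked, so there is nothing to unlock. */
    return s;
}
```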
(hit the wrong button before) thanks for catching this!
LGTM