-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix CHIPDeviceControllerFactory shutdown #19022
Fix CHIPDeviceControllerFactory shutdown #19022
Conversation
7c5a9b5
to
8a9c33c
Compare
PR #19022: Size comparison from 98a1c6b to 8a9c33c Increases (5 builds for cyw30739, linux, telink)
Decreases (1 build for cc13x2_26x2)
Full report (41 builds for cc13x2_26x2, cyw30739, efr32, esp32, k32w, linux, mbed, nrfconnect, p6, telink)
|
8a9c33c
to
b75b8c6
Compare
b75b8c6
to
623d21c
Compare
PR #19022: Size comparison from 48606c6 to 623d21c Increases (4 builds for cc13x2_26x2, linux)
Decreases (3 builds for cc13x2_26x2, telink)
Full report (41 builds for cc13x2_26x2, cyw30739, efr32, esp32, k32w, linux, mbed, nrfconnect, p6, telink)
|
56d3d04
to
dd05fae
Compare
PR #19022: Size comparison from 2ac3ee7 to dd05fae Increases (7 builds for cc13x2_26x2, linux, telink)
Decreases (1 build for cc13x2_26x2)
Full report (34 builds for cc13x2_26x2, cyw30739, k32w, linux, mbed, nrfconnect, p6, telink)
|
What worries me more is that
but that test does nothing particularly complicated: it brings up a factory, brings up a controller, shuts down the controller, shuts down the factory. I would expect that to Retain and Release things correctly... |
Trying to get hold of a Mac so I can track that down. Let me make some broader assertions about this
The reason this patch has been evolved back towards calling Delete from the factory instead of when refcount drops to zero is that I think the refcounting should be removed entirely, and this de-emphasizes it (keeping only the 'magic shutdown' in an attempt to preserve the semantics and fix the most critical issues while we find all the places that need to have an explicit shutdown). |
One last assertion is that I think it is not actually useful for the refcount to be able to extend the lifetime of the system state past the factory itself. Which is why I believe descoping the refcount to only 'magic shutdown' and not lifetime should be OK (the other reason being that we always deleted the state here, anyway). |
I misread the test. It brings up a factory, fails to bring up a controller because it fails some pre-bringup validation, shuts down the factory. So it is in fact hitting the "factory being shut down without ever bringing up a controller" case. |
I agree.
That seems sane. The important part is that we want to shut down the state when the number of live controllers transitions back to 0 after becoming nonzero. Even better would be if we never even created the state's various members until the first controller is brought up, and then shut it down when the last controller shuts down.
I agree. It looks like #19697 got merged in a few hours ago, which was also fixing some of the things around this stuff. In particular, that PR just calls |
That still feels fairly magic to me; I would prefer that we leave deciding when to init the factory and shut it down to a higher layer:
I don't know that the system state and factory need to even be different objects. Changing the API is probably out of scope for this bugfix, but would appreciate your thoughts while I have your attention. For our purposes, direct control over the lifecycle seems best.
That makes sense. A prior version of this PR actually had the shutdown-from-dtor but then I thought that non-idempotent shutdown was breaking the tests (thus was before the mWasShutdown was added to force idempotency). This version had a VerifyOrDie instead of force-shutdown to see if this was happening.
|
dd05fae
to
1fb1536
Compare
PR #19022: Size comparison from 676646d to 1fb1536 Increases (3 builds for linux, telink)
Decreases (2 builds for cc13x2_26x2, esp32)
Full report (30 builds for cc13x2_26x2, cyw30739, efr32, esp32, k32w, linux, mbed, nrfconnect, p6, telink)
|
@mspang So I think the intent is that the factory is basically a singleton. It can be inited and shut down, but there's just one around. The factory keeps track of two sets of objects:
The system state is just a convenient way to keep track of these various objects, both for consumption from inside controllers and for cleanup purposes. We could move all the logic of when the Matter stack should be listening for messages out to consumers. Then the factory would basically have the following API:
Item 1 could be combined with item 2 and item 5 with item 6, if we want to insist that consumers should just keep track of the externally-provided objects themselves..... Right now a factory basically has item 1, item 3, item 4, and item 6 as explicit API. Item 2 happens (not very consistently) when item 1 happens or when the controller count increases from 0 after we've hit item 5. Item 5 happens when the controller count goes to 0 or item 6 happens, whichever comes first. I agree this is a pretty annoying API. In particular, the asymmetry between "I inited the factory but have not created the first controller" (stack is running) and "I inited the factory, created a controller, then shut it down" (stack is not running) is really unfortunate. I am very interested, in general, in saner API here... |
1fb1536
to
7553dca
Compare
Definitely. I think we should follow this bug fix up with a more thorough cleanup. The asymmetry is just one of the warts with this, the other is that there is very little ability to actually inject many of these dependencies. |
PR #19022: Size comparison from d1a4bb7 to 7553dca Increases (2 builds for linux, telink)
Full report (30 builds for cc13x2_26x2, cyw30739, efr32, esp32, k32w, linux, mbed, nrfconnect, p6, telink)
|
- SessionResumptionStorage is leaked - Ownership is passed using raw pointers. This is bad form, and there are many more instances of it (not fixed here). - DeviceControllerSystemState destructor is called twice (once by Release(), then again by Delete())
7553dca
to
d2662fc
Compare
PR #19022: Size comparison from 5809552 to d2662fc Increases (6 builds for cc13x2_26x2, k32w, linux, nrfconnect)
Decreases (6 builds for cc13x2_26x2, cyw30739, k32w, nrfconnect, telink)
Full report (30 builds for cc13x2_26x2, cyw30739, efr32, esp32, k32w, linux, mbed, nrfconnect, p6, telink)
|
Problem
SessionResumptionStorage is leaked
and there are many more instances of it (not fixed here).
DeviceControllerSystemState destructor is called twice
(once by Release(), then again by Delete())
Change overview
Testing
Unit tests & CI