Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data model objects can be accessed before the data model server is initialized #15295

Closed
kkasperczyk-no opened this issue Feb 17, 2022 · 5 comments · Fixed by #15909
Closed
Labels
bug Something isn't working V1.0

Comments

@kkasperczyk-no
Copy link
Contributor

kkasperczyk-no commented Feb 17, 2022

Problem

On nrfconnect I did observation that StartUp event is generated successfully after the boot if the device is not commissioned and the generation fails if the device was previously commissioned. The root cause of that is the event generation is scheduled by the InitChipStack and it uses data model objects even before data model server was initialized. It looks like some time race conditions may occur and sometimes data model will manage to be initialized on time and sometimes not.

Generation schedule: https://github.com/project-chip/connectedhomeip/blob/master/src/include/platform/internal/GenericPlatformManagerImpl.cpp#L126
Delegates set by data model: https://github.com/project-chip/connectedhomeip/blob/master/src/include/platform/internal/GenericPlatformManagerImpl.cpp#L297
Call of InitChipStack: https://github.com/project-chip/connectedhomeip/blob/master/examples/lock-app/nrfconnect/main/main.cpp#L46
Call of data model server init: https://github.com/project-chip/connectedhomeip/blob/master/examples/lock-app/nrfconnect/main/AppTask.cpp#L102

Problem was exposed by the cluster events behavior, but I believe it should not happen that CHIP events loop is started and processed events may try accessing data model objects, before data model server was initialized. Many platforms implement it in the similar way, so probably it affects not only nrfconnect.

Proposed Solution

@kkasperczyk-no kkasperczyk-no added bug Something isn't working V1.0 labels Feb 17, 2022
@bzbarsky-apple
Copy link
Contributor

It looks like some time race conditions may occur

Well, there is a data race here. The init code linked to seems to be doing the following:

  1. Call InitChipStack from main, which schedules HandleDeviceRebooted.
  2. Call StartEventLoopTask from main, which starts a new MAtter thread and starts processing scheduled things on that thread.
  3. Call StartApp from main, which calls AppTask::Init and eventually reaches the Server::Init call.

Now Server::Init is going to be seting up data model stuff on the main thread while the Matter event loop is running on the Matter thread and they are both going to be touching the same data structures at the same time.

I added asserts to catch this race in #15152 but could not figure out how to do the relevant "lock is not held" checks on Zephyr....

The second bullet of the proposed solution sounds good, in terms of ensuring that we're done touching Matter bits before the separate Matter thread starts running....

@bzbarsky-apple
Copy link
Contributor

But also, the various example apps that are doing Server::Init off the Matter event loop are presumably failing to correctly handle the StartUp event, in that it consistently does not work there...

@bzbarsky-apple
Copy link
Contributor

Oh, and CI is not catching this stuff because the "linux" impl we use in CI for Linux and Darwin does not use StartEventLoopTask at all: it uses the blocking RunEventLoop, so perforce does all its setup before running the event loop...

So I guess we need to enforce at minimum that Server::Init happens before the event loop starts running...

@kkasperczyk-no
Copy link
Contributor Author

kkasperczyk-no commented Feb 17, 2022

So I guess we need to enforce at minimum that Server::Init happens before the event loop starts running...

Yes, that seems to solve my issue locally. I just moved event loop start after all application initialization is over.

@bzbarsky-apple
Copy link
Contributor

That said, its annoying that the way this event is generated forces all the init work to happen on the main thread, not the Matter thread....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working V1.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants