Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow create new timer from none CHIP main thread #6827

Merged
merged 1 commit into from
May 14, 2021
Merged

Allow create new timer from none CHIP main thread #6827

merged 1 commit into from
May 14, 2021

Conversation

yufengwangca
Copy link
Contributor

@yufengwangca yufengwangca commented May 14, 2021

Problem

I detected random assert failure when I run cirque sanity test manually. We recently add the following check within NewTimer

Error Layer::NewTimer(Timer *& aTimerPtr)
{
assertChipStackLockedByCurrentThread();
.....
}

SysProcess take ChipStackLock for each asynchronous message processing. And the only place to call NewTimer is from mdns client thread.

In case the CHIP main loop started before mdns client thread, assertChipStackLockedByCurrentThread() will fail.

bool GenericPlatformManagerImpl_POSIX::_IsChipStackLockedByCurrentThread() const
{
return !mMainLoopStarted || (mChipStackIsLocked && (pthread_equal(pthread_self(), mChipStackLockOwnerThread)));
}

Summary of Changes

Allow create new timer from none CHIP main thread

@woody-apple
Copy link
Contributor

@bzbarsky-apple @vivien-apple ?

Copy link
Contributor

@andy31415 andy31415 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently for log-only.

Are you saying that NewTimer is indeed thread safe?

Otherwise we should fix that mdns has to aquire the lock and not that we remove the assertion.

@andy31415
Copy link
Contributor

I did add a 'assertLocked' in many places and it may be overly aggressive, so if this is thread safe we can comment it and then the change is ok. I would like to make sure and not have 'remove assert because assert shows up'.

The reason I originally added the assert was because the method:

  • accesses state() which is not locked and is global per layer
  • accesses a static pool and I am unsure if our object pools are indeed thread safe.

Please provide an analysis on why this is indeed thread safe in a comment if removing the assert.

@yufengwangca
Copy link
Contributor Author

I did add a 'assertLocked' in many places and it may be overly aggressive, so if this is thread safe we can comment it and then the change is ok. I would like to make sure and not have 'remove assert because assert shows up'.

The reason I originally added the assert was because the method:

  • accesses state() which is not locked and is global per layer
  • accesses a static pool and I am unsure if our object pools are indeed thread safe.

Please provide an analysis on why this is indeed thread safe in a comment if removing the assert.

My initial reaction is to use lock within client thread to solve this issue, but I found assertChipStackLockedByCurrentThread is only checking against big ChipStackLock and only when CHIP main loop has started.

The big ChipStackLock is originally introduced to protect event processing within DeviceLayer, I think we need to think about it more if we use it to protect lib functions within SystemLayer. But this assert issue can not be solved if we use System::Mutext from system layer itself to protect Error Layer::NewTimer(Timer *& aTimerPtr).

BTW, why we not assert check other functions such as:
Error Layer::StartTimer(uint32_t aMilliseconds, TimerCompleteFunct aComplete, void * aAppState)
void Layer::CancelTimer(Layer::TimerCompleteFunct aOnComplete, void * aAppState)

@andy31415
Copy link
Contributor

I see __sync_bool_compare_and_swap in the Object::TryCreate, so I believe pool access is actually thread safe and only the current state is not technically ... but then do I really care. Please add a comment that this is assumed thread safe and then we can submit the change.

I believe other places I used similar reasoning (saw static pool access so added the assert) so more cleanup may also be needed.

@andy31415
Copy link
Contributor

For other assertions missing: this may also be an issue - I added some for testing but not all. Figuring out where to assert and where not to is likely a bigger job to iterate upon. +1 that it needs to be done.

@yufengwangca
Copy link
Contributor Author

I see __sync_bool_compare_and_swap in the Object::TryCreate, so I believe pool access is actually thread safe and only the current state is not technically ... but then do I really care. Please add a comment that this is assumed thread safe and then we can submit the change.

I believe other places I used similar reasoning (saw static pool access so added the assert) so more cleanup may also be needed.

It is not 100% thread safe because of the state check in the wrapper part, I was curious what is the side effect if we could get a valid timer from the pool even SystemLayer's state has been change to other state by another thread.

if (this->State() != kLayerState_Initialized)
    return CHIP_SYSTEM_ERROR_UNEXPECTED_STATE;

lTimer    = Timer::sPool.TryCreate(*this);

@woody-apple woody-apple merged commit f4b00be into project-chip:master May 14, 2021
@yufengwangca yufengwangca deleted the pr/lock/timer branch May 18, 2021 23:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants