Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cirque - update int/pointer casting and reset subscription counts in tests. #10771

Merged
merged 2 commits into from
Oct 21, 2021

Conversation

cecille
Copy link
Contributor

@cecille cecille commented Oct 21, 2021

Problem

Two commits

  1. GetConnectedDeviceSync - Returned device pointer is actually an int type, which meant that the function never actually waited for a callback and was racing on the device resolve when it attempted to send commands.
  2. Subscription test - The subscription itself causes a callback, which throws off the count between the number of commands and the check. The result is that the check ends, but the change thread continues on and causes weird races with the remaining tests.

Change overview

  1. Fixes GetConnectedDeviceSync to check against the pointer type
  2. Resets count on subscription test and joins the thread before exiting.

Testing

    1. Saw GetConnectedDeviceSync wait for a device callback when running the MobileDeviceTest
    1. No longer see the subscription send logs interspersed with the remaining test commands.

The returned device pointer is actually an int type, which meant
that the function never actually waited for a callback and was
racing on the device resolve when it attempted to send commands.

Test: Saw regular cirque failures after device resolve. After
      this change, saw a resolve wait for the callback as expected.
The subscription itself causes a callback, which throws off the
count between the number of commands and the check. The result is
that the check ends, but the change thread continues on and causes
weird races with the remaining tests.

Resets the count right before the check and adds a thread join.
@andy31415 andy31415 changed the title Fix cirque Fix cirque - update int/pointer casting and reset subscription counts in tests. Oct 21, 2021
@andy31415
Copy link
Contributor

Fast track rationale: this changes test code, changes are small and explained/understood.

@andy31415 andy31415 merged commit b5ad57f into project-chip:master Oct 21, 2021
@cecille cecille deleted the fix_cirque branch October 21, 2021 15:19
Comment on lines 351 to 352
while handler.subscriptionReceived < 5:
# We should observe 10 attribute changes
Copy link
Contributor

@tcarmelveilleux tcarmelveilleux Oct 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment seems to mismatch intent. Is there a bug here?

@yunhanw-google

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably just a bug.

changeThread.start()
with handler.cv:
while handler.subscriptionReceived < 5:
# We should observe 10 attribute changes
handler.cv.wait()
changeThread.join()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you comment on the join reason? Could this wait forever on failed test? Should we have a timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That thread won't fail to join unless the send message call hangs, in which case we're already sunk in this test. The real challenge here is the handler.subscriptionReceived never hitting 5, which will happen if the two threads get out of sync with how many subscriptions they're looking for. But we already have that condition in here, and there's a timer on the tests anyway. I added this because before the thread was continuing along and sending message while other parts of the test were going and it was causing some interesting behaviour. That's rather a different problem (ie, we should add tests for what happens when you fire just a whole lot of requests to the node), but that's orthogonal to this issue.

cecille added a commit to cecille/connectedhomeip that referenced this pull request Nov 1, 2021
There was a request to add a timeout to the join in project-chip#10771.
The thread itself is unlikely to hang - more likely situation here
is the subscription failing to cause updates, causing the main
thread to hang forever on the condition variable. Added time outs
to both so we can have a bit more information about what actually
happened when the test completes.
cecille added a commit to cecille/connectedhomeip that referenced this pull request Nov 2, 2021
There was a request to add a timeout to the join in project-chip#10771.
The thread itself is unlikely to hang - more likely situation here
is the subscription failing to cause updates, causing the main
thread to hang forever on the condition variable. Added time outs
to both so we can have a bit more information about what actually
happened when the test completes.
andy31415 pushed a commit that referenced this pull request Nov 3, 2021
…1297)

* Address request from #10771

There was a request to add a timeout to the join in #10771.
The thread itself is unlikely to hang - more likely situation here
is the subscription failing to cause updates, causing the main
thread to hang forever on the condition variable. Added time outs
to both so we can have a bit more information about what actually
happened when the test completes.

* Restyled by autopep8

Co-authored-by: Restyled.io <[email protected]>
PSONALl pushed a commit to PSONALl/connectedhomeip that referenced this pull request Dec 3, 2021
… in test (project-chip#11297)

* Address request from project-chip#10771

There was a request to add a timeout to the join in project-chip#10771.
The thread itself is unlikely to hang - more likely situation here
is the subscription failing to cause updates, causing the main
thread to hang forever on the condition variable. Added time outs
to both so we can have a bit more information about what actually
happened when the test completes.

* Restyled by autopep8

Co-authored-by: Restyled.io <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants