Bug: Not all nodes registered to the same pattern get new updates from exchange #3769
Comments
@hanicornelia - Would it be possible to capture the agent logs on one of the agents that does not get upgraded? Capture /var/log/syslog, gzip it, and post it here if you can.
Closing #3703, which sounds like the same issue.
These are all the logs and files for your reference. One additional piece of information: we discovered that this bug happens to nodes that are unable to communicate with the exchange for more than 2 hours, but it does not occur if the outage is less than 1 hour. These are the updated steps to reproduce:
Exchange hub:
Next, on the node side:
With that, below are the logs for your reference. Notable timestamps are:
…attern get new updates from exchange Signed-off-by: Max McAdam <[email protected]>
Issue #3769 - Bug: Not all nodes registered to the same pattern get n…
I built the master branch (commit #3807) and ran the bug reproduction steps. However, the bug was not solved by this pull request. Attached are the agent logs. As before, we can see that the node knows there is a new version of the service, but it did not form an agreement for it, so the service is not updated after the heartbeat is restored.
Describe the bug.
We have multiple nodes registered to the pattern IOT-home-full. We currently have version 4.19.154 of the service kitchen-controller in the pattern IOT-home-full. We published a new version, 4.19.158, of this service and updated the pattern to use it. We expected all nodes registered to this pattern to receive the new version. However, only some nodes received the proposal message and agreed to the new version, while the other nodes did not receive anything.

The pattern's lastUpdated value is the same as the time we published the pattern (11 May, 4.43 UTC). We used the command hzn exchange pattern ls <pattern> to check this.

For example, the eventlog for the good node IOT-NODE-1A shows that it successfully received the proposal. From the timestamp, the proposal arrived around 1 minute after we published the pattern (11 May, 12.46 CST).

But on the node that did not receive the proposal, IOT-NODE-B1, the eventlog does not show anything on 11 May. The last entry says the node heartbeat is restored (10 May, 11.55 CST), so the node should be able to communicate with the exchange.

First, in the agent container of node IOT-NODE-B1, the agent does send an HTTP request to the exchange and receives a response saying that there are no new changes from the hub (11 May, 9.41 UTC).

Then, in the exchange hub, we checked node IOT-NODE-B1 and saw that its last heartbeat timestamp is also recent (11 May, 9.45 UTC).

We also checked from inside the exchange-api container and saw the HTTP request for node IOT-NODE-B1 (11 May, 7.23 UTC).

So we are confident that the node is able to communicate with the exchange, but we are not sure why the node did not receive any new proposals from the exchange.
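
The timestamps in this report mix UTC and CST, which makes the order of events hard to follow. The sketch below puts them on one UTC timeline. It rests on two assumptions that the report does not confirm: that "CST" means China Standard Time (UTC+8), and a placeholder year, since no year is given. The event labels are paraphrased from the description above.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: "CST" in the report is China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))
UTC = timezone.utc
# ASSUMPTION: the report gives no year; this value is only a placeholder.
YEAR = 2023


def at(month, day, hour, minute, tz):
    """Build a timestamp in the given zone and convert it to UTC."""
    return datetime(YEAR, month, day, hour, minute, tzinfo=tz).astimezone(UTC)


# Timestamps exactly as quoted in the report, with their original zones.
events = [
    ("pattern published (lastUpdated)", at(5, 11, 4, 43, UTC)),
    ("IOT-NODE-1A received proposal", at(5, 11, 12, 46, CST)),
    ("IOT-NODE-B1 heartbeat restored", at(5, 10, 11, 55, CST)),
    ("IOT-NODE-B1 agent polled exchange, no new changes", at(5, 11, 9, 41, UTC)),
    ("exchange shows IOT-NODE-B1 last heartbeat", at(5, 11, 9, 45, UTC)),
    ("exchange-api request for IOT-NODE-B1", at(5, 11, 7, 23, UTC)),
]


def timeline(evts):
    """Return the events sorted by their UTC time."""
    return sorted(evts, key=lambda e: e[1])


for label, when in timeline(events):
    print(when.isoformat(), "-", label)
```

Under these assumptions, every check made against IOT-NODE-B1 on 11 May falls after the pattern was published, which is consistent with the claim that the node could reach the exchange but still received no proposal.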
Our current fix is to unregister the node and register it again. It is then able to get all the new updates, but we do not want to do this for every node each time we publish a new update.
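
As a stopgap only, the unregister and register workaround could be wrapped in a small helper so it is applied the same way on each affected node. This is a minimal sketch, not a fix for the underlying bug. It assumes the hzn CLI is on the PATH, that the node's organization and exchange credentials are already configured in the environment, and that the -f (skip confirmation) and -p (pattern) flags behave as described in the hzn help output. The function names are hypothetical.

```python
import subprocess


def reregister_commands(pattern):
    """Return the two hzn invocations used by the workaround, in order."""
    return [
        ["hzn", "unregister", "-f"],         # -f: skip the confirmation prompt
        ["hzn", "register", "-p", pattern],  # -p: pattern to register with
    ]


def reregister(pattern, run=subprocess.run):
    """Unregister the local node, then register it again with `pattern`.

    `run` is injectable so the sequence can be dry-run or tested
    without touching a real node.
    """
    for cmd in reregister_commands(pattern):
        run(cmd, check=True)  # check=True stops at the first failing step


# Example (run on the affected node itself):
#   reregister("IOT-home-full")
```

Because unregistering stops the running services on the node, this is best limited to nodes confirmed to be stuck on the old version.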
Describe the steps to reproduce the behavior.
The nodes are online most of the time, but they lose connectivity to the exchange for 1 hour every day.
Our steps to reproduce are:
Expected behavior.
All nodes registered to the same pattern should receive the new updates after the internet connection is restored.
Screenshots.
No response
Operating Environment
Node details:
Exchange details:
Additional Information
No response