-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(nucleus): shadow manager stops syncing on upgrade from 2.9.6 to 2.11.1 #1510
Comments
this is similar to the intermittent issue I saw on greengrass losing internet connectivity: |
with debug enabled in greengrass: 2023-08-03T09:09:27.761Z [DEBUG] (pool-2-thread-47) com.aws.greengrass.mqttclient.MqttClient: Getting new MQTT connection. {clientId=home-XXXXXXXXXXXX#2} |
its like shadowmanager is using connection #2 and something pulls that out from under it, but shadowmanager never bothers to reconnect? |
Hello, The logs you provided show logs from Nucleus version 2.11, which means that Greengrass has restarted at least once (upgrading from 2.9.6 to 2.11.1 requires a Nucleus restart). The logs do not show enough information to understand what the issue is. Please provide full logs, debug level would be great as well, showing the full time period from deployment start to shadow stopping, and then some time afterward to show that shadow does not start up again. The first log you showed does say "BaseSyncStrategy: sync. Finished processing sync requests.". Are you making further changes to the shadow either locally or in the cloud? How do you know that the shadow is not syncing? |
ok have attached the log from before. the deploy of 2.11 starts at 09:04:12 and finishes at 09:09. 1510.csv.gz as you should see, the logs show we're updating regularly "2023-08-03T08:55:00.155Z [DEBUG] (pool-2-thread-27884) com.aws.greengrass.shadowmanager.sync.model.CloudUpdateSyncRequest: Updating cloud shadow document. {thing name=home-XXXXX, shadow name=AirQuality, cloud-version=1310876, local-version=4059826}" the last CloudUpdateSyncRequest happens at 09:04: "2023-08-03T09:04:00.049Z [DEBUG] (pool-2-thread-27884) com.aws.greengrass.shadowmanager.sync.model.CloudUpdateSyncRequest: Updating cloud shadow document. {thing name=home-XXXXX, shadow name=AirQuality, cloud-version=1310893, local-version=4059844}" the update job completes at: "2023-08-03T09:09:27.728Z [INFO] (pool-2-thread-47) com.aws.greengrass.deployment.IotJobsHelper: Updating status of persisted deployment. {Status=SUCCEEDED, StatusDetails={detailed-deployment-status=SUCCESSFUL}, ThingName=home-XXXXX, JobId=6b087888-6826-448c-8191-5ea0436a3796}" we see the sync stop: "2023-08-03T09:09:28.019Z [WARN] (Thread-8) com.aws.greengrass.mqttclient.AwsIotMqtt5Client: Connection interrupted. {clientId=home-XXXXX#2, error=Mqtt5 client connection interrupted by user request.}" there after greengrass/shadowmanager is only updating the local shadow and nothing is being synced. "2023-08-03T09:10:34.456Z [INFO] (Thread-7) com.aws.greengrass.shadowmanager.ipc.UpdateThingShadowRequestHandler: Successfully updated shadow. {service-name=co.mgh.ShadowPortal, thing name=home-XXXXX, shadow name=AirQuality, local-version=4059849}" |
From the logs you shared, this is how the IoT Core connectivity is working for the client with id: clientId=home-XXXXX#2
The client |
But the device was never offline. Manually restarting the greengrass service resulted in shadows syncing with cloud again. I've tried this with 3 devices, all devices had internet connectivity the whole time. |
Yes, we're still investigating the root cause for why this is happening. That note was just for the context. Thank you for your patience. |
Thank you for your report. It appears that this is a bug in the Nucleus MQTT 5 client (which was introduced in 2.10) which is happening when you have more than 50 MQTT subscriptions (which spawns a second client) and then drop below 50 subscriptions (which closes the now unnecessary second client). We have a fix for this in #1511. The workarounds would be to remain on Nucleus version before 2.10 or configure syncing additional shadows such that there are consistently more than 50 subscriptions. You could also configure Nucleus 2.11 to use MQTT 3 client which would also avoid the problem. We're working on fully validating the fix and then releasing it as soon as we're able. |
thanks guys! |
We are releasing the fix over the next 24 hours. Stay tuned! |
Resolved with Nucleus 2.11.2 please let us know if that's working for you. |
Describe the bug
When upgrading from nucleus 2.9.6 to 2.11.1 every device i've tried so far stopped syncing shadows with the cloud. A manual restart of greengrass is required to bring back the sync.
To Reproduce
Upgrade device from 2.9.6 to 2.11.1 (also requires updating shadow manager from 2.3.1 to 2.3.3)
Expected behavior
Syncing resumes.
Actual behavior
No syncing, or errors.
Environment
Additional context
You can see in the logs that when the device is finished updating the connection gets interrupted, the syncing stops and never resumes.
The sync operations only involve the local shadow version and never mention 'cloud-version' subsequently.
Relevant log lines:
2023-08-03T10:11:38.797Z [INFO] (Thread-4) com.aws.greengrass.deployment.IotJobsHelper: Job status update was accepted. {Status=SUCCEEDED, ThingName=home-XXXXXXXXXXXX, JobId=60c75523-5e00-4c1c-ac1f-1c004bc17ba3}
2023-08-03T10:11:38.873Z [WARN] (Thread-8) com.aws.greengrass.mqttclient.AwsIotMqtt5Client: Connection interrupted. {clientId=home-XXXXXXXXXXXX#2, error=Mqtt5 client connection interrupted by user request.}
2023-08-03T10:11:38.874Z [INFO] (pool-2-thread-29) com.aws.greengrass.status.FleetStatusService: fss-status-update-published. Status update published to FSS. {trigger=THING_GROUP_DEPLOYMENT, serviceName=FleetStatusService, currentState=RUNNING}
2023-08-03T10:11:38.875Z [INFO] (Thread-8) com.aws.greengrass.shadowmanager.sync.strategy.BaseSyncStrategy: sync. Stop syncing. {}
2023-08-03T10:11:38.876Z [INFO] (Thread-8) com.aws.greengrass.shadowmanager.sync.strategy.BaseSyncStrategy: sync. Cancel 1 sync thread(s). {}
2023-08-03T10:11:38.876Z [INFO] (Thread-8) com.aws.greengrass.shadowmanager.sync.strategy.BaseSyncStrategy: sync. Waiting for in-flight sync requests to finish. {}
2023-08-03T10:11:38.876Z [INFO] (pool-2-thread-29) com.aws.greengrass.deployment.DeploymentDirectoryManager: Persist link to last deployment. {link=/greengrass/v2/deployments/previous-success}
2023-08-03T10:11:38.877Z [INFO] (pool-2-thread-29) com.aws.greengrass.deployment.DeploymentDirectoryManager: Clean up link to earlier deployment. {link=/greengrass/v2/deployments/previous-success}
2023-08-03T10:11:38.998Z [INFO] (Thread-7) com.aws.greengrass.shadowmanager.ipc.UpdateThingShadowRequestHandler: Successfully updated shadow. {service-name=co.mgh.ShadowPortal, thing name=home-XXXXXXXXXXXX, shadow name=Lighting, local-version=62}
2023-08-03T10:11:39.017Z [INFO] (Thread-7) com.aws.greengrass.shadowmanager.ipc.UpdateThingShadowRequestHandler: Successfully updated shadow. {service-name=co.mgh.ShadowPortal, thing name=home-XXXXXXXXXXXX, shadow name=Shading, local-version=94}
2023-08-03T10:11:39.015Z [INFO] (pool-2-thread-43) com.aws.greengrass.shadowmanager.ShadowManagerDAOImpl: Updating sync info. {thing name=home-XXXXXXXXXXXX, shadow name=Lighting, cloud-version=1577, local-version=61}
2023-08-03T10:11:39.032Z [INFO] (Thread-8) com.aws.greengrass.shadowmanager.sync.strategy.BaseSyncStrategy: sync. Finished waiting for sync requests to finish. {}
2023-08-03T10:11:39.039Z [WARN] (pool-2-thread-43) com.aws.greengrass.shadowmanager.sync.strategy.BaseSyncStrategy: sync. Interrupted while waiting for sync requests. {}
2023-08-03T10:11:39.041Z [INFO] (pool-2-thread-43) com.aws.greengrass.shadowmanager.sync.strategy.BaseSyncStrategy: sync. Finished processing sync requests. {}
2023-08-03T10:11:39.070Z [INFO] (Thread-7) com.aws.greengrass.shadowmanager.ipc.UpdateThingShadowRequestHandler: Successfully updated shadow. {service-name=co.mgh.ShadowPortal, thing name=home-XXXXXXXXXXXX, shadow name=Energy, local-version=69551}
2023-08-03T10:11:39.083Z [INFO] (Thread-7) com.aws.greengrass.shadowmanager.ipc.UpdateThingShadowRequestHandler: Successfully updated shadow. {service-name=co.mgh.ShadowPortal, thing name=home-XXXXXXXXXXXX, shadow name=Shading, local-version=95}
2023-08-03T10:11:39.097Z [INFO] (Thread-7) com.aws.greengrass.shadowmanager.ipc.UpdateThingShadowRequestHandler: Successfully updated shadow. {service-name=co.mgh.ShadowPortal, thing name=home-XXXXXXXXXXXX, shadow name=Climate, local-version=293258}
The text was updated successfully, but these errors were encountered: