Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Device FW update fails due to incorrect determination of OTA server, cluster and endpoint #312

Closed
hanneshofmann opened this issue Nov 30, 2018 · 8 comments · Fixed by #319

Comments

@hanneshofmann
Copy link

hanneshofmann commented Nov 30, 2018

I just tried to update the device firmware of a thing and the firmware update failed repeatedly using the following devices:

  • Bitron Video AV2010/22A [V14 -> V15]
  • CentraLite 3328-D [V18005310 -> V1F0B5310]
  • BitronHome 902010/25 [V30 -> V40]

When checking the logs, I see that the firmware image transmission has not been even started and the binding claims that the OTA server could not be created for the respective device:

[symbolicName=org.openhab.binding.zigbee, className=org.openhab.binding.zigbee.handler.ZigBeeThingHandler] - 000D6F000BBCA4C7: Can't create OTA server

I believe there is a simple bug in the ZigBeeThingHandler#updateFirmware how the OTA server, endpoint and cluster are determined since it relies on a specific ordering of the endpoints (e.g. if the first endpoint provides the OTA server, the OTA cluster and the OTA endpoint will never be initialized).

@cdjackson What do you think? Did I miss something? If you want, I can contribute a fix for that.

@cdjackson
Copy link
Contributor

if the first endpoint provides the OTA server, the OTA cluster and the OTA endpoint will never be initialized

I don’t understand why this wouldn’t work if the OTA server was in the first endpoint? The binding checks all endpoints, and I don't think it matters if it is the first?

        for (ZigBeeEndpoint endpoint : node.getEndpoints()) {
            otaServer = (ZclOtaUpgradeServer) endpoint.getApplication(ZclOtaUpgradeCluster.CLUSTER_ID);
            if (otaServer != null) {
                break;
            }

            otaCluster = (ZclOtaUpgradeCluster) endpoint.getOutputCluster(ZclOtaUpgradeCluster.CLUSTER_ID);
            if (otaCluster != null) {
                otaEndpoint = endpoint;
                break;
            }
        }

        if (otaServer == null && otaCluster == null) {
            logger.debug("{}: Can't find OTA cluster", nodeIeeeAddress);
            return;
        }

This code loops through all endpoints. It looks to see if the OTA server is already registered, and if it is, it breaks (keeping a reference to the server). If it doesn't already have a server registered, it checks to see if the cluster is supported - if it is, it breaks. This should find the first endpoint that contains the OTA server cluster.

In the end, either the otaServer or the otaCluster should be not null if OTA is supported - ie if they are both null, then OTA is not supported.

If you have identified a bug here, then please can you be specific as I'm not immediately seeing it (maybe it's the jetlag!)? Alternatively, can you supply the log, and more importantly, the XML file for the network.

@cdjackson
Copy link
Contributor

I misread the error you reported, so the next bit of code is also relevant, and this is the error you report -:

        // Register the OTA server if it's not already registered
        if (otaServer == null && otaEndpoint != null) {
            otaServer = new ZclOtaUpgradeServer();
            otaEndpoint.addApplication(otaServer);
        } else {
            logger.debug("{}: Can't create OTA server", nodeIeeeAddress);
            return;
        }

However I don't think that changes anything - the above for loop should exit knowing either otaServer if the application is already registered, or otaEndpoint if it's not registered. If it's not registered, this is created, and added to the endpoint.

Clearly I'm missing something obvious that you have spotted, but I can't see unfortunately at the moment.

Further down, there may be an issue with the setting of the cluster, but based on your reported error, only the above code is applicable as it exits at the above log entry.

@hanneshofmann
Copy link
Author

Thanks for your quick feedback, @cdjackson.

However I don't think that changes anything - the above for loop should exit knowing either otaServer if the application is already registered, or otaEndpoint if it's not registered. If it's not registered, this is created, and added to the endpoint.

I believe that is exactly the problem. After the loop either the otaServer, or the otaEndpoint is known. If the otaServer is known before the otaEndpoint (means that the application is already registered), the otaEndpoint will remain null after the loop and thus, the following check prevents the firmware update:

        // Register the OTA server if it's not already registered
        if (otaServer == null && otaEndpoint != null) { 
            otaServer = new ZclOtaUpgradeServer();
            otaEndpoint.addApplication(otaServer);
        } else {
            logger.debug("{}: Can't create OTA server", nodeIeeeAddress);
            return;
        }

If there is no specific reason, I think we should not break the loop as soon as the otaServer or the otaEndpoint has been found?

@cdjackson
Copy link
Contributor

The loop needs to break - otherwise I think we will add a new ZclUpgradeServer into another endpoint. Only one, or the other (ie the application or the endpoint) needs to be found at this point.

You are right that this check doesn't allow otaServer != null through, so I think we just need to change to -:

        if (otaServer == null && otaEndpoint != null) { 
            otaServer = new ZclOtaUpgradeServer();
            otaEndpoint.addApplication(otaServer);
        } else if (otaServer == null) {
            logger.debug("{}: Can't create OTA server", nodeIeeeAddress);
            return;
        }

This should then (nearly!) work I think.

I note there is another bug further down due to cluster not being set which I will fix later once the library is updated following zsmartsystems/com.zsmartsystems.zigbee#443. I think this is a better way as once the application has been created, we shouldn't need to mess with the cluster directly.

@cdjackson
Copy link
Contributor

In fact, I think this can be simplified to -:

        if (otaServer == null && otaEndpoint == null) {
            logger.debug("{}: Can't find OTA cluster", nodeIeeeAddress);
            return;
        }

        if (otaServer == null) {
            // Register the OTA server as it's not already registered
            otaServer = new ZclOtaUpgradeServer();
            otaEndpoint.addApplication(otaServer);
        }

If both the server and endpoint are null, then we don't know OTA. If server is null, then we just create the server and add it to the endpoint which can't at that point be null due to the first check above.

@hanneshofmann
Copy link
Author

I can confirm that your proposal should fix it. With this small change in place, I was able to start the firmware image transfer for some devices. Shall I open a PR?

The firmware image transfer seem to start now but the last ImageBlockResponse transaction always has a timeout (tested with different devices). Due to the timeout, the binding aborts the firmware update. However, the UploadEndRequest is received by the binding after few minutes. Do you think if this behaviour might relate to zsmartsystems/com.zsmartsystems.zigbee#443? If not, I would like to open an additional issue for that.

@cdjackson
Copy link
Contributor

I've just created a PR for this. In your test, did you fix the uninitialised otaCluster?

The other PR will not impact the transfer so if there are timeouts, then please open a new issue and provide the logs there.

@hanneshofmann
Copy link
Author

In my local tests, I have only adapted the check to allow that the otaServer is not null:

        if (otaServer == null && otaEndpoint != null) { 
            otaServer = new ZclOtaUpgradeServer();
            otaEndpoint.addApplication(otaServer);
        } else if (otaServer == null) {
            logger.debug("{}: Can't create OTA server", nodeIeeeAddress);
            return;
        }

Further, I will retest the firmware update with the purposed fix in #319. Depending on the test results, I will create a new issue if the timeouts still occur or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants