
ConditionHandling stops working after some time #2288

Closed
Haeavar opened this issue Jul 12, 2024 · 6 comments
Labels
bug Something isn't working

Haeavar (Contributor) commented Jul 12, 2024

Describe the bug
We configure ConditionHandling with the following options:

        "ConditionHandling": {
          "updateInterval": 5,
          "snapshotInterval": 60
        }

This works after a fresh start of OpcPublisher.
After a few hours, it stops sending ua-condition messages even though there are retained events.

To Reproduce
It is currently not clear what causes the stop, but it always happens after a few hours.

Additional context
OpcPublisher 2.8.1 seems to have worked without issues. We see the problem at least with version 2.9.8, but it may have been introduced in an earlier version.

Haeavar (Contributor, issue author) commented Jul 12, 2024

Is it possible to use DefaultMonitoredItemWatchdogCondition to monitor the ua-condition messages?
It would help if the publisher restarted whenever no ua-condition message has been sent for 1 hour. This would tide us over until the bug is fixed.
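The requested workaround boils down to a staleness check: remember when the last ua-condition message was seen and fire once nothing has arrived for an hour. A minimal sketch of that logic, assuming a hypothetical `ConditionStalenessWatchdog` class fed by whatever consumes the published messages; it is not an OPC Publisher API, and how the restart itself is triggered is left out.

```python
import time
from typing import Callable


class ConditionStalenessWatchdog:
    """Hypothetical watchdog: stale when no ua-condition message arrived within timeout_s."""

    def __init__(self, timeout_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock
        # Start the window at construction so a publisher that never sends
        # a condition message is flagged as well.
        self._last_seen = clock()

    def on_message(self, message_type: str) -> None:
        # Only ua-condition messages reset the window; other traffic does not.
        if message_type == "ua-condition":
            self._last_seen = self._clock()

    def is_stale(self) -> bool:
        return self._clock() - self._last_seen >= self.timeout_s


if __name__ == "__main__":
    # Simulated clock (seconds) so the example runs instantly.
    now = [0.0]
    wd = ConditionStalenessWatchdog(timeout_s=3600, clock=lambda: now[0])
    now[0] = 1800
    wd.on_message("ua-condition")  # window restarts at t=1800
    now[0] = 5000
    print(wd.is_stale())           # 3200 s since last condition -> False
    now[0] = 5400
    print(wd.is_stale())           # 3600 s since last condition -> True
```

Injecting the clock keeps the check testable; a real deployment would poll `is_stale()` periodically and restart the publisher (or the affected writer) when it returns `True`.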

marcschier added this to the 2.9.10 milestone Jul 12, 2024
marcschier added the bug Something isn't working label Jul 12, 2024
marcschier (Collaborator) commented Jul 13, 2024

Question: is the stop related to a reconnect? If you look through the logs at the point where it stops, a reconnect should be apparent.

Haeavar (Contributor, issue author) commented Jul 15, 2024

@marcschier you are right. I see a reconnect in the log:

2024-07-13T03:10:12.310Z [24-07-13 03:10:12.3101] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0]
2024-07-13T03:10:12.310Z Subscription a3cb91608fcab78fb8f1d76037c568ba155dea1d_0:727814108 STOPPED!
2024-07-13T03:10:14.309Z [24-07-13 03:10:14.3094] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaStack[0]
2024-07-13T03:10:14.309Z Publish #28290, Reconnecting=False, Error: BadSecureChannelClosed

Since this bug is fairly critical for us, do you have an idea for a workaround until it is fixed?

Haeavar (Contributor, issue author) commented Jul 15, 2024

After some investigation, it turns out that this happens only for subscriptions to an aggregating OPC UA server; the events originate from a server that it aggregates. If the aggregation server is disconnected (shutdown, restart, or connection interruption) and OpcPublisher reconnects, all subscriptions are re-established but the conditions are missing.
Only after removing the writers and recreating them are the conditions sent again.

marcschier (Collaborator) commented

It looks like the issue is that the condition timer stays disabled after a reconnect. The condition item in the subscription is left in a state where it cannot be re-enabled on the server, so the condition timer remains disabled, even though the subscription is periodically resynchronized with the server (which should happen whenever at least one item is not fully applied to the subscription). Right now I have no ready fix, nor a good idea of how to reproduce it. The only thing that comes to mind is to detect this state in the watchdog condition and trigger the watchdog, a feature I would need to implement.
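The failure mode described above can be illustrated with a deliberately simplified toy model. All names here (`ConditionItem`, `on_reconnect`, `resync`, `server_accepts`, `watchdog_should_trigger`) are hypothetical and are not OPC Publisher internals: the timer is switched off on reconnect and only switched back on when the server accepts the item, so a periodic resync that keeps being rejected never re-enables it. The last function sketches the kind of watchdog rule proposed above.

```python
from dataclasses import dataclass


@dataclass
class ConditionItem:
    """Toy model (hypothetical) of a condition monitored item and its timer."""
    applied: bool = True        # item successfully created on the server
    timer_enabled: bool = True  # condition update/snapshot timer running

    def on_reconnect(self) -> None:
        # The session drops: the item must be re-created and the timer is paused.
        self.applied = False
        self.timer_enabled = False

    def resync(self, server_accepts: bool) -> None:
        # Periodic resynchronization retries the item while it is not applied.
        if not self.applied and server_accepts:
            self.applied = True
            self.timer_enabled = True
        # If the server keeps rejecting the item, nothing re-enables the timer.


def watchdog_should_trigger(item: ConditionItem) -> bool:
    # Hypothetical rule: an unapplied item with a stopped timer will never
    # produce ua-condition messages again without outside intervention.
    return not item.applied and not item.timer_enabled


if __name__ == "__main__":
    item = ConditionItem()
    item.on_reconnect()
    for _ in range(3):
        item.resync(server_accepts=False)  # server keeps reporting the item as bad
    print(item.timer_enabled, watchdog_should_trigger(item))  # False True
```

The point of the model is that resynchronization alone cannot recover the timer when the server persistently rejects the item; some external signal, such as a watchdog, has to act on that state.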

When you go through the logs and find where the items are "added", can you pinpoint any errors that apply to this condition item?

I would be happy to go through the logs, too, if you care to share them. You can share them through a gist you give me access to, through MSFT support, or by adding a subset to this issue.

Haeavar (Contributor, issue author) commented Jul 16, 2024

You were right. There is indeed a bad monitored node:

2024-07-16T07:38:01.407Z [24-07-16 07:38:01.4071] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0]
2024-07-16T07:38:01.407Z 1e7f323b0fe01c49335c92d90fe6a8af1e4eb7ed_0:727814210 - Now monitoring 1 nodes:
2024-07-16T07:38:01.407Z # Good/Bad: 0/1
2024-07-16T07:38:01.407Z # Reporting: 1
2024-07-16T07:38:01.407Z # Sampling: 0
2024-07-16T07:38:01.407Z # Disabled: 0
2024-07-16T07:38:01.407Z # Not applied: 0
2024-07-16T07:38:01.407Z # Removed: 2
2024-07-16T07:38:01.407Z [24-07-16 07:38:01.4071] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0]
2024-07-16T07:38:01.407Z Issuing ConditionRefresh on subscription 1e7f323b0fe01c49335c92d90fe6a8af1e4eb7ed_0:727814210

This is the part of the log where the connection is lost and reconnected.
Logs_sfh-229_app-62_opcpublisher (4).txt
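Spotting a bad item in a long log is easier with a small filter over the "Now monitoring" summaries. A minimal sketch, assuming the summary format in the log excerpt above (a `<subscription> - Now monitoring N nodes:` line followed by a `# Good/Bad: g/b` line); the function name `subscriptions_with_bad_items` is hypothetical, and the log format may differ between OpcPublisher versions.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Captures the subscription id in "<id> - Now monitoring N nodes:".
_HEADER = re.compile(r"(\S+) - Now monitoring \d+ nodes:")
# Captures the counts in "# Good/Bad: g/b".
_GOOD_BAD = re.compile(r"# Good/Bad: (\d+)/(\d+)")


def subscriptions_with_bad_items(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (subscription, bad_count) for every summary that reports bad items."""
    results: List[Tuple[str, int]] = []
    current: Optional[str] = None
    for line in lines:
        header = _HEADER.search(line)
        if header:
            current = header.group(1)
            continue
        counts = _GOOD_BAD.search(line)
        if counts and current is not None:
            bad = int(counts.group(2))
            if bad > 0:
                results.append((current, bad))
            current = None  # one Good/Bad line per summary header
    return results


if __name__ == "__main__":
    log = [
        "2024-07-16T07:38:01.407Z 1e7f323b0fe01c49335c92d90fe6a8af1e4eb7ed_0:727814210 - Now monitoring 1 nodes:",
        "2024-07-16T07:38:01.407Z # Good/Bad: 0/1",
        "2024-07-16T07:38:01.407Z # Reporting: 1",
    ]
    print(subscriptions_with_bad_items(log))
```

Running it over the full log narrows the search to the subscriptions whose condition item failed to apply after the reconnect.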

marcschier added a commit that referenced this issue Jul 27, 2024
…2297)

* Update nuget dependencies
* Publish Start/stop tests
* Fix seconds heartbeat bug #2292 
* Fix condition handling stop working and heartbeat timer stopping on reconnect #2288 
* Add ipi option #2299 
* Session per writer #2298 
* Setting ska and slt options to default to 0 #2294 
* Add auto calculation of qs. #2300 
* Document browse path formats and fix issue when node id is missing #2296 
* API and sample to dump session, channel and subscription diagnostics from server #2303
* Better validation for missing node id on inputs to the configuration API.
marcschier reopened this Oct 18, 2024
marcschier modified the milestones: 2.9.10, 2.9.12 Oct 18, 2024