Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Item was disposed or moved to another subscription #2317

Closed
steffenbeermann opened this issue Aug 8, 2024 · 11 comments
Closed

Item was disposed or moved to another subscription #2317

steffenbeermann opened this issue Aug 8, 2024 · 11 comments
Labels
bug Something isn't working
Milestone

Comments

@steffenbeermann
Copy link

Describe the bug
We have a OPC Publisher with about 24 Endpoints with around 2000-3000 Nodes each. We just updates to version 3.9.10 and facing the issue that after a couple of hours the OPC Publisher removed almost all nodes of the endpoints with the logs stating:
24-08-07 15:35:27.9796] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=...' with server id 1 (created): Item was disposed or moved to another subscription

We can see this error message a lot in the logs, since the issue occurred it was written for the same nodes every hour.

This affected 22 of the endpoints where for most of them, all nodes where disposed and for some only about 60% were disposed.
We athough noticed this happend in two waves. Yesterday at 17:30 6 Endpoints went down and after about 2 hours the other 16 were affected.

After restarting the publisher, all connections were established again and working fine for now. It seems that this is not an issue with the OPC Servers or Nodes because with a restart everything works fine again

To Reproduce
Publisher Config:
{ "Hostname": "publisher_axh_active", "Cmd": [ "--cl=5", "--cf", "--aa", "--pf=/mount/publishednodes.json", "--PkiRootPath=/mount/pki", "--bs=100", "--di=3600" ], "HostConfig": { "Binds": [ "/home/edge/OPCPublisherMount_Active:/mount" ] } }

Version used: 3.9.10

Expected behavior
The nodes are not disposed.

Additional context
Part of the logs:
em.DataChange[0] Data Item 'ns=2;s=RO_KW_AGGREGAT_NL.1_3MFB40_EA700.in.betriebsart.value' with server id 205 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_MELDEKONTAKT.1_3MFB40_EA801_YA03.in.kontakt.value' with server id 206 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_ALARMKONTAKT.1_3MFB40_EA801_XM54.in.kontakt.value' with server id 207 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_MELDEKONTAKT.1_3MFB40_EA802_YA03.in.kontakt.value' with server id 208 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_ALARMKONTAKT.1_3MFB40_EA802_XM54.in.kontakt.value' with server id 209 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_MELDEKONTAKT.1_3MFB40_EA811_YA03.in.kontakt.value' with server id 210 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_ALARMKONTAKT.1_3MFB40_EA811_XM54.in.kontakt.value' with server id 211 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_MELDEKONTAKT.1_3MFB40_EA812_YA03.in.kontakt.value' with server id 212 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_ALARMKONTAKT.1_3MFB40_EA812_XM54.in.kontakt.value' with server id 213 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_ALARMKONTAKT.1_3MFB40_EA851_XM54.in.kontakt.value' with server id 214 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_ALARMKONTAKT.1_3MFB40_EA852_XM54.in.kontakt.value' with server id 215 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_ALARMKONTAKT.1_3MFB40_EA862_XM54.in.kontakt.value' with server id 216 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_ALARMKONTAKT.1_3MFB40_EA861_XM54.in.kontakt.value' with server id 217 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_ALARMKONTAKT.1_3MFB40_EA871_XM54.in.kontakt.value' with server id 218 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_ALARMKONTAKT.1_3MFB40_EA872_XM54.in.kontakt.value' with server id 219 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_KW_AGGREGAT_NL.1_3MFB40_EA710.in.zustand0.value' with server id 220 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_KW_AGGREGAT_NL.1_3MFB40_EA710.in.zustand1.value' with server id 221 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_KW_AGGREGAT_NL.1_3MFB40_EA710.in.aktive_bedienebene.value' with server id 222 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_KW_AGGREGAT_NL.1_3MFB40_EA710.in.modus.value' with server id 223 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_SOLLWERT.1_3MFB41_AP100_ZE11.out.sollwert.value' with server id 224 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_SOLLWERT.1_3MFB41_AP200_ZE11.out.sollwert.value' with server id 225 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_KW_AGGREGAT_NL.1_3MFB40_EA711.in.zustand0.value' with server id 226 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_KW_AGGREGAT_NL.1_3MFB40_EA711.in.zustand1.value' with server id 227 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_KW_MESSWERT.1_3MFB41_CL001_XQ01.in.ausfall.value' with server id 228 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_KW_MESSWERT.1_3MFB41_CL001_XQ01.in.minalarm.value' with server id 229 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_KW_MESSWERT.1_3MFB41_CL001_XQ01.out.gw_minalarm.value' with server id 230 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_KW_MESSWERT.1_3MFB41_CL001_XQ01.in.tief.value' with server id 231 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_KW_MESSWERT.1_3MFB41_CL001_XQ01.out.gw_tief.value' with server id 232 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_KW_MESSWERT.1_3MFB41_CL001_XQ01.in.wert.value' with server id 233 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s={827D3700-C4FF-41A9-884F-B49DA06097ED}' with server id 1 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_KW_MESSWERT.1_3MFB41_CT001_XQ01.in.ausfall.value' with server id 234 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_KW_MESSWERT.1_3MFB41_CT001_XQ01.in.meldung1.value' with server id 235 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s={82218900-CF04-457D-89AA-4658952B6D64}' with server id 2 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s={101CE900-4D46-4D4B-A828-84D575C85DEF}' with server id 3 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s={D3E10101-CD29-445D-89EB-3D5089A7F197}' with server id 4 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9835] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataChange[0] Data Item 'ns=2;s=RO_KW_MESSWERT.1_3MFB41_CT001_XQ01.in.meldung2.value' with server id 236 (created): Item was disposed or moved to another subscription [24-08-07 15:35:27.9847] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] 1fdf3dbee7ab3b7700eb7c690aab5a41af407714_0:3762709867 - Now monitoring 155 nodes: # Good/Bad: 155/0 # Reporting: 155 # Sampling: 0 # Disabled: 0 # Not applied: 0 # Removed: 0 [24-08-07 15:35:27.9849] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] 989ca92c8a770ee4b215750a5e242542683c1ca9_0:3762709869 - Now monitoring 52 nodes: # Good/Bad: 52/0 # Reporting: 52 # Sampling: 0 # Disabled: 0 # Not applied: 0 # Removed: 0

@marcschier
Copy link
Collaborator

marcschier commented Aug 8, 2024

@steffenbeermann - please could you try mcr.microsoft.com/iotedge/opc-publisher:2.9.11-preview1, which should address this issue.

@marcschier marcschier added the bug Something isn't working label Aug 8, 2024
@marcschier marcschier added this to the 2.9.11 milestone Aug 8, 2024
@marcschier
Copy link
Collaborator

Background - this happens during subscription reset where the old monitored items are not properly removed from the subscription and then remain in a disposed state. Data should flow, but heartbeats and other timers are off and not firing anymore.

@steffenbeermann
Copy link
Author

Thank you for the quick response! To understand the issue: Why are the subscriptions reset in the first place? I noticed that every hour the subscriptions were recreated, eg:
[24-08-07 15:05:45.2384] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] 4e7e56282acc83bbaae502cca831be66e4365a23_3:2124724734 - Now monitoring 1000 nodes: # Good/Bad: 969/31 # Reporting: 1000 # Sampling: 0 # Disabled: 0 # Not applied: 0 # Removed: 0

In our case, we sadly hadn't any dataflow anymore for the affected subscriptions?

@marcschier
Copy link
Collaborator

The message you are seeing there happens as a result of a) when a session recovery/reconnect sequence was run and produced either a new session or the old one. Then the subscription state in the session is synchronized which if it is new creates the monitored items - or - b) the subscription itself ran into an issue e.g. the subscription lifetime expired on the server and status change notification returned, or keep alives were missing on our side - then the subscription in a running session is reset (closed and recreated). Then the items should have been recreated, but the bug we had was that the existing (now disposed items) are not even touched anymore. But they are reported now as "bad". We fixed this in 2.9.11.

This should all be prominent in the log. If you can try 2.9.11-preview1 that would help me a lot.

You also see this message you pasted if applying one or more monitored items failed before or on the general management timer. Typically nothing is done to the subscription or items at this point though if all is working fine.

@steffenbeermann
Copy link
Author

Thanks for the clarification!
I upgraded to 2.9.11-preview1 and so far everything looks good. I will monitor the logs and will give an update tomorrow!

@steffenbeermann
Copy link
Author

steffenbeermann commented Aug 9, 2024

It seems like 3.9.11-preview11 helped with the issue of Item was disposed or moved to another subscription. But we had one endpoint with issues where I am not to sure what happend and if it is related to the new publisher version. Here are the findings:

  • yesterday at around 17:36 we did not receive any new data for the endpoint.
  • The Publisher metric iiot_edge_publisher_is_connection_ok for the endpoint was still sending 1 (so connection is ok)
  • The Publisher metric iiot_edge_publisher_data_changes_per_second went down to zero
  • After restarting the publisher, the endpoint worked again.
  • We have two separate publisher modules listening on the endpoint (both Version 3.9.11-preview1) and it failed for both at around the same time.
  • It seems like we had a network issue from the edge device to the opc ua servers because we can see that for a lot of the connections the error:
    [24-08-08 17:36:59.8927] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaClient[0] opc.tcp://***4980_6A0BCD90 [state:Connecting|refs:5]: Publish error: BadSecureChannelClosed 'BadSecureChannelClosed' (Actively handled: False)...
  • it seems that the connections could recover them self after a one minute (we are checking with our network team, if there was an issue on their side). But the connection to the failed endpoint did not seem to be affected by this because there are no "BadSecureChannelClosed" logs for the failed endpoint
  • But the logs does not mention the failed endpoint, but I found these logs after the network issue, where subscription id matches with the one of the endpoint reported by the metric iiot_edge_publisher_monitored_items:
    [24-08-08 17:36:49.9598] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] #2/30: Subscription e8586836593ba92624a1f0b6516d1bcf11d4b67c_2:15 is missing keep alive. [24-08-08 17:36:49.9598] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] #2/30: Subscription e8586836593ba92624a1f0b6516d1bcf11d4b67c_1:14 is missing keep alive. [24-08-08 17:36:49.9598] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] #2/30: Subscription e8586836593ba92624a1f0b6516d1bcf11d4b67c_0:13 is missing keep alive.
  • and bit after that:
    [24-08-08 17:36:59.9693] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] Subscription e8586836593ba92624a1f0b6516d1bcf11d4b67c_1:14 STOPPED! [24-08-08 17:36:59.9694] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] Subscription e8586836593ba92624a1f0b6516d1bcf11d4b67c_2:15 STOPPED! [24-08-08 17:36:59.9694] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] Subscription e8586836593ba92624a1f0b6516d1bcf11d4b67c_0:13 STOPPED!
  • after this logs, I could not find any mentions of the subscriptions anymore.

@marcschier
Copy link
Collaborator

marcschier commented Aug 9, 2024

Thank you for the details @steffenbeermann, when the subscription is STOPPED, do you see session keep alive timeout logs as well (this is the way in OPC UA connection drops are discovered). If the channel is closed, the session keep alive check should fail and trigger a reconnect. The keep alive error and reconnect should be visible in the logs,

 ... [state:Ready|refs:<n>]: Got Keep Alive error: ...

If not it would be great if you could share (e.g., via github gist or support team the full logs).

Another important diagnostics info is the number of publish requests in progress, it is one of the metrics at the top of the diagnostic output to console (if --di is not disabled) like this:

 # Queued/Minimum request count       :              4 | 2
 # Good/Bad Publish request count     :              2 | 2

What server is this connected to?

@steffenbeermann
Copy link
Author

We have two Publisher Modules that connect to the same endpoints. In one of them I could not find the Got Keep Alive error, but this could happen because the save the logs with an custom module and with this there is a chance that when a lot of logs are written some logs get lost....

In the second module log I found some, here are the logs related to the failed endpoint and I maybe found some usefull logs:
[24-08-08 17:36:58.3624] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaClient[0] opc.tcp://0.0.0.0:4840_116A0CAD_KWT [state:Ready|refs:3]: Got Keep Alive error: BadNoCommunication 'Server not responding to keep alive requests.' (08/08/2024 17:36:58:BadNoCommunication 'Server not responding to keep alive requests.' [24-08-08 17:36:58.3645] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaClient[0] opc.tcp://0.0.0.0:4840_116A0CAD_KWT [state:Ready|refs:3]: Got Keep Alive error: BadNoCommunication 'Server not responding to keep alive requests.' (08/08/2024 17:36:58:BadNoCommunication 'Server not responding to keep alive requests.' ... 24-08-08 17:36:58.3651] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaClient[0] opc.tcp://0.0.0.0:4840_116A0CAD_KWT [state:Ready|refs:3]: Reconnecting session opc.tcp://0.0.0.0:4840_116A0CAD_KWT due to error BadNoCommunication 'Server not responding to keep alive requests.'... [24-08-08 17:36:58.3651] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaClient[0] opc.tcp://0.0.0.0:4840_116A0CAD_KWT [state:Connecting|refs:3]: Session opc.tcp://0.0.0.0:4840_116A0CAD_KWT with opc.tcp://0.0.0.0:4840 changed from Ready to Connecting ... [24-08-08 17:36:59.2681] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaClient[0] opc.tcp://0.0.0.0:4840_116A0CAD_KWT [state:Ready|refs:3]: Session opc.tcp://0.0.0.0:4840_116A0CAD_KWT with opc.tcp://0.0.0.0:4840 changed from Connecting to Ready [24-08-08 17:36:59.2682] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaClient[0] opc.tcp://0.0.0.0:4840_116A0CAD_KWT [state:Ready|refs:3]: Client RECOVERED! [24-08-08 17:36:59.2769] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaStack[0] Publish #93680, Reconnecting=False, Error: BadNoSubscription [24-08-08 17:36:59.2857] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaStack[0] Publish #93681, Reconnecting=False, Error: BadNoSubscription [24-08-08 17:36:59.2915] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] e8586836593ba92624a1f0b6516d1bcf11d4b67c_0:16 - Now monitoring 1000 nodes: # Good/Bad: 1000/0 # Reporting: 1000 # Sampling: 0 # Heartbeat/ing: 0/0 # Condition/ing: 0/0 # Disabled: 0 # Not applied: 0 # Removed: 0 [24-08-08 17:36:59.2952] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] e8586836593ba92624a1f0b6516d1bcf11d4b67c_1:17 - Now monitoring 1000 nodes: # Good/Bad: 1000/0 # Reporting: 1000 # Sampling: 0 # Heartbeat/ing: 0/0 # Condition/ing: 0/0 # Disabled: 0 # Not applied: 0 # Removed: 0 [24-08-08 17:36:59.2982] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] e8586836593ba92624a1f0b6516d1bcf11d4b67c_2:18 - Now monitoring 757 nodes: # Good/Bad: 757/0 # Reporting: 757 # Sampling: 0 # Heartbeat/ing: 0/0 # Condition/ing: 0/0 # Disabled: 0 # Not applied: 0 # Removed: 0 [24-08-08 17:36:59.2984] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaClient[0] opc.tcp://0.0.0.0:4840_116A0CAD_KWT [state:Ready|refs:3]: Applying changes to 3 subscription(s) took 00:00:00.0301485. [24-08-08 17:36:59.3074] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaStack[0] Publish #93682, Reconnecting=False, Error: BadNoSubscription [24-08-08 17:36:59.3157] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaStack[0] Publish #93683, Reconnecting=False, Error: BadNoSubscription [24-08-08 17:36:59.3630] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaStack[0] Publish #27560, Reconnecting=True, Error: BadSecureChannelClosed ... [24-08-08 17:37:59.2879] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] #5/30: Subscription e8586836593ba92624a1f0b6516d1bcf11d4b67c_0:16 is missing keep alive. [24-08-08 17:37:59.2920] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] #5/30: Subscription e8586836593ba92624a1f0b6516d1bcf11d4b67c_1:17 is missing keep alive. [24-08-08 17:37:59.2961] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] #5/30: Subscription e8586836593ba92624a1f0b6516d1bcf11d4b67c_2:18 is missing keep alive. ... [24-08-08 17:43:11.2889] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] Trigger management of subscription e8586836593ba92624a1f0b6516d1bcf11d4b67c_0:16... [24-08-08 17:43:11.2923] crit: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] #31/30: Keep alive count exceeded. Perform Reset for e8586836593ba92624a1f0b6516d1bcf11d4b67c_1:17... [24-08-08 17:43:11.2923] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] Trigger management of subscription e8586836593ba92624a1f0b6516d1bcf11d4b67c_1:17... [24-08-08 17:43:11.2964] crit: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] #31/30: Keep alive count exceeded. Perform Reset for e8586836593ba92624a1f0b6516d1bcf11d4b67c_2:18... [24-08-08 17:43:11.2964] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] Trigger management of subscription e8586836593ba92624a1f0b6516d1bcf11d4b67c_2:18... ... 24-08-08 17:43:12.7730] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] Enabled Subscription e8586836593ba92624a1f0b6516d1bcf11d4b67c_1:20 in session opc.tcp://0.0.0.0:4840_116A0CAD_KWT. [24-08-08 17:43:12.7828] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] ======== Closing subscription e8586836593ba92624a1f0b6516d1bcf11d4b67c_2:18 and re-creating ========= [24-08-08 17:43:12.8242] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] Subscription 'e8586836593ba92624a1f0b6516d1bcf11d4b67c_2:0' closed. [24-08-08 17:43:12.8242] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] Creating new disabled subscription e8586836593ba92624a1f0b6516d1bcf11d4b67c_2:0 in session opc.tcp://0.0.0.0:4840_116A0CAD_KWT. [24-08-08 17:43:12.8329] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] Successfully created subscription e8586836593ba92624a1f0b6516d1bcf11d4b67c_2:21'. Actual (revised) state/desired state: # PublishingEnabled False/False # PublishingInterval 1000/1000 # KeepAliveCount 10/0 # LifetimeCount 30/0 [24-08-08 17:43:13.3566] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] e8586836593ba92624a1f0b6516d1bcf11d4b67c_2:21 - Now monitoring 757 nodes: # Good/Bad: 757/0 # Reporting: 757 # Sampling: 0 # Heartbeat/ing: 0/0 # Condition/ing: 0/0 # Disabled: 0 # Not applied: 0 # Removed: 0 [24-08-08 17:43:13.3651] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0] Enabled Subscription e8586836593ba92624a1f0b6516d1bcf11d4b67c_2:21 in session opc.tcp://10.166.73.224:4840_116A0CAD_KWT.

Because there are some additional informations like node names and so on, I need to check if I can upload the full logs. I will give an update on monday.

I could not find the diagnostic info, even the start up is with '--dii==3600'? But we collect all metrics via the metric collector. For iiot_edge_publisher_good_nodes / iiot_edge_publisher_bad_nodes there were no changes in the timespan of the issue. But I am not sure if this is the same? I could not find the corresponding metrics to Queued/Minimum request count?

It connects to an OPC UA Server that is provided by a power plant provider. Sadly I don't know which kind of software they are using to provide the OPC UA Server.

@marcschier
Copy link
Collaborator

marcschier commented Aug 9, 2024

You can set --di=60 and there will be metrics printed to the console every minute. But it is ok what you are doing. I could not see anything concerning above. Don't share - it is ok. What is likely the key piece is

zure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaStack[0] Publish #93682, Reconnecting=False, Error: **BadNoSubscription** [24-08-08 17:36:59.3157] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaStack[0] Publish #93683, Reconnecting=False, Error: **BadNoSubscription** [24-08-08 17:36:59.3630] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaStack[0] Publish #27560, Reconnecting=True, Error: **BadSecureChannelClosed** ... [24-08-08 17:37:59.2879] info: 

You could test 2.9.11-preview2 which I just released which will re-enable the quirk that resets the session when the publish requests return BadNoSession or BadInvalidSessionId. Because of a reported interop issue, I went through a lot of reconnect testing using a network shaper yesterday and today and validated that with TCP RST or tampered packets the publisher always recovered, so it is possibly a server vendor issue, too, just like in the reported case.

@steffenbeermann
Copy link
Author

okay, I will test 2.9.11-preview2. But over the weekend it ran without problems with 2.9.11-preview1 as well.

@marcschier
Copy link
Collaborator

Thanks for testing and confirming, I will close, please let me know if you encounter issues by either commenting here or open a new issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants