Icestorm: Max Subscriptions at 2048 for single topic? #1261
The segfault could indeed be caused by the lack of synchronization for the tracing, thanks for finding this one! The best way to confirm this would be to get a core dump of the crash and see where it occurs.

It would also be good to figure out why you have such a high number of subscribers. Are you making sure to unsubscribe subscribers when they terminate gracefully or no longer need the subscription? Subscribers that don't unsubscribe (possibly because they crashed) should eventually be reaped automatically once they are no longer reachable. The subscriber RetryCount QoS controls when these are reaped; see https://doc.zeroc.com/ice/3.7/ice-services/icestorm/icestorm-quality-of-service#id-.IceStormQualityofServicev3.7-RetryCountQoSforIceStorm Are you perhaps setting retryCount to -1?

You could enable subscriber tracing to log when subscribers are added/removed and when retries occur (with the `service.Trace.Subscriber` property, see https://doc.zeroc.com/ice/3.7/property-reference/icestorm-properties#id-.IceStormPropertiesv3.7-service.Trace.Subscriber).

Another option would be to enable metrics and check the number of subscribers with the IceGrid GUI using the metrics functionality (see https://doc.zeroc.com/ice/3.7/ice-services/icegrid/icegrid-gui-tool).
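To make the retryCount semantics above concrete, here is a rough, self-contained model of the reaping behavior: a subscriber is dropped after its delivery failures exceed its retryCount, while retryCount = -1 means it is retried forever and never reaped for unreachability. This is a simplified sketch for illustration, not IceStorm's actual implementation.

```python
class TopicModel:
    """Toy model of per-subscriber retry/reap behavior (illustration only)."""

    def __init__(self):
        # subscriber id -> {"retryCount": int, "failures": int}
        self._subs = {}

    def subscribe(self, sub_id, retry_count=0):
        # retryCount QoS: -1 means "retry forever, never reap";
        # n >= 0 means reap after more than n consecutive failed deliveries.
        self._subs[sub_id] = {"retryCount": retry_count, "failures": 0}

    def unsubscribe(self, sub_id):
        self._subs.pop(sub_id, None)

    def publish(self, deliver):
        """deliver(sub_id) -> bool: True if the event reached the subscriber."""
        for sub_id in list(self._subs):
            state = self._subs[sub_id]
            if deliver(sub_id):
                state["failures"] = 0  # successful delivery resets the count
                continue
            state["failures"] += 1
            rc = state["retryCount"]
            if rc != -1 and state["failures"] > rc:
                # Unreachable and out of retries: reap the subscriber.
                del self._subs[sub_id]

    def subscriber_count(self):
        return len(self._subs)
```

In this model a subscriber registered with `retry_count=-1` accumulates forever if it silently dies, which is why a growing subscriber count is worth checking against that QoS setting.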
We do see regular purges of dropped subscribers, so I believe that part of our system is working correctly. The topic in question is our most heavily subscribed topic: every single app and service that runs in our system holds one subscription to it (singleton-based), so it isn't surprising that this topic would have 2048 or more subscriptions at any one time in the regular course of a day.

We will try to set up our servers so we can retrieve these core dumps and then hopefully provide more information.
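The subscriber tracing suggested above would make those purges visible in the logs. Assuming the service name is `IceStorm` (adjust to match your deployment), the relevant properties look roughly like this:

```
# Sketch of an IceStorm service config fragment (service name assumed)
IceStorm.Trace.Subscriber=1   # log subscriber add/remove and retry activity
IceStorm.Trace.Topic=1        # log topic-level activity

# Note: the retryCount QoS itself is not a config property; it is passed
# by each subscriber in the QoS map at subscribe time.
```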
We recently had a segfault in IceStorm, and luckily we had trace logs enabled, but we lost the core dump (the machine didn't have enough disk space to store it).
```
segfault at 4030 ip 00007ffff7184f48 sp 00007fff9a7fac20 error 4 in libstdc++.so.6.0.19[7ffff70c6000+e9000]
```
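Since the core dump was lost to disk space, it may help to confirm core dumps are enabled and redirect them to a location with room before the next crash. A sketch (the `/var/crash` path is hypothetical, and rewriting `kernel.core_pattern` requires root, so it is shown commented out):

```shell
# Allow core dumps in the shell/service that launches IceStorm
ulimit -c unlimited

# Optionally write cores to a dedicated directory with spare disk
# (hypothetical path; requires root):
# sysctl -w kernel.core_pattern=/var/crash/core.%e.%p

# Verify the current settings
ulimit -c
cat /proc/sys/kernel/core_pattern
```

For a service managed by systemd, the equivalent knob is `LimitCORE=infinity` in the unit file rather than a shell `ulimit`.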
After analyzing the trace logs, I noticed that the last valid log entry was the addition of a subscription to one topic, which then had exactly 2048 current subscriptions. That number seemed highly suspect, so I started looking through the configs and code for any setting that might correlate, but was unable to find anything.
We're running version 3.6.2 of the libraries in a cluster of 3 nodes on CentOS 7.3. Our IceStorm configs are as follows:
We're going to try raising Ice.MessageSizeMax in case some internal messaging between the nodes also uses this setting, but we're not sure this is actually the cause. We've also had other, more recent segfaults that did not coincide with 2048 subscribers at the time. Those might be attributable to these pull requests, though that is a bit of a shot in the dark: #1259, #1260
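For reference, `Ice.MessageSizeMax` is specified in kibibytes (the default is 1024, i.e. 1 MB), and it generally needs to be raised on every process that handles the larger messages, not just the IceStorm service. A sketch (the 8192 value is an arbitrary example):

```
# Config fragment: raise the max message size to 8 MB (value in KiB)
Ice.MessageSizeMax=8192
```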
Any help pointing towards where this issue might lie would be greatly appreciated. We're hitting segfaults about once every 3 weeks or so; some auto-recover (we try to restart the service if it goes down), but most require a full restart.