snuba-subscription-consumer-* containers are failing continuously #6137
Comments
I think this may be a duplicate of #5855 (comment) |
Any updates on this? I was able to fix snuba-subscription-consumer-transactions by recreating the corresponding topic, but that did not work for snuba-subscription-consumer-metrics. |
Is this simply because Sentry is currently not polling this data, as it's not needed, and hence Kafka reports no active members? Even so, the crash loop should be fixed. UPDATE: |
Please refer to the issue comment I linked above and ensure you are not running the commit-log topic with more than one partition. |
I have had all commit-log topics set to 1 partition and 1 replica all along, and they are all defined in topic_partition_counts and set to 1. This problem became apparent when I was doing regular operations such as editing topic_partition_counts and updating basic configs. I can't get past this: events is fine, but metrics and transactions are not. |
Sorry, I was logged in to another account. |
@untitaker, is this at all related to the removal of the beta metrics feature from Sentry? We recently removed the metrics beta feature. I wonder if some components can now be removed from the deployment, although I see the metrics topics are still having data sent to them. What exactly are these responsible for?
|
@chipzzz metrics is for release health (crashed sessions etc. in the Releases tab), generic-metrics is for the beta metrics feature you mention, transactions is for the Performance product in general, events is for errors generally, and deployments with "subscription" in the name are for alerts. If you don't need alerts on crashed sessions, performance data, or errors respectively, you can just remove those deployments. If you have further questions like this, I suggest filing a separate issue from this one, which should be focused on the bugs IMO. |
@untitaker, I am still using all of the aforementioned, except beta metrics. It is unclear what else could be causing this. |
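For anyone skimming, the mapping described above can be sketched in a few lines of Python. This is only an illustration of the comment, not anything taken from Snuba or the Helm chart: the `DATASET_PURPOSE` table and the `describe_deployment` helper are hypothetical names, and the sketch assumes deployment names end with the dataset name, as in this issue's title.

```python
from typing import Optional, Tuple

# Dataset suffix -> what it is used for, per the comment above.
# Hypothetical lookup table, written only for illustration.
DATASET_PURPOSE = {
    "generic-metrics": "beta metrics feature",
    "metrics": "release health (crashed sessions etc. in the Releases tab)",
    "transactions": "Performance product",
    "events": "errors",
}


def describe_deployment(name: str) -> Tuple[Optional[str], bool]:
    """Return (purpose of the dataset, whether the deployment is for alerts)."""
    # Deployments with "subscription" in the name are for alerts.
    is_alerting = "subscription" in name
    # Check longer suffixes first so "generic-metrics" is not read as "metrics".
    for dataset in sorted(DATASET_PURPOSE, key=len, reverse=True):
        if name.endswith(dataset):
            return DATASET_PURPOSE[dataset], is_alerting
    return None, is_alerting


purpose, alerting = describe_deployment("snuba-subscription-consumer-metrics")
print(purpose, alerting)
```

So under these assumptions, snuba-subscription-consumer-metrics would be the deployment behind alerts on release health data.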
This may be related to this |
For reference, including #2666 |
This was removed in #3623 but is still referenced here: https://github.com/getsentry/snuba/blob/24.7.1/snuba/subscriptions/scheduler_processing_strategy.py#L210 |
Resolved the issue. These consumers
Also depend on other topics and not just
In my UAT environment I had increased the number of partitions for these topics but did not have a matching number of consumers to consume from all partitions, hence the key error. So the number of partitions must match the number of corresponding consumers consuming them. I had tested with other topics/consumers but was not aware that the snuba-metrics, transactions, and events topics were also associated. However, I am not sure how this problem became apparent, as it was always set up this way. |
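A quick way to sanity-check the condition described above is to compare the two counts side by side. The following is a minimal sketch, assuming you have already collected the partition count per topic and the number of consumers per topic yourself (for example from your chart values). The function name and the example numbers are hypothetical and do not come from this deployment.

```python
from typing import Dict, List


def find_partition_mismatches(
    partition_counts: Dict[str, int],
    consumer_counts: Dict[str, int],
) -> List[str]:
    """List topics whose partition count differs from their consumer count."""
    problems = []
    for topic, partitions in sorted(partition_counts.items()):
        # A topic with no entry in consumer_counts is treated as having 0 consumers.
        consumers = consumer_counts.get(topic, 0)
        if consumers != partitions:
            problems.append(
                f"{topic}: {partitions} partition(s) but {consumers} consumer(s)"
            )
    return problems


# Hypothetical numbers, for illustration only.
example_partitions = {"events": 5, "transactions": 5, "snuba-metrics": 1}
example_consumers = {"events": 1, "transactions": 5, "snuba-metrics": 1}

mismatches = find_partition_mismatches(example_partitions, example_consumers)
for line in mismatches:
    print(line)
```

This only flags the mismatch described in this comment; it does not check the separate advice earlier in the thread about keeping the commit-log topic at one partition.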
Self-Hosted Version
23.11.2
CPU Architecture
x86_64
Docker Version
NA
Docker Compose Version
NA
Steps to Reproduce
The following containers have been crashing continuously. Are these services used for alerting? We are a little confused about what these services do.
Logs:
The alerting system was working fine. We then made a few changes to Kafka partitions, increasing ingest-events and events from 1 to 5 for scale testing, and after that only these 3 containers were down. Which service is now responsible for alerting?
We suspected a partition mismatch (please correct us if this is unrelated), so we increased the partition count of all topics to 5. Reviewing all topic configs now, we see that these 3 topics, snuba-commit-log, events-subscription-results and ingest-monitors, have a ReplicationFactor of 3, while all other topics have a ReplicationFactor of 1. All other configs remain the same.
Also, when listing consumer groups, the following have no active members:
Do let us know if any other information is needed.
Expected Result
NA
Actual Result
NA
Event ID
No response