
snuba_subscription_consumers KeyError #5514

Closed
sn0wk4t opened this issue Feb 9, 2024 · 5 comments
sn0wk4t commented Feb 9, 2024

Environment

I'm running Snuba workers on separate VMs in my Sentry deployment (version 23.6.2).
My Kafka deployment consists of a 3-node cluster (6 partitions per topic).

Snuba ENV

CLICKHOUSE_HOST=10.14.12.210
DEFAULT_BROKERS=10.14.12.201:9092,10.14.12.69:9092,10.14.12.126:9092
REDIS_HOST=10.14.12.239
SNUBA_SETTINGS=self_hosted
TOPIC_PARTITION_COUNTS = {"event-replacements": 6, "events": 6, "events-subscription-results": 6, "ingest-replay-events": 6, "ingest-sessions": 6, "outcomes": 6, "processed-profiles": 6, "profiles-call-tree": 6, "snuba-commit-log": 6, "snuba-sessions-commit-log": 6, "snuba-transactions-commit-log": 6, "transactions-subscription-results": 6}

Steps to Reproduce

  1. Partition each Kafka topic into 6 partitions.
  2. Run 3 instances of the snuba_subscription_consumer_event worker (subscriptions-scheduler-executor --dataset events --entity events --auto-offset-reset=latest --no-strict-offset-reset --consumer-group=snuba-events-subscriptions-consumers --followed-consumer-group=snuba-consumers --delay-seconds=60 --schedule-ttl=60 --stale-threshold-seconds=900 --log-level=debug)

Expected Result

Worker runs.

Actual Result

The worker keeps restarting, each time with an error like this:

2024-02-09 12:17:38,476 Initializing Snuba...
2024-02-09 12:17:42,935 Snuba initialization took 4.4598174672573805s
2024-02-09 12:17:43,686 Initializing Snuba...
2024-02-09 12:17:48,684 Snuba initialization took 4.998328043147922s
2024-02-09 12:17:48,715 Starting
2024-02-09 12:17:48,801 New partitions assigned: {Partition(topic=Topic(name='snuba-commit-log'), index=0): 7975, Partition(topic=Topic(name='snuba-commit-log'), index=1): 960, Partition(topic=Topic(name='snuba-commit-log'), index=2): 631}
2024-02-09 12:17:48,801 Initialized processing strategy: <snuba.subscriptions.scheduler_processing_strategy.TickBuffer object at 0x7fd0e4d1d790>
2024-02-09 12:17:50,270 Caught exception, shutting down...
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 288, in run
    self._run_once()
  File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 381, in _run_once
    self.__processing_strategy.submit(message)
  File "/usr/src/snuba/snuba/subscriptions/scheduler_processing_strategy.py", line 240, in submit
    self.__next_step.submit(message)
  File "/usr/src/snuba/snuba/subscriptions/combined_scheduler_executor.py", line 246, in submit
    tasks.extend([task for task in entity_scheduler[tick.partition].find(tick)])
KeyError: 4
2024-02-09 12:17:50,271 Terminating <snuba.subscriptions.scheduler_processing_strategy.TickBuffer object at 0x7fd0e4d1d790>...
2024-02-09 12:17:50,271 Closing <snuba.subscriptions.scheduler_consumer.CommitLogTickConsumer object at 0x7fd0e4652be0>...
2024-02-09 12:17:50,272 Partitions to revoke: [Partition(topic=Topic(name='snuba-commit-log'), index=0), Partition(topic=Topic(name='snuba-commit-log'), index=1), Partition(topic=Topic(name='snuba-commit-log'), index=2)]
2024-02-09 12:17:50,272 Partition revocation complete.
2024-02-09 12:17:50,275 Processor terminated
Traceback (most recent call last):
  File "/usr/local/bin/snuba", line 33, in <module>
    sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/src/snuba/snuba/cli/subscriptions_scheduler_executor.py", line 141, in subscriptions_scheduler_executor
    processor.run()
  File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 288, in run
    self._run_once()
  File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 381, in _run_once
    self.__processing_strategy.submit(message)
  File "/usr/src/snuba/snuba/subscriptions/scheduler_processing_strategy.py", line 240, in submit
    self.__next_step.submit(message)
  File "/usr/src/snuba/snuba/subscriptions/combined_scheduler_executor.py", line 246, in submit
    tasks.extend([task for task in entity_scheduler[tick.partition].find(tick)])
KeyError: 4
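A minimal sketch of the failure mode (the names below are illustrative, not Snuba's actual internals): the combined scheduler/executor builds one scheduler per commit-log partition *assigned to this replica*, but with several replicas each process only owns a subset of the partitions, while a tick can still reference any of the 6 partitions. A lookup for an unassigned partition then raises exactly the KeyError seen above.

```python
# Hypothetical reproduction under the assumption described in the lead-in;
# "entity_scheduler" here is just a plain dict standing in for the real object.

ASSIGNED_PARTITIONS = [0, 1, 2]  # this replica's share of the 6 partitions

# One scheduler per *assigned* partition, keyed by partition index.
entity_scheduler = {p: f"scheduler-{p}" for p in ASSIGNED_PARTITIONS}

def submit(tick_partition: int) -> str:
    # Mirrors the failing line: a tick for a partition this replica does
    # not own has no entry in the dict and raises KeyError.
    return entity_scheduler[tick_partition]

print(submit(0))        # assigned partition: lookup succeeds
try:
    submit(4)           # unassigned partition: raises KeyError(4)
except KeyError as exc:
    print("KeyError:", exc)
```

Running a single replica (or keeping the commit log at one partition) keeps every partition in the dict, which is why the fix below is structural rather than a code change.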
@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 2 Feb 9, 2024
untitaker (Member) commented:

the subscriptions-scheduler-executor command can only run with 1 replica; it is intended for small-scale deployments.

if you want to scale the scheduler and executor, you need to run subscriptions-scheduler and subscriptions-executor separately. the scheduler still has to run as a single pod, but the executor can then be scaled horizontally.

we can look into splitting up those docker containers further, but it would mean more complexity for small-scale deployments.
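A sketch of what the split deployment described above might look like. The flag names are assumed to mirror the combined command the OP posted; verify them against `snuba subscriptions-scheduler --help` and `snuba subscriptions-executor --help` on your Snuba version, and the consumer-group names here are hypothetical:

```shell
# Exactly ONE scheduler replica, reading the single-partition commit log:
snuba subscriptions-scheduler \
    --entity events \
    --consumer-group snuba-events-subscriptions-schedulers \
    --followed-consumer-group snuba-consumers

# N executor replicas, scaled horizontally as query load requires:
snuba subscriptions-executor \
    --dataset events \
    --entity events \
    --consumer-group snuba-events-subscriptions-executors
```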

sn0wk4t commented Feb 14, 2024

Hello @untitaker, thanks for the response.
So to scale this worker I have to run a standalone subscriptions-scheduler and many subscriptions-executors. Clear!

Can the subscriptions-scheduler handle multiple topic partitions? Do I have to shrink the number of partitions of the snuba-commit-log topic down to 1?

lynnagara (Member) commented:

> Hello @untitaker, thanks for the response. So to scale this worker I have to run a standalone subscriptions-scheduler and many subscriptions-executors. Clear!
>
> Can the subscriptions-scheduler handle multiple topic partitions? Do I have to shrink the number of partitions of the snuba-commit-log topic down to 1?

Yup! snuba-commit-log should have 1 partition, and you should only run one scheduler. Executors can scale as needed.
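One practical note worth adding here: Kafka cannot reduce a topic's partition count in place, so if snuba-commit-log was already expanded past 1 partition, the topic has to be deleted and recreated. A hedged sketch, assuming the Bitnami layout and `KAFKA_SERVICE` variable used later in this thread (stop the scheduler first; deleting the topic discards its data):

```shell
# Remove the over-partitioned commit log...
/opt/bitnami/kafka/bin/kafka-topics.sh --bootstrap-server "${KAFKA_SERVICE}" \
    --delete --topic snuba-commit-log

# ...and recreate it with a single partition, as recommended above.
/opt/bitnami/kafka/bin/kafka-topics.sh --bootstrap-server "${KAFKA_SERVICE}" \
    --create --topic snuba-commit-log --partitions 1 --replication-factor 1
```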

bkk-bcd commented May 2, 2024

I'm seeing the same thing after running:

/opt/bitnami/kafka/bin/kafka-topics.sh --bootstrap-server ${KAFKA_SERVICE} --topic snuba-metrics --alter --partitions 4

Can I get some clarification on how to address this in the kubernetes deployments? I'm not sure I follow the above conversation. Maybe this is a separate issue? My snuba-commit-log topic only has one partition:

./kafka-topics.sh --describe --topic snuba-commit-log
Defaulted container "kafka" out of: kafka, kafka-init (init)
Topic: snuba-commit-log TopicId: _ptvpzBGR0e0szsoWzWUdQ PartitionCount: 1       ReplicationFactor: 1    Configs: cleanup.policy=compact,delete,min.compaction.lag.ms=3600000,retention.bytes=21474836480
        Topic: snuba-commit-log Partition: 0    Leader: 1       Replicas: 1     Isr: 1

untitaker (Member) commented:

@bkk-bcd I believe we should continue this conversation at #5855

the original issue by the OP seems fixed/answered.
