Send EventDataBatch with per-message partition_key #23510
Hey @weltonrodrigo, thanks for your feedback! I need to first check whether setting different partition keys on events within a batch would get each event sent to the right partition.
Confirmed just now that it's not working… Do I need to create an EventDataBatch for each message, or is there a way to send a single message?
Thanks for confirming. This is actually what I expected: for a batch, it's the partition id/partition key set on the batch itself that dominates. Yes, for now you would need to create a batch for each message (partition). But we're working on a new feature called the buffered producer, with which you don't have to manage batches yourself; it batches and sends automatically in the background. I think this upcoming feature should help resolve your issue!
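For reference, a minimal sketch of that per-partition-key workaround with the current (non-buffered) client; the connection details and payloads here are placeholders, not taken from this thread:

```python
from azure.eventhub import EventHubProducerClient, EventData

# Placeholder connection details (assumptions for illustration only).
producer = EventHubProducerClient.from_connection_string(
    "<connection-string>", eventhub_name="<eventhub-name>"
)

events = [("key-a", "payload 1"), ("key-b", "payload 2")]

with producer:
    for partition_key, payload in events:
        # The partition key set on the batch is the one the service honours,
        # so events with different keys cannot share a batch today.
        batch = producer.create_batch(partition_key=partition_key)
        batch.add(EventData(payload))
        producer.send_batch(batch)
```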
Hi @weltonrodrigo - Would you be able to test this to see if it resolves your issue? You can upgrade with …
For context, this is a service limitation. There is no way to send a single batch of messages with heterogeneous partition keys. The Event Hubs service limits each message received by the gateway to a single partition key. In the case of an event batch, its physical form is an AMQP message that contains other messages as its body; the Event Hubs gateway associates the partition key from the batch envelope with all events that it contains. If any child message in the batch has a different partition key annotation, the gateway will either ignore that annotation or reject the entire batch. As Swathi mentions, the buffered producer can work around this because it manages batches implicitly. When you enqueue an event using a partition key, the buffered producer hashes that key to a specific partition and puts the event into a batch that will be sent to that partition. This approach allows the buffered producer to accept events enqueued with different routing methods and send them together in a single batch.
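As an illustration of that behavior, here is a sketch of the buffered producer enqueuing events with different partition keys; it assumes azure-eventhub 5.10.0 or later, and the connection details and event payloads are placeholders:

```python
import asyncio

from azure.eventhub import EventData
from azure.eventhub.aio import EventHubProducerClient


async def on_success(events, partition_id):
    # Called once a buffered batch for a partition has been sent.
    print(f"sent {len(events)} events to partition {partition_id}")


async def on_error(events, partition_id, error):
    print(f"failed to send {len(events)} events to partition {partition_id}: {error}")


async def main():
    producer = EventHubProducerClient.from_connection_string(
        "<connection-string>",            # placeholder
        eventhub_name="<eventhub-name>",  # placeholder
        buffered_mode=True,
        on_success=on_success,
        on_error=on_error,
    )
    async with producer:
        # Events with different partition keys can be enqueued back to back; the
        # buffered producer hashes each key to a partition and batches per partition.
        await producer.send_event(EventData("order created"), partition_key="customer-1")
        await producer.send_event(EventData("order shipped"), partition_key="customer-2")
        await producer.flush()


asyncio.run(main())
```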
Are the other types of message also buffered? I have an application where I use a call to … This is moderate volume (100 msg/second), and with the new buffered producer this call started to take a long time. Is that a good approach to detecting producer connection problems? Can your test harnesses reproduce whether that call becomes too slow under load?
I'm not sure if this is the correct place to provide feedback about this, but I've made a little more progress on this problem. I'm using Python 3.10.5 (older versions also show the problem) and gunicorn with aiohttp. The app stops responding to requests and soon gets restarted by gunicorn, under some weird combination of client requests per second and … This is why my health checks were failing. The app starts, responds for 30 seconds or so, freezes, and then stops responding to any requests (even /healthcheck). Do you have any versions of aiohttp and gunicorn known to work with this buffered producer? My environment has these versions:
UPDATE: it gets stuck at line 152 in 6a1a5c5 (azure/eventhub/aio/_buffered_producer/_buffered_producer_async.py, the `batch = self._buffered_queue.get()` call shown in the traceback below).
Hi @weltonrodrigo, thank you for your feedback. Would you be able to provide some code that I can use to reproduce the error, please?
Trying to reproduce with a smaller piece of code. You can try enqueuing 10 messages per second, with a …
This code triggers the problem. It runs until it gets stuck; a Ctrl-C then exits.

```python
import asyncio
import logging
import sys

from azure.eventhub import EventData
from azure.eventhub.aio import EventHubSharedKeyCredential, \
    EventHubProducerClient

logging.basicConfig(format=logging.BASIC_FORMAT)
log = logging.getLogger("azure")
log.setLevel(logging.DEBUG)
handler = logging.StreamHandler(stream=sys.stdout)
log.addHandler(handler)


async def on_buffer_send_success(*args):
    log.info("flushed")


async def on_buffer_send_error(*args):
    log.error("error flushing")


producer = EventHubProducerClient(
    fully_qualified_namespace="dev-eventhub-spia.servicebus.windows.net",
    eventhub_name="testetorres",
    credential=EventHubSharedKeyCredential("leituraescrita", "A9rQFR3qthjTushFXm/246/t6hsKE1civ8lwvYnKYus="),
    auth_timeout=3, retry_total=3, retry_mode='fixed',
    retry_backoff_factor=0.01,
    buffered_mode=True,
    on_success=on_buffer_send_success,
    on_error=on_buffer_send_error,
    max_wait_time=10,
    max_buffer_length=100,
    logging_enable=True
)

payload = """
{
    "idImagem": "02032asdfasdfasdfasdfa1090-F01.jpg",
    "dataHoraTz": "2022-07-21T16:50:25-03:00",
    "camera": {
        "numero": "Lasdfasdfasdf325"
    },
    "empresa": "AasdfasdfsRIS",
    "dataHora": "2022-07-21 16:50:25",
    "key": "4asdfasdfasasdfasdfas4BD7Fasdfasdfasdfas",
    "placa": "Rasdfasdfasdfasdf3",
    "dataRecebimento": "2022-07-21T16:50:45.499461-03:00",
    "codigoLog": "2022072asdfasdfasdfasdfasdfas20325"
}
"""


async def publish(timeout):
    while True:
        await producer.send_event(EventData(payload))
        log.info("Enqueued message")
        await asyncio.sleep(timeout)


if __name__ == '__main__':
    try:
        asyncio.get_event_loop().run_until_complete(publish(.1))
    except Exception as err:
        log.exception(err)
```

The output:

```
foo@bar:~$ /usr/local/opt/python@3.9/bin/python3 -m venv 39env
foo@bar:~$ source 39env/bin/activate
foo@bar:~$ python test.py
Enqueued message
INFO:azure:Enqueued message
Partition '0' worker is checking max_wait_time.
INFO:azure.eventhub.aio._buffered_producer._buffered_producer_async:Partition '0' worker is checking max_wait_time.
Enqueued message
INFO:azure:Enqueued message
Partition '1' worker is checking max_wait_time.
INFO:azure.eventhub.aio._buffered_producer._buffered_producer_async:Partition '1' worker is checking max_wait_time.
Enqueued message
INFO:azure:Enqueued message
[... 46 more identical "Enqueued message" log pairs ...]
Partition '0' worker is checking max_wait_time.
INFO:azure.eventhub.aio._buffered_producer._buffered_producer_async:Partition '0' worker is checking max_wait_time.
Enqueued message
INFO:azure:Enqueued message
Partition '1' worker is checking max_wait_time.
INFO:azure.eventhub.aio._buffered_producer._buffered_producer_async:Partition '1' worker is checking max_wait_time.
Enqueued message
INFO:azure:Enqueued message
[... 46 more identical "Enqueued message" log pairs ...]
Partition '0' worker is checking max_wait_time.
INFO:azure.eventhub.aio._buffered_producer._buffered_producer_async:Partition '0' worker is checking max_wait_time.
Partition: '0' started flushing.
INFO:azure.eventhub.aio._buffered_producer._buffered_producer_async:Partition: '0' started flushing.
Partition '0' is sending.
INFO:azure.eventhub.aio._buffered_producer._buffered_producer_async:Partition '0' is sending.
Enqueued message
INFO:azure:Enqueued message
Partition '1' worker is checking max_wait_time.
INFO:azure.eventhub.aio._buffered_producer._buffered_producer_async:Partition '1' worker is checking max_wait_time.
Partition: '1' started flushing.
INFO:azure.eventhub.aio._buffered_producer._buffered_producer_async:Partition: '1' started flushing.
Partition '1' is sending.
INFO:azure.eventhub.aio._buffered_producer._buffered_producer_async:Partition '1' is sending.
Enqueued message
INFO:azure:Enqueued message
[... 12 more identical "Enqueued message" log pairs ...]
Partition '0' sending 49 events succeeded.
INFO:azure.eventhub.aio._buffered_producer._buffered_producer_async:Partition '0' sending 49 events succeeded.
flushed
INFO:azure:flushed
^CTraceback (most recent call last):
  File "test.py", line 62, in <module>
    asyncio.get_event_loop().run_until_complete(publish(.1))
  File "/usr/local/Cellar/python@3.8/3.8.13_1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/base_events.py", line 603, in run_until_complete
    self.run_forever()
  File "/usr/local/Cellar/python@3.8/3.8.13_1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
    self._run_once()
  File "/usr/local/Cellar/python@3.8/3.8.13_1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
    handle._run()
  File "/usr/local/Cellar/python@3.8/3.8.13_1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/Users/torres/firmacode/spia/py-apps/wsocrspia/38env/lib/python3.8/site-packages/azure/eventhub/aio/_buffered_producer/_buffered_producer_async.py", line 206, in check_max_wait_time_worker
    await self._flush(raise_error=False)
  File "/Users/torres/firmacode/spia/py-apps/wsocrspia/38env/lib/python3.8/site-packages/azure/eventhub/aio/_buffered_producer/_buffered_producer_async.py", line 152, in _flush
    batch = self._buffered_queue.get()
  File "/usr/local/Cellar/python@3.8/3.8.13_1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/queue.py", line 170, in get
    self.not_empty.wait()
  File "/usr/local/Cellar/python@3.8/3.8.13_1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py", line 302, in wait
    waiter.acquire()
KeyboardInterrupt
```

Requirements are:

```
aiohttp==3.8.1
aiohttp-prometheus-exporter==0.2.4
aiosignal==1.2.0
aiotools==1.5.9
async-timeout==4.0.2
attrs==21.4.0
azure-core==1.24.2
azure-eventhub==5.10.0
certifi==2022.6.15
charset-normalizer==2.1.0
frozenlist==1.3.0
gunicorn==20.1.0
idna==3.3
multidict==6.0.2
orjson==3.6.7
prometheus-client==0.14.1
pytz==2022.1
requests==2.28.1
six==1.16.0
typing_extensions==4.3.0
uamqp==1.5.3
urllib3==1.26.11
yarl==1.7.2
```

Python is:

```
foo@bar:~$ python -VV
Python 3.9.13 (main, May 24 2022, 21:28:44)
[Clang 13.0.0 (clang-1300.0.29.30)]
```

Python was installed with …

Note that those credentials and that event hub don't exist anymore (deleted after I posted).
Hi @weltonrodrigo - Thanks for your patience on this! More context on the bug from @kashifkhan: in our async producer we internally use a sync queue to hold events. There is also an async queue implementation, but it isn't thread-safe, so we opted for the sync queue. That queue turned out to be blocking, which isn't async-friendly, and we had to make a few changes to handle this behavior correctly.
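For anyone following along, a simplified illustration of the problem described above; this is not the SDK's actual code, just a sketch of why a blocking `queue.Queue.get()` inside a coroutine freezes the whole event loop, and one non-blocking pattern that yields control while waiting:

```python
import asyncio
import queue

buffered_queue: "queue.Queue[str]" = queue.Queue()


async def cooperative_get() -> str:
    # Calling buffered_queue.get() here would block the entire thread while the
    # buffer is empty, so no other coroutine (e.g. an aiohttp /healthcheck
    # handler) could run. Polling with get_nowait() and yielding keeps the
    # event loop responsive.
    while True:
        try:
            return buffered_queue.get_nowait()
        except queue.Empty:
            await asyncio.sleep(0.05)


async def main():
    # Simulate another task filling the buffer a little later.
    asyncio.get_running_loop().call_later(0.2, buffered_queue.put_nowait, "event")
    print(await cooperative_get())


asyncio.run(main())
```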
We've been using it since it was merged and it seems fine. Thanks.
It would be nice to be able to send a batch where every message has its own partition_key.
The case here is when I get a batch from an external client but need to repartition it. Sending every message in isolation is not ideal, so using a batch is essential.
But there is no public interface to set the partition key on a single EventData, only on an EventDataBatch. It used to exist, but was removed some time ago, judging by the available examples.
Why was it removed? Can we get it back?
Currently, I'm having to do the grouping myself, where `_custom_partitioner` is just a business-aware partitioner.
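For illustration only, a minimal sketch of that kind of workaround; `_custom_partitioner` and the key derivation below are hypothetical stand-ins (not the original code), and the grouping simply builds one EventDataBatch per computed partition key:

```python
from collections import defaultdict
from typing import List

from azure.eventhub import EventData, EventHubProducerClient


def _custom_partitioner(event: EventData) -> str:
    # Hypothetical business-aware partitioner: derive a key from the payload.
    return event.body_as_str()[:8]


def send_repartitioned(producer: EventHubProducerClient, events: List[EventData]) -> None:
    grouped = defaultdict(list)
    for event in events:
        grouped[_custom_partitioner(event)].append(event)

    for partition_key, group in grouped.items():
        # One EventDataBatch per computed partition key, since a batch can
        # only carry a single partition key.
        batch = producer.create_batch(partition_key=partition_key)
        for event in group:
            batch.add(event)
        producer.send_batch(batch)
```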