Distributed Dask Queue: producer/consumer #5843
Welcome @AmineDiro! Thanks for your great question. We typically use https://dask.discourse.group/ for questions and discussions around usage, etc. — it's more likely to get community responses over there :) We try to use the GitHub tracker mostly for bug reports, feature requests, etc.
Hello @fjetter, thanks for your response! I'll clean up this issue by submitting a clear feature request.
@crusaderky Based on your answer on #5671 (thanks for that!), I believe this feature request may be inaccurate and can be closed. Jim also mentioned that Queues were designed to store intermediate futures, small metadata, etc., not actual data, which might overload the scheduler.
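To illustrate that design intent, a minimal sketch of queuing *futures* rather than payloads, so the scheduler only tracks small keys while results stay in worker memory. This assumes a locally running `dask.distributed` cluster; the queue name `"doc-futures"` is illustrative:

```python
from dask.distributed import Client, Queue

# In-process cluster for illustration; in practice you would connect
# to an existing scheduler address.
client = Client(processes=False)

q = Queue("doc-futures")

# Put a *future* on the queue: the scheduler only passes around a small
# key, while the actual result lives in worker memory.
future = client.submit(lambda x: x * 2, 21)
q.put(future)

# The consumer gets the future back and materializes it on demand.
result = q.get().result()

client.close()
```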
There is some consensus for the design in #5671, although not a final one. |
The OP does raise a few valid points. Namely, it is correct that descoping a Queue on the client side, as opposed to explicitly calling close(), creates a memory leak on the cluster. This is definitely a bug: Queues should be treated like Future objects.
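Until that is fixed, a minimal workaround sketch is to close the queue explicitly rather than relying on `del` or garbage collection. This assumes a reachable `dask.distributed` cluster; the queue name `"work-items"` is illustrative:

```python
from dask.distributed import Client, Queue

client = Client(processes=False)  # in-process cluster for illustration

q = Queue("work-items")
q.put({"doc_id": 1})
item = q.get()

# Explicitly release the queue's scheduler-side state; letting `q` go
# out of scope (or `del q`) currently does not free it on the cluster.
q.close()
client.close()
```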
Hello, 😄

I'm a data scientist/data engineer working on a workflow where I need to process a huge number of documents, so I tried using `distributed.Queue` to write a producer/consumer pattern with Dask distributed. I might be missing something about how to correctly implement the producer/consumer pattern with distributed, but here are some important features that could be added to `distributed.Queue`. I would also like to help write this class, so thanks for your guidance:

- `del q` doesn't actually free up the distributed memory
- a `q.join()` that works in a distributed manner, to block until all the queue items have been processed
- getting multiple items at once with a `batch_size`, blocking if the queue is empty

Thanks a lot, and I hope these features make sense. I hope I can be of some help in advancing this class!
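For comparison, the `join()` and batched-get semantics requested above already exist locally in the standard library's `queue.Queue`. A minimal single-process sketch of the pattern; `get_batch` is a hypothetical helper, not an existing `distributed.Queue` API:

```python
import queue
import threading

def get_batch(q, batch_size):
    """Hypothetical helper: block for the first item, then drain
    up to batch_size items without blocking further."""
    batch = [q.get()]  # block until at least one item is available
    while len(batch) < batch_size:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch

q = queue.Queue()
results = []

def consumer():
    while True:
        for item in get_batch(q, batch_size=4):
            if item is None:      # sentinel: producer is done
                q.task_done()
                return
            results.append(item * 2)
            q.task_done()

threading.Thread(target=consumer, daemon=True).start()

for doc in range(8):   # producer
    q.put(doc)
q.put(None)            # signal completion

q.join()               # block until every item has been marked done
```

The `q.join()` / `q.task_done()` pair is exactly the "block until all items have been processed" behavior requested for `distributed.Queue`; the sentinel value tells the consumer when to stop.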