Lease by Physical Partition Key Range #148
This is a fair point, and something we want to solve. We are just not there yet, but we want to allow a higher degree of parallelism that is not based on physical partitions. There are ways to increase the number of physical partitions (more data, higher RU/s), but our current library is constrained by the REST API: we can only call it at the physical-partition level, and that caps the degree of parallelism. If what you want is higher throughput when processing the changes, you can use the Change Feed as input, fan the operations out to multiple consumer instances, and use manual checkpointing to confirm progress once all those consumers have finished.
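A minimal sketch of that fan-out pattern, assuming the v2 .NET Change Feed Processor library (Microsoft.Azure.Documents.ChangeFeedProcessor) with the processor configured for explicit checkpointing (CheckpointFrequency.ExplicitCheckpoint = true); DispatchToConsumerAsync and the consumer count are placeholders, and interface signatures vary slightly between library versions:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.ChangeFeedProcessor.FeedProcessing;

class FanOutObserver : IChangeFeedObserver
{
    private const int ConsumerCount = 4; // placeholder degree of fan-out

    public Task OpenAsync(IChangeFeedObserverContext context) => Task.CompletedTask;

    public Task CloseAsync(IChangeFeedObserverContext context, ChangeFeedObserverCloseReason reason)
        => Task.CompletedTask;

    public async Task ProcessChangesAsync(
        IChangeFeedObserverContext context, IReadOnlyList<Document> docs, CancellationToken cancellationToken)
    {
        // Split the batch into ConsumerCount slices and process them in parallel.
        IEnumerable<Task> consumers = docs
            .Select((doc, i) => new { doc, i })
            .GroupBy(x => x.i % ConsumerCount)
            .Select(g => DispatchToConsumerAsync(g.Select(x => x.doc).ToList(), cancellationToken));

        await Task.WhenAll(consumers);

        // Manual checkpoint: confirm progress only after every consumer has finished.
        await context.CheckpointAsync();
    }

    // Placeholder for your downstream work (write to a queue, call a service, ...).
    private Task DispatchToConsumerAsync(IReadOnlyList<Document> slice, CancellationToken token)
        => Task.CompletedTask;
}
```

Because the checkpoint only advances after Task.WhenAll completes, a crash mid-batch means the whole batch is redelivered on restart, so the downstream consumers should be idempotent.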
We're effectively doing this, but we're still stuck behind the Change Feed as the bottleneck. Do you have baseline performance numbers for change feeds given the allocated RUs? That is, assuming the default batch size of 100 and no work performed in the onChanges method, how many messages per second can a change feed processor handle for 1K, 10K, and 100K collections?
The speed at which the changes are delivered to the Observer is basically the speed at which you can process them. The cycle is: read the Change Feed for new changes, deliver them to the Observer, wait for the Observer to finish processing, update the checkpoint, and read again.
The rate is defined by your own implementation.
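A conceptual model of that cycle as a self-contained C# loop; readBatchAsync, processAsync, and checkpointAsync are illustrative stand-ins for the library's internals, not real API calls:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class PartitionLoop
{
    // Illustrative-only model of the per-lease processing cycle.
    public static async Task RunAsync(
        Func<Task<IReadOnlyList<string>>> readBatchAsync, // reads up to MaxItemCount changes
        Func<IReadOnlyList<string>, Task> processAsync,   // your Observer code
        Func<Task> checkpointAsync,                       // persists the continuation token
        TimeSpan pollDelay,
        CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            IReadOnlyList<string> batch = await readBatchAsync();
            if (batch.Count > 0)
            {
                // The next read does not start until processAsync returns, so
                // Observer latency directly caps documents/second per partition.
                await processAsync(batch);
                await checkpointAsync();
            }
            else
            {
                // No changes available: wait before polling the feed again.
                await Task.Delay(pollDelay, token);
            }
        }
    }
}
```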
Wondering if this is a reference to
None of the documentation for change feeds mentions physical partitions, but those are the leased units. I don't believe any of my collections has more than one physical partition, so none of our feeds are load balanced, because one host gets the only available lease.
With 2M documents in a collection and throughput at about 150 documents per second, it would take nearly 4 hours (2,000,000 / 150 ≈ 13,333 seconds, about 3.7 hours) for the single active host to process the backlog, and that's with a no-op onChangesDelegate.
Are my collections odd in that they have only one physical partition?
It would seem that a lot of people assume change feeds will load balance across their partitions, since none of the change feed documentation indicates otherwise, but in practice only one instance processes the entire collection.
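One way to check: the physical partitions behind a collection can be counted by reading its partition key ranges. A sketch assuming the v2 DocumentDB SDK (Microsoft.Azure.Documents.Client), with placeholder endpoint, key, database, and collection names:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

class PartitionCounter
{
    static async Task Main()
    {
        // Placeholder endpoint, key, database, and collection names.
        var client = new DocumentClient(new Uri("https://myaccount.documents.azure.com:443/"), "<auth-key>");
        string collectionLink = UriFactory.CreateDocumentCollectionUri("mydb", "mycoll").ToString();

        int count = 0;
        string continuation = null;
        do
        {
            // Each PartitionKeyRange is one physical partition, i.e. one lease.
            FeedResponse<PartitionKeyRange> page = await client.ReadPartitionKeyRangeFeedAsync(
                collectionLink, new FeedOptions { RequestContinuation = continuation });
            count += page.Count;
            continuation = page.ResponseContinuation;
        } while (!string.IsNullOrEmpty(continuation));

        Console.WriteLine($"Physical partitions (available leases): {count}");
    }
}
```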