Add `query.eachBatch()` to Parse.Query API #1114
When a query has a large enough result set that we don't want to materialize it at once, the SDK provides `query.each()`, which yields each matching object to a processor, one at a time. This is a handy tool, but if the number of records to process is really so large, we can also take advantage of batching the processing. Compare the following operations:

```js
// Processing N items involves N calls to Parse Server
new Parse.Query('Item').each((item) => {
  item.set('foo', 'bar');
  return item.save();
});

// Processing N items involves ceil(N / batchSize) calls to Parse Server
const batchSize = 200;
new Parse.Query('Item').eachBatch((items) => {
  items.forEach(item => item.set('foo', 'bar'));
  return Parse.Object.saveAll(items, { batchSize });
}, { batchSize });
```

The `.each()` method is already written to fetch the objects in batches; we effectively split it in two:

- `.eachBatch()` does the work of fetching objects in batches and yields each batch
- `.each()` calls `.eachBatch()` and handles invoking the callback for every item in the batch

Aside: I considered adding the undocumented `batchSize` attribute already accepted by `.each()`, `.filter()`, `.map()`, and `.reduce()` to the public API, but I suspect that by the time you are performance-sensitive enough to tune that parameter, you are better served by switching to `eachBatch()`; the current implementation of `.each()` constructs a promise chain with a node for every value in the batch, and my experience with very long promise chains has been a bit frustrating.
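To make the two-method split concrete, here is a minimal sketch of `.each()` expressed on top of `.eachBatch()`. This is illustrative only: `FakeQuery` is a hypothetical in-memory stand-in, not the SDK's actual implementation, which pages results from Parse Server.

```js
// Hypothetical sketch: .each() as a thin wrapper around .eachBatch().
// FakeQuery is an assumed, in-memory stand-in for Parse.Query.
class FakeQuery {
  constructor(items) {
    this.items = items;
  }

  // Yield results to `callback` one batch at a time.
  async eachBatch(callback, { batchSize = 100 } = {}) {
    for (let i = 0; i < this.items.length; i += batchSize) {
      await callback(this.items.slice(i, i + batchSize));
    }
  }

  // .each() delegates to .eachBatch() and unrolls each batch per item.
  async each(callback, options = {}) {
    await this.eachBatch(async (batch) => {
      for (const item of batch) {
        await callback(item);
      }
    }, options);
  }
}
```

With this shape, any per-item consumer (`.each()`, `.map()`, etc.) can share the batching logic while `.eachBatch()` stays available for callers who want to act on whole batches (e.g. to call `saveAll`).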
Codecov Report
```
@@            Coverage Diff             @@
##           master    #1114      +/-   ##
==========================================
+ Coverage   92.25%   92.28%   +0.02%
==========================================
  Files          54       54
  Lines        5232     5235       +3
  Branches     1172     1172
==========================================
+ Hits         4827     4831       +4
+ Misses        405      404       -1
```
Continue to review full report at Codecov.
Very nice. I will use this often. I have a pattern that I use in my code that mirrors this behavior. Your solution is much nicer.
Let's leave this open to give @davimacedo & @dplewis a chance to weigh in.
Thank you Noah.
Nice. I like the idea!
There is a global batch size that you can set: https://github.com/parse-community/Parse-SDK-JS/pull/1053/files. Can you add it? I agree with you about the promise chain.
I personally expected the default to have each batch contain the same number of results as a query that doesn't have a limit applied, which I think is based on a default in parse-server that is set to 100?

I guess it feels like there are two somewhat-related values: the default page size when fetching, and the default batch size when we're submitting multiple mutations. One option would be for this method to change its name to something like … I would cast a small vote for leaving this as-is, but I'm happy to make the change if you think that's best. Let me know @dplewis!

If we do move ahead on updating this to use the global batch size configuration, do you think that we should let …

Thanks for your feedback folks! 😄
Every SDK has a default batch size; for example, `saveAll` has a default of 20 here but 40 in PHP. This was left over from parse.com days. We can keep your changes as is. I forget the reason why I didn't add it to `.each` when I implemented the global batch size.
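The fallback chain discussed above (per-call option, then a configured global, then a built-in default) can be sketched as follows. The names `GLOBAL_DEFAULTS` and `resolveBatchSize` are assumptions for illustration, not the SDK's actual configuration API.

```js
// Hypothetical helper: resolve an effective batch size from a per-call
// option, a global default, and a hard-coded fallback, in that order.
const GLOBAL_DEFAULTS = { REQUEST_BATCH_SIZE: 20 }; // assumed global store

function resolveBatchSize(options = {}) {
  // A valid per-call option always wins.
  if (Number.isInteger(options.batchSize) && options.batchSize > 0) {
    return options.batchSize;
  }
  // Otherwise fall back to the configured global, then a built-in default.
  return GLOBAL_DEFAULTS.REQUEST_BATCH_SIZE || 100;
}
```

A method like `eachBatch(callback, options)` could then call `resolveBatchSize(options)` once up front, so fetch paging and mutation batching stay consistent within a single traversal.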
LGTM!