Add query.eachBatch() to Parse.Query API #1114

Merged · 2 commits · Mar 12, 2020

Conversation

noahsilas (Contributor)

When a query has a large enough result set that we don't want to
materialize it at once, the SDK provides `query.each()`, which yields
each matching object to a processor, one at a time. This is a handy
tool, but if the number of records to process is really so large, we
also can take advantage of batching the processing. Compare the
following operations:

```
// Processing N items involves N calls to Parse Server
new Parse.Query('Item').each((item) => {
  item.set('foo', 'bar');
  return item.save();
})

// Processing N items involves ceil(N / batchSize) calls to Parse Server
const batchSize = 200;
new Parse.Query('Item').eachBatch((items) => {
  items.forEach(item => item.set('foo', 'bar'));
  return Parse.Object.saveAll(items, { batchSize });
}, { batchSize });
```

The `.each()` method is already written to fetch the objects in
batches; we are effectively splitting it into two (sketched below):
- `.eachBatch()` does the work to fetch objects in batches and yield
  each batch
- `.each()` calls `.eachBatch()` and handles invoking the callback for
  every item in the batch
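
Concretely, the relationship between the two looks roughly like the following sketch (simplified and hedged; this is not the exact SDK source, and `eachViaBatches` is just an illustrative name):

```
// A minimal sketch of layering each() on top of eachBatch(): eachBatch()
// pages through the query results, and this wrapper unrolls each page so
// the callback still sees one object at a time.
function eachViaBatches(query, callback, options = {}) {
  return query.eachBatch((items) => {
    // Run the per-item callbacks sequentially within a batch.
    let chain = Promise.resolve();
    items.forEach((item) => {
      chain = chain.then(() => callback(item));
    });
    return chain;
  }, options);
}

// Roughly equivalent in effect to query.each(callback):
// eachViaBatches(new Parse.Query('Item'), (item) => item.save());
```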

Aside: I considered adding the undocumented `batchSize` option
already accepted by `.each()`, `.filter()`, `.map()` and `.reduce()` to
the public API, but I suspect that by the time you are performance-sensitive
enough to tune that parameter, you are better served by switching to
`eachBatch()`; the current implementation of `.each()` constructs a
promise chain with a node for every value in the batch, and my
experience with very long promise chains has been a bit frustrating.
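
For concreteness, a hedged sketch of the two call styles that aside is comparing (the batch sizes are only illustrative):

```
// Tuning the undocumented option: fetch 500 at a time, but still one save
// per object.
new Parse.Query('Item').each((item) => {
  item.set('foo', 'bar');
  return item.save();
}, { batchSize: 500 });

// Usually the better move once you are tuning for performance: batch the
// writes as well via eachBatch().
new Parse.Query('Item').eachBatch((items) => {
  items.forEach(item => item.set('foo', 'bar'));
  return Parse.Object.saveAll(items, { batchSize: 500 });
}, { batchSize: 500 });
```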
@codecov bot commented Mar 11, 2020

Codecov Report

Merging #1114 into master will increase coverage by 0.02%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master    #1114      +/-   ##
==========================================
+ Coverage   92.25%   92.28%   +0.02%     
==========================================
  Files          54       54              
  Lines        5232     5235       +3     
  Branches     1172     1172              
==========================================
+ Hits         4827     4831       +4     
+ Misses        405      404       -1
| Impacted Files | Coverage Δ |
|---|---|
| src/ParseQuery.js | 96.2% <100%> (+0.16%) ⬆️ |


@acinader (Contributor) left a comment
Very nice. I will use this often. I have a pattern that I use in my code that mirrors this behavior. Your solution is much nicer.

Let's leave this open to give @davimacedo & @dplewis a chance to weigh in.

Thank you Noah.

@davimacedo (Member) left a comment
Nice. I like the idea!

@dplewis (Member) commented Mar 12, 2020

There is a global batch size that you can set.

https://github.com/parse-community/Parse-SDK-JS/pull/1053/files Can you add it?

I agree with you about the promise chain.
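
For reference, a hedged sketch of what using the global setting from #1053 might look like (the configuration key name below is from memory and may not be exact, and eachBatch() is not currently wired up to it):

```
// Hedged sketch: if memory serves, #1053 exposed a global default through
// CoreManager; the key name here may not be exact.
const Parse = require('parse/node');
Parse.CoreManager.set('REQUEST_BATCH_SIZE', 100);

// If eachBatch() respected that setting, callers could omit the per-call
// option and saveAll() would pick up the same default:
new Parse.Query('Item').eachBatch((items) => {
  items.forEach(item => item.set('foo', 'bar'));
  return Parse.Object.saveAll(items);
});
```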

@noahsilas (Contributor, Author)

I personally expected that, by default, each batch would contain the same number of results as a query without a limit applied, which I think comes from a parse-server default set to 100?

I guess it feels like there are two somewhat-related values: the default page size when fetching, and the default batch size when submitting multiple mutations. One option would be to rename this method to something like eachPage with a pageSize argument, which would make it clearer that you shouldn't expect it to respect a configured batch size. On the other hand, it's nicely consistent for all of these methods that deal with chunks of objects to use the same parameter name, so there's never confusion about which parameter to use for a given method.

I would cast a small vote for leaving this as-is, but I'm happy to make the change if you think that's best. Let me know @dplewis !

If we do move ahead on updating this to use the global batch size configuration, do you think that we should let ParseQuery.each() inherit that default batch size, or should we add an override there to keep its default at 100 to preserve its current behavior?
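
To make the distinction between those two values concrete, here is a sketch with illustrative numbers (the variable names are just for exposition):

```
// Two related-but-distinct knobs: how many objects each query round-trip
// fetches, and how many objects each saveAll round-trip writes.
const fetchPageSize = 100;  // objects returned per query request
const writeBatchSize = 20;  // objects written per saveAll request

new Parse.Query('Item').eachBatch((items) => {
  items.forEach(item => item.set('foo', 'bar'));
  return Parse.Object.saveAll(items, { batchSize: writeBatchSize });
}, { batchSize: fetchPageSize });
```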

Thanks for your feedback folks! 😄

@dplewis (Member) commented Mar 12, 2020

Every SDK has a default batch size; for example, saveAll has a default of 20 here but 40 in PHP. This was left over from parse.com days. We can keep your changes as is; I forget why I didn't add it to .each when I implemented the global batch size.

@dplewis (Member) left a comment

LGTM!
