Add query.eachBatch() to Parse.Query API #1114

Merged · 2 commits · Mar 12, 2020

Conversation

noahsilas (Contributor)

When a query has a large enough result set that we don't want to
materialize it at once, the SDK provides `query.each()`, which yields
each matching object to a processor, one at a time. This is a handy
tool, but if the number of records to process is really so large, we
also can take advantage of batching the processing. Compare the
following operations:

```
// Processing N items involves N calls to Parse Server
new Parse.Query('Item').each((item) => {
  item.set('foo', 'bar');
  return item.save();
})

// Processing N items involves ceil(N / batchSize) calls to Parse Server
const batchSize = 200;
new Parse.Query('Item').eachBatch((items) => {
  items.forEach(item => item.set('foo', 'bar'));
  return Parse.Object.saveAll(items, { batchSize });
}, { batchSize });
```

The `.each()` method is already written to fetch the objects in
batches; we are effectively splitting it into two (sketched below):
- `.eachBatch()` does the work to fetch objects in batches and yield
  each batch
- `.each()` calls `.eachBatch()` and handles invoking the callback for
  every item in the batch
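
Concretely, the relationship between the two looks roughly like the following sketch (simplified and hedged; this is not the exact SDK source, and `eachViaBatches` is just an illustrative name):

```
// A minimal sketch of layering each() on top of eachBatch(): eachBatch()
// pages through the query results, and this wrapper unrolls each page so
// the callback still sees one object at a time.
function eachViaBatches(query, callback, options = {}) {
  return query.eachBatch((items) => {
    // Run the per-item callbacks sequentially within a batch.
    let chain = Promise.resolve();
    items.forEach((item) => {
      chain = chain.then(() => callback(item));
    });
    return chain;
  }, options);
}

// Roughly equivalent in effect to query.each(callback):
// eachViaBatches(new Parse.Query('Item'), (item) => item.save());
```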

Aside: I considered adding the undocumented `batchSize` option
already accepted by `.each()`, `.filter()`, `.map()` and `.reduce()` to
the public API, but I suspect that by the time you are performance-sensitive
enough to tune that parameter, you are better served by switching to
`eachBatch()`; the current implementation of `.each()` constructs a
promise chain with a node for every value in the batch, and my
experience with very long promise chains has been a bit frustrating.
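
For concreteness, a hedged sketch of the two call styles that aside is comparing (the batch sizes are only illustrative):

```
// Tuning the undocumented option: fetch 500 at a time, but still one save
// per object.
new Parse.Query('Item').each((item) => {
  item.set('foo', 'bar');
  return item.save();
}, { batchSize: 500 });

// Usually the better move once you are tuning for performance: batch the
// writes as well via eachBatch().
new Parse.Query('Item').eachBatch((items) => {
  items.forEach(item => item.set('foo', 'bar'));
  return Parse.Object.saveAll(items, { batchSize: 500 });
}, { batchSize: 500 });
```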
@codecov bot commented Mar 11, 2020

Codecov Report

Merging #1114 into master will increase coverage by 0.02%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master    #1114      +/-   ##
==========================================
+ Coverage   92.25%   92.28%   +0.02%     
==========================================
  Files          54       54              
  Lines        5232     5235       +3     
  Branches     1172     1172              
==========================================
+ Hits         4827     4831       +4     
+ Misses        405      404       -1
| Impacted Files | Coverage Δ |
|---|---|
| src/ParseQuery.js | 96.2% <100%> (+0.16%) ⬆️ |


@acinader (Contributor) left a comment
Very nice. I will use this often. I have a pattern that I use in my code that mirrors this behavior. Your solution is much nicer.

Let's leave this open to give @davimacedo & @dplewis a chance to weigh in.

Thank you Noah.

@davimacedo (Member) left a comment
Nice. I like the idea!

@dplewis (Member) commented Mar 12, 2020

There is a global batch size that you can set.

https://github.com/parse-community/Parse-SDK-JS/pull/1053/files Can you add it?

I agree with you about the promise chain.
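
For reference, a hedged sketch of what using the global setting from #1053 might look like (the configuration key name below is from memory and may not be exact, and eachBatch() is not currently wired up to it):

```
// Hedged sketch: if memory serves, #1053 exposed a global default through
// CoreManager; the key name here may not be exact.
const Parse = require('parse/node');
Parse.CoreManager.set('REQUEST_BATCH_SIZE', 100);

// If eachBatch() respected that setting, callers could omit the per-call
// option and saveAll() would pick up the same default:
new Parse.Query('Item').eachBatch((items) => {
  items.forEach(item => item.set('foo', 'bar'));
  return Parse.Object.saveAll(items);
});
```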

@noahsilas (Contributor, Author)

I personally expected that, by default, each batch would contain the same number of results as a query without a limit applied, which I think comes from a parse-server default set to 100?

I guess it feels like there are two somewhat-related values: the default page size when fetching, and the default batch size when submitting multiple mutations. One option would be to rename this method to something like eachPage with a pageSize argument, which would make it clearer that you shouldn't expect it to respect a configured batch size. On the other hand, it's nicely consistent for all of these methods that deal with chunks of objects to use the same parameter name, so there's never confusion about which parameter to use for a given method.

I would cast a small vote for leaving this as-is, but I'm happy to make the change if you think that's best. Let me know @dplewis !

If we do move ahead on updating this to use the global batch size configuration, do you think that we should let ParseQuery.each() inherit that default batch size, or should we add an override there to keep its default at 100 to preserve its current behavior?
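
To make the distinction between those two values concrete, here is a sketch with illustrative numbers (the variable names are just for exposition):

```
// Two related-but-distinct knobs: how many objects each query round-trip
// fetches, and how many objects each saveAll round-trip writes.
const fetchPageSize = 100;  // objects returned per query request
const writeBatchSize = 20;  // objects written per saveAll request

new Parse.Query('Item').eachBatch((items) => {
  items.forEach(item => item.set('foo', 'bar'));
  return Parse.Object.saveAll(items, { batchSize: writeBatchSize });
}, { batchSize: fetchPageSize });
```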

Thanks for your feedback folks! 😄

@dplewis (Member) commented Mar 12, 2020

Every SDK has a default batch size; for example, saveAll has a default of 20 here but 40 in PHP. This was left over from parse.com days. We can keep your changes as is; I forget why I didn't add it to .each when I implemented the global batch size.

@dplewis (Member) left a comment

LGTM!
