Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explain performance implications of the preferRest option #1862

Closed
brettwillis opened this issue Jul 12, 2023 · 4 comments
Closed

Explain performance implications of the preferRest option #1862

brettwillis opened this issue Jul 12, 2023 · 4 comments
Assignees
Labels
api: firestore Issues related to the googleapis/nodejs-firestore API. priority: p3 Desirable enhancement or fix. May not be included in next release. type: question Request for information or clarification. Not an issue.

Comments

@brettwillis
Copy link
Contributor

I've done a little experimentation with the preferRest option. It's pretty clear that the intention of this option was to get away without loading up the big grpc dependencies and therefore reduce boot times in apps where that is critical.

Other than that, I was expecting the RPC implementation (being the default) to otherwise be better runtime performance and request latencies by making use of HTTP/2 streaming etc etc...

However I did some experimentation on read/write performance of large data sets using the two implementations and here is what I found:

  • Reading in a ~3k collection using the collectionRef.stream() API is an order of magnitude faster using REST (loads ~3k docs in under 2s) than RPC (loads ~3k docs in around 20s)
  • Writing a similar collection (using the BulkWriter API) also seems to be much faster using REST

So, what gives? If we don't need snapshot listeners (the only API that requires RPC), would there be any reason not to use REST? If not, should preferRest: true be the default option?

Are there any other implications of using the REST API? (think: memory usage of collectionRef.stream(), transaction/request latencies)?

It would be great to have some discussion about the performance implications of the preferRest option (other than cold start latency), and to have that documented so we can make right decisions.

Thanks!

Environment

@brettwillis brettwillis added priority: p3 Desirable enhancement or fix. May not be included in next release. type: question Request for information or clarification. Not an issue. labels Jul 12, 2023
@product-auto-label product-auto-label bot added the api: firestore Issues related to the googleapis/nodejs-firestore API. label Jul 12, 2023
@ehsannas
Copy link
Contributor

Thanks for the question, @brettwillis . You are indeed correct that the main motivation for this change was to avoid loading up big dependencies (notably, grpc), and improve boot times.

In terms of performance, it tends to be workload-specific. This could be both the usage pattern (read vs write vs batch vs transaction vs what mix/order of these) as well as document content (perhaps grpc's compression performs differently on strings, numbers, blobs, etc). We have also heard from users who reported worse performance when using REST. If you don't need snapshot listeners, there should be nothing stopping you from using preferRest: true at all times.

@ehsannas ehsannas self-assigned this Jul 12, 2023
@brettwillis
Copy link
Contributor Author

In terms of performance, it tends to be workload-specific.

We have also heard from users who reported worse performance when using REST.

Thanks for your reply. Hmm ok, I guess it's up to us to benchmark our workloads then. By the way, do you have any references or data for those reports of worse performance (or any other data for that matter)? The 10x difference I observed is not insignificant.

Also another related question:

Say we benchmark our app and there is a clear winner for our workload, and say it's preferRest: true. Also say that we do use snapshot listeners (because we do).

Then once the app uses a snapshot listener, all other operations from that point onwards will then be "upgraded" to grpc, forever downgrading the performance of the app.

Our only other option here is to use two Firestore instances: one for snapshot listeners ("upgraded" to grpc) and the other for every other operation (using REST)?

Would you consider a configuration option that restricts grpc to the required APIs only, without "upgrading" every other API to grpc?

P.S. this "upgrade" behaviour was another thing that seemed to imply that grpc was the better and recommended option.

@ehsannas
Copy link
Contributor

@brettwillis , I'm also surprised by the numbers you have reported (10x) tbh. I can't prioritize it right now, but I'd be interested in running another round of experiments to measure.

Regarding the word "upgrade": perhaps we could have used a better word.. We did not mean to indicate that grpc is "better" or "more performant" than REST. We considered it an "upgrade" because it can support more functionalities.

Our only other option here is to use two Firestore instances

I think that may be it.

Would you consider a configuration option that restricts grpc to the required APIs only, without "upgrading" every other API to grpc?

I'll take this to the team, but cannot comment on when it could be implemented. The first step would be run some experiments to figure out if there's real value in doing this.

@ehsannas
Copy link
Contributor

Closing this and tracking internally via b/291107874.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: firestore Issues related to the googleapis/nodejs-firestore API. priority: p3 Desirable enhancement or fix. May not be included in next release. type: question Request for information or clarification. Not an issue.
Projects
None yet
Development

No branches or pull requests

2 participants