Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UPSERT has bad Performance #225

Open
bugerry87 opened this issue Aug 2, 2024 · 1 comment
Open

UPSERT has bad Performance #225

bugerry87 opened this issue Aug 2, 2024 · 1 comment

Comments

@bugerry87
Copy link

UPSERT has bad Performance

UPSERT-Operation have bad performance due to unrequited workload caused by this line:

foundDocs = await this.db.find(this.collectionName, { id: doc.id }, {

What is Happening

To upsert like 1000 documents in ab batch i.e. Products from restorecommerce/data takes a significant amount of time.
This is because the ResourceAPI is checking the entire database for each document's counterpart. Of course, Databases are using efficient search algorithms, but not much better than O(log2(n)). Since ArangoDB does not and cannot know that id is unique but treat it like a common field, it will scan the entire collection each time and always end up with the worst case of O(n*log2(m)). Where n is the number of inputs and m the number of documents in DB.
Since UPSERT is checking for counterparts and calls UPDATE which is doing the same again, we double up the workload to O(2n*log(m)).
With n=1000 and m=1000 we end up with a complexity of 2*1000*log(1000)=~20000. So x20 per document.

How to Solve it

Set a limit: 2 in both queries, such that DB can terminate the search early and does not run into worst case every time.
We use limit: 2 such that we can ensue that there is no further document with the same ID which is considered to be an invalid state and should throw an error.

Further Improvement

Try to batch the query but do not process each document individually in a for-loop.
An example how to batch a query of many IDs see here (also using limit):

https://github.com/restorecommerce/ordering-srv/blob/217211bf1d8d21fe0198e44a5a71572da5538017/src/service.ts#L900

@bugerry87
Copy link
Author

bugerry87 commented Aug 2, 2024

PS: another way is to query for doc._key which is hashed and known as unique and therewith at best performance that you can get from ArangoDB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant