feat(push): add support for very large batch of documents #358
Conversation
@@ -154,4 +172,61 @@ export default class SourcePushAdd extends Command {
      );
      throw err;
    }

  private splitByChunkAndUpload(
The main function of this PR:
Creates and returns two functions (a sketch follows below):
- One to push documents (`send`), which accumulates documents until they reach a `maxContentLength` size; when the limit is reached, it pushes the accumulated documents to Coveo (the `uploadBatch` function).
- One to "close" once there are no more files to parse and convert to `DocumentBuilder`, so that we can flush the remaining queue (the documents left in `accumulator.chunks`).
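
A minimal sketch of that accumulator pattern, for illustration only: the names `uploadBatch`, `DocumentBuilder`, and `accumulator.chunks` come from the comment above, while the exact signatures, the size estimate via `marshal()`, and the 5 MB limit are assumptions, not the PR's actual implementation.

```ts
import {DocumentBuilder} from '@coveo/push-api-client';

interface Accumulator {
  chunks: DocumentBuilder[]; // documents queued for the next upload
  size: number; // approximate serialized size of the queued documents
}

function splitByChunkAndUpload(
  uploadBatch: (batch: DocumentBuilder[]) => Promise<void>,
  maxContentLength = 5 * 1024 * 1024, // assumed Push API payload limit
  accumulator: Accumulator = {chunks: [], size: 0}
) {
  // Queue documents, flushing to the Push API whenever the next document
  // would grow the accumulated payload past maxContentLength.
  const send = async (...documents: DocumentBuilder[]) => {
    for (const doc of documents) {
      const docSize = JSON.stringify(doc.marshal()).length;
      if (
        accumulator.size + docSize > maxContentLength &&
        accumulator.chunks.length > 0
      ) {
        await uploadBatch(accumulator.chunks);
        accumulator.chunks = [];
        accumulator.size = 0;
      }
      accumulator.chunks.push(doc);
      accumulator.size += docSize;
    }
  };

  // Flush whatever is still queued once every file has been parsed.
  const close = async () => {
    if (accumulator.chunks.length > 0) {
      await uploadBatch(accumulator.chunks);
    }
  };

  return {send, close};
}
```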
Nicely done
Proposed changes
Rework the upload command to split documents into chunks the Push API can handle.
I was able to push ~500k documents to a source in a couple of seconds, i.e. have the documents accepted by the Push API.
The end result for the user looks something like this:
[Screenshot: CLI output when pushing a folder with 2 files; both files have the same number of documents because it is fake data.]
Users don't have to worry about the size of their files or how many documents they contain; the splitting is handled automatically for them.
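
To make that "handled automatically" part concrete, here is a hypothetical driver loop over the parsed files, building on the sketch above; `parseDocumentsFromFile` and `uploadBatch` are invented stand-ins for the CLI's real parsing and Push API calls, not code from this PR.

```ts
declare function parseDocumentsFromFile(file: string): DocumentBuilder[];
declare function uploadBatch(batch: DocumentBuilder[]): Promise<void>;

async function pushFolder(files: string[]) {
  const {send, close} = splitByChunkAndUpload(uploadBatch);
  for (const file of files) {
    // Every file's documents go through the same accumulator, so chunking
    // happens across file boundaries without any work from the user.
    await send(...parseDocumentsFromFile(file));
  }
  await close(); // flush the tail of accumulator.chunks
}
```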
Testing
https://coveord.atlassian.net/browse/CDX-471