
feat(push): add support for very large batch of documents #358

Merged: olamothe merged 17 commits into master from CDX-471 on Jul 20, 2021

olamothe (Member) commented Jul 19, 2021

Proposed changes

Rework the upload command to split documents into chunks that the Push API can handle.
I tested this by pushing ~500k documents to a source in a couple of seconds (measured until the Push API accepted the documents).

The end result for the user is something like this:

[Screenshot: result of pushing a folder with 2 files. Both files contain the same number of documents because they hold fake data.]

Users no longer need to split their files by size or by document count; the command handles this automatically.
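To illustrate the kind of splitting that users are spared, here is a minimal TypeScript sketch (the CLI itself is written in TypeScript). It assumes a hypothetical PushDocument shape and a per-request byte limit passed in as maxBytes; chunkBySize and utf8ByteLength are illustrative helpers, not functions from this PR, and the real Push API limit is not stated here.

```typescript
// Hypothetical document shape for this sketch; the real CLI builds documents
// with DocumentBuilder and its own types.
interface PushDocument {
  documentId: string;
  [field: string]: unknown;
}

// Counts the UTF-8 bytes of a JavaScript (UTF-16) string without relying on
// Node- or browser-specific APIs.
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // surrogate pair: one code point outside the BMP
        i++;
      } else {
        bytes += 3; // lone high surrogate, counted as if replaced by U+FFFD
      }
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Greedily packs documents, in order, into chunks whose JSON array encoding
// ("[doc1,doc2,...]") stays within maxBytes.
function chunkBySize(docs: PushDocument[], maxBytes: number): PushDocument[][] {
  const chunks: PushDocument[][] = [];
  let current: PushDocument[] = [];
  let currentBytes = 2; // the enclosing "[" and "]"

  for (const doc of docs) {
    const docBytes = utf8ByteLength(JSON.stringify(doc));
    if (2 + docBytes > maxBytes) {
      throw new Error(`Document ${doc.documentId} alone exceeds ${maxBytes} bytes`);
    }
    // Every document after the first in a chunk also costs a "," separator.
    const added = current.length > 0 ? docBytes + 1 : docBytes;
    if (currentBytes + added > maxBytes) {
      chunks.push(current); // non-empty here, thanks to the check above
      current = [doc];
      currentBytes = 2 + docBytes;
    } else {
      current.push(doc);
      currentBytes += added;
    }
  }
  if (current.length > 0) {
    chunks.push(current);
  }
  return chunks;
}
```

Measuring the serialized JSON in UTF-8 bytes, rather than counting documents, keeps each request under a size cap even when documents vary widely in size.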

Testing

  • Manual Tests:

https://coveord.atlassian.net/browse/CDX-471

olamothe changed the base branch from CDX-465 to master on July 20, 2021 13:51
@@ -154,4 +172,61 @@ export default class SourcePushAdd extends Command {
);
throw err;
}

private splitByChunkAndUpload(
olamothe (Member, Author):


The main function of this PR:

It creates and returns two functions:

  • One to push documents (send), which accumulates documents until their total size reaches maxContentLength. When the limit is reached, it pushes the accumulated documents to Coveo (uploadBatch function).
  • One to "close" once there are no more files to parse and convert to DocumentBuilder, so that the remaining queue (the documents left in accumulator.chunks) gets flushed.
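A minimal TypeScript sketch of the send/close pattern described above. Only the names splitByChunkAndUpload, maxContentLength, uploadBatch and accumulator.chunks come from the PR; the Doc type, the UploadBatch signature and the size measurement are hypothetical assumptions, not the CLI's actual code. Unlike the description, this variant uploads just before a document would push the batch past the limit, so no batch exceeds it (a single oversized document is still sent alone).

```typescript
// Hypothetical stand-in for a document produced by DocumentBuilder.
type Doc = Record<string, unknown>;

// Hypothetical signature for uploading one batch to the Push API.
type UploadBatch = (docs: Doc[]) => Promise<void>;

interface Accumulator {
  size: number; // approximate size of the queued documents
  chunks: Doc[]; // documents waiting to be uploaded
}

function splitByChunkAndUpload(uploadBatch: UploadBatch, maxContentLength: number) {
  const accumulator: Accumulator = { size: 0, chunks: [] };

  // Uploads whatever is queued and resets the accumulator.
  const flush = async (): Promise<void> => {
    if (accumulator.chunks.length === 0) {
      return;
    }
    const batch = accumulator.chunks;
    accumulator.chunks = [];
    accumulator.size = 0;
    await uploadBatch(batch);
  };

  // Queues documents, uploading the current batch first whenever the next
  // document would push it past maxContentLength.
  const send = async (docs: Doc[]): Promise<void> => {
    for (const doc of docs) {
      // Assumption: string length approximates payload size (exact only for ASCII).
      const docSize = JSON.stringify(doc).length;
      if (accumulator.chunks.length > 0 && accumulator.size + docSize > maxContentLength) {
        await flush();
      }
      accumulator.chunks.push(doc);
      accumulator.size += docSize;
    }
  };

  // Called once no files remain: flushes the documents still queued.
  const close = (): Promise<void> => flush();

  return { send, close };
}
```

A caller would await send for each parsed file and await close once every file is processed. Because send awaits each upload, batches here go out one at a time; a real implementation might run uploads concurrently instead.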

Contributor


Nicely done

olamothe merged commit 03927d7 into master on Jul 20, 2021
louis-bompart deleted the CDX-471 branch on July 29, 2021 16:49

3 participants