-
Notifications
You must be signed in to change notification settings - Fork 245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRIVERS-716 Improved Bulk Write API #1534
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments on spec files. I'm not reviewing the test files.
### 11. Multi-batch bulkWrites | ||
|
||
This test MUST only run against standalones on server versions 8.0 and higher. The `bulkWrite` call takes an | ||
exceedingly long time on replicasets and sharded clusters. Drivers MAY adjust the timeouts used in this test to allow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would an insert batch of 50 1MB documents take an "exceedingly long time"? Also, if a fail point is being used on each command, couldn't you also get by using fewer, larger documents that split into 2+ commands and trigger the timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, updated to insert a few large documents. I lifted most of this test from the existing insertMany
test for consistency, but the actual writes happening shouldn't matter much as long as the same blockConnection
is being used. Some basic local benchmarking of inserting a few large documents didn't show any time differences based on topology so I also removed that language.
Caveat: Kevin and I can't verify that this works as Rust and C both don't implement CSOT, so we'll need to wait for a driver that does have CSOT to implement this.
source/crud/bulk-write.md
Outdated
successful operations and errors will be returned. This field is optional and defaults to false on | ||
the server. | ||
|
||
`errorsOnly` corresponds to the `verboseResults` option defined on `BulkWriteOptions`. If the user |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`errorsOnly` corresponds to the `verboseResults` option defined on `BulkWriteOptions`. If the user | |
`errorsOnly` corresponds inversely to the `verboseResults` option defined on `BulkWriteOptions`. If the user |
Same language used earlier in BulkWriteOptions definition.
source/crud/bulk-write.md
Outdated
while writeModels.hasNext() { | ||
ops = DocumentSequence {} | ||
nsInfo = DocumentSequence {} | ||
loop { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I inferred that Rust is being used here, but in the interest of readability across teams can we consider wrapping the while
and if
conditions in parentheses and replace loop
with something like while (true)
?
AFAIK, most of the other specs used something resembling Javascript for their code examples. I'm less concerned about this specific example here and more about setting a precedent for other authors using their own languages in specs.
If no one else cares, you can disregard this comment.
source/crud/bulk-write.md
Outdated
Drivers MUST attempt to consume the contents of the cursor returned in the server's `bulkWrite` | ||
response before returning to the user. This is required regardless of whether the user requested | ||
verbose or summary results, as the results cursor always contains any write errors that occurred. | ||
If the cursor contains a nonzero cursor ID, drivers MUST perform `getMore`s until the cursor has |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively: "MUST execute getMore
until" if the "getMore
s" formatting appears strange.
source/crud/bulk-write.md
Outdated
|
||
Unlike the other result types, `InsertOneResult` contains an `insertedId` field that is generated | ||
driver-side, either by recording the `_id` field present in the user's insert document or creating | ||
and adding one. Drivers MUST only record these `insertedId`s in a `BulkWriteResult` when a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and adding one. Drivers MUST only record these `insertedId`s in a `BulkWriteResult` when a | |
and adding one. Drivers MUST only record these `insertedId` fields in a `BulkWriteResult` when a |
Related to my earlier suggestion about "getMore
s`. Feel free to disregard.
source/crud/bulk-write.md
Outdated
|
||
The Command Batching [Total Message Size](#total-message-size) section uses a 1000 byte overhead | ||
allowance to approximate the number of non-`bulkWrite`-specific bytes contained in an `OP_MSG` sent | ||
for a `bulkWrite` batch. This number was determined by constructing `OP_MSG`s with various fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for a `bulkWrite` batch. This number was determined by constructing `OP_MSG`s with various fields | |
for a `bulkWrite` batch. This number was determined by constructing `OP_MSG` messages with various fields |
Feel free to disregard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Great work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Summary
Specifies the driver API for the new
bulkWrite
command. The API in the new specification was previously discussed and approved in WRITING-13533. I made a few changes to the syntax which are called out in inline comments.An implementation in the Rust Driver of this spec is currently in review (mongodb/mongo-rust-driver#1034). A C driver implementation is also in progress.
Test Plan
I've added unified tests in the
crud
directory for bulk write options, errors, and results. There are also some prose tests for bulk write batching. This seemed preferable to trying to add functionality to the unified test runner to build very large documents based onhello
response values.Basic tests are also added for the following specs:
Please complete the following before merging:
clusters, and serverless).