Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUST-1064 Retry on handshake failure #598

Merged
merged 6 commits into from
Mar 25, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 15 additions & 6 deletions src/client/auth/sasl.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ use crate::{
client::{auth::AuthMechanism, options::ServerApi},
cmap::Command,
error::{Error, Result},
operation::{CommandErrorBody, CommandResponse},
};

/// Encapsulates the command building of a `saslStart` command.
Expand Down Expand Up @@ -98,12 +99,20 @@ fn validate_command_success(auth_mechanism: &str, response: &Document) -> Result

match bson_util::get_int(ok) {
Some(1) => Ok(()),
Some(_) => Err(Error::authentication_error(
auth_mechanism,
response
.get_str("errmsg")
.unwrap_or("Authentication failure"),
)),
Some(_) => {
let source = bson::from_bson::<CommandResponse<CommandErrorBody>>(Bson::Document(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When implementing this, I was a little surprised to learn that the SASL commands use bespoke parsing and validation rather than going through Serde integration like the rest; if there's no strong reason for that, it might be a good candidate for future cleanup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, a lot of the auth code is some of the oldest in the driver and actually predates the operation layer and much of the serde usage that exists today. I think it would definitely benefit from being cleaned up and updated. Could you file a ticket for that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed RUST-1238.

response.clone(),
))
.map(|cmd_resp| cmd_resp.body.into())
.ok();
Err(Error::authentication_error(
auth_mechanism,
response
.get_str("errmsg")
.unwrap_or("Authentication failure"),
)
.with_source(source))
}
_ => Err(Error::invalid_authentication_response(auth_mechanism)),
}
}
Expand Down
68 changes: 45 additions & 23 deletions src/client/executor.rs
Original file line number Diff line number Diff line change
Expand Up @@ -332,16 +332,24 @@ impl Client {
Ok(c) => c,
Err(mut err) => {
err.add_labels_and_update_pin(None, &mut session, None)?;
if err.is_read_retryable() && self.inner.options.retry_writes != Some(false) {
err.add_label(RETRYABLE_WRITE_ERROR);
}

if err.is_pool_cleared() {
let op_retry = match self.get_op_retryability(&op, &session) {
Retryability::Read => err.is_read_retryable(),
Retryability::Write => err.is_write_retryable(),
_ => false,
};
if err.is_pool_cleared() || op_retry {
return self.execute_retry(&mut op, &mut session, None, err).await;
} else {
return Err(err);
}
}
};

let retryability = self.get_retryability(&conn, &op, &session).await?;
let retryability = self.get_retryability(&conn, &op, &session)?;

let txn_number = match session {
Some(ref mut session) => {
Expand Down Expand Up @@ -433,7 +441,7 @@ impl Client {
Err(_) => return Err(first_error),
};

let retryability = self.get_retryability(&conn, op, session).await?;
let retryability = self.get_retryability(&conn, op, session)?;
if retryability == Retryability::None {
return Err(first_error);
}
Expand Down Expand Up @@ -825,35 +833,49 @@ impl Client {
}

/// Returns the retryability level for the execution of this operation.
async fn get_retryability<T: Operation>(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out this didn't need to be async.

fn get_op_retryability<T: Operation>(
&self,
conn: &Connection,
op: &T,
session: &Option<&mut ClientSession>,
) -> Result<Retryability> {
if !session
) -> Retryability {
if session
.as_ref()
.map(|session| session.in_transaction())
.unwrap_or(false)
{
match op.retryability() {
Retryability::Read if self.inner.options.retry_reads != Some(false) => {
return Ok(Retryability::Read);
}
Retryability::Write if conn.stream_description()?.supports_retryable_writes() => {
// commitTransaction and abortTransaction should be retried regardless of the
// value for retry_writes set on the Client
if op.name() == CommitTransaction::NAME
|| op.name() == AbortTransaction::NAME
|| self.inner.options.retry_writes != Some(false)
{
return Ok(Retryability::Write);
}
}
_ => {}
return Retryability::None;
}
match op.retryability() {
Retryability::Read if self.inner.options.retry_reads != Some(false) => {
Retryability::Read
}
// commitTransaction and abortTransaction should be retried regardless of the
// value for retry_writes set on the Client
Retryability::Write
if op.name() == CommitTransaction::NAME
|| op.name() == AbortTransaction::NAME
|| self.inner.options.retry_writes != Some(false) =>
{
Retryability::Write
}
_ => Retryability::None,
}
}

/// Returns the retryability level for the execution of this operation on this connection.
fn get_retryability<T: Operation>(
&self,
conn: &Connection,
op: &T,
session: &Option<&mut ClientSession>,
) -> Result<Retryability> {
match self.get_op_retryability(op, session) {
Retryability::Read => Ok(Retryability::Read),
Retryability::Write if conn.stream_description()?.supports_retryable_writes() => {
Ok(Retryability::Write)
}
_ => Ok(Retryability::None),
}
Ok(Retryability::None)
}

async fn update_cluster_time(
Expand Down
9 changes: 9 additions & 0 deletions src/error.rs
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,8 @@ pub struct Error {
pub kind: Box<ErrorKind>,
labels: HashSet<String>,
pub(crate) wire_version: Option<i32>,
#[source]
pub(crate) source: Option<Box<Error>>,
}

impl Error {
Expand All @@ -61,6 +63,7 @@ impl Error {
kind: Box::new(kind),
labels,
wire_version: None,
source: None,
}
}

Expand Down Expand Up @@ -237,6 +240,7 @@ impl Error {
ErrorKind::Write(WriteFailure::WriteConcernError(wc_error)) => Some(wc_error.code),
_ => None,
}
.or_else(|| self.source.as_ref().and_then(|s| s.code()))
}

/// Gets the message for this error, if applicable, for use in testing.
Expand Down Expand Up @@ -345,6 +349,11 @@ impl Error {
}
false
}

pub(crate) fn with_source<E: Into<Option<Error>>>(mut self, source: E) -> Self {
self.source = source.into().map(Box::new);
self
}
}

impl<E> From<E> for Error
Expand Down
22 changes: 13 additions & 9 deletions src/test/spec/json/retryable-reads/README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,10 @@ Retryable Reads Tests
Introduction
============

The YAML and JSON files in this directory tree are platform-independent tests
that drivers can use to prove their conformance to the Retryable Reads spec.
The YAML and JSON files in the ``legacy`` and ``unified`` sub-directories are platform-independent tests
that drivers can use to prove their conformance to the Retryable Reads spec. Tests in the
``unified`` directory are written using the `Unified Test Format <../../unified-test-format/unified-test-format.rst>`_.
Tests in the ``legacy`` directory are written using the format described below.

Prose tests, which are not easily expressed in YAML, are also presented
in this file. Those tests will need to be manually implemented by each driver.
Expand Down Expand Up @@ -93,20 +95,20 @@ Each YAML file has the following keys:

- ``database_name`` and ``collection_name``: Optional. The database and
collection to use for testing.

- ``bucket_name``: Optional. The GridFS bucket name to use for testing.

- ``data``: The data that should exist in the collection(s) under test before
each test run. This will typically be an array of documents to be inserted
into the collection under test (i.e. ``collection_name``); however, this field
may also be an object mapping collection names to arrays of documents to be
inserted into the specified collection.

- ``tests``: An array of tests that are to be run independently of each other.
Each test will have some or all of the following fields:

- ``description``: The name of the test.

- ``clientOptions``: Optional, parameters to pass to MongoClient().

- ``useMultipleMongoses`` (optional): If ``true``, and the topology type is
Expand All @@ -121,7 +123,7 @@ Each YAML file has the following keys:
server.

``useMultipleMongoses`` only affects ``Sharded`` and ``LoadBalanced`` topologies.

- ``skipReason``: Optional, string describing why this test should be skipped.

- ``failPoint``: Optional, a server fail point to enable, expressed as the
Expand All @@ -140,10 +142,10 @@ Each YAML file has the following keys:
- ``result``: Optional. The return value from the operation, if any. This
field may be a scalar (e.g. in the case of a count), a single document, or
an array of documents in the case of a multi-document read.

- ``error``: Optional. If ``true``, the test should expect an error or
exception.

- ``expectations``: Optional list of command-started events.

GridFS Tests
Expand All @@ -167,7 +169,7 @@ data.


.. _GridFSBucket spec: https://github.com/mongodb/specifications/blob/master/source/gridfs/gridfs-spec.rst#configurable-gridfsbucket-class


Speeding Up Tests
-----------------
Expand Down Expand Up @@ -229,6 +231,8 @@ This test requires MongoDB 4.2.9+ for ``blockConnection`` support in the failpoi
Changelog
=========

:2022-01-10: Create legacy and unified subdirectories for new unified tests

:2021-08-27: Clarify behavior of ``useMultipleMongoses`` for ``LoadBalanced`` topologies.

:2019-03-19: Add top-level ``runOn`` field to denote server version and/or
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -158,7 +158,7 @@
]
},
{
"description": "EstimatedDocumentCount succeeds after NotMaster",
"description": "EstimatedDocumentCount succeeds after NotWritablePrimary",
"failPoint": {
"configureFailPoint": "failCommand",
"mode": {
Expand Down Expand Up @@ -228,7 +228,7 @@
]
},
{
"description": "EstimatedDocumentCount succeeds after NotMasterNoSlaveOk",
"description": "EstimatedDocumentCount succeeds after NotPrimaryNoSecondaryOk",
"failPoint": {
"configureFailPoint": "failCommand",
"mode": {
Expand Down Expand Up @@ -298,7 +298,7 @@
]
},
{
"description": "EstimatedDocumentCount succeeds after NotMasterOrSecondary",
"description": "EstimatedDocumentCount succeeds after NotPrimaryOrSecondary",
"failPoint": {
"configureFailPoint": "failCommand",
"mode": {
Expand Down Expand Up @@ -788,7 +788,7 @@
]
},
{
"description": "EstimatedDocumentCount fails after two NotMaster errors",
"description": "EstimatedDocumentCount fails after two NotWritablePrimary errors",
"failPoint": {
"configureFailPoint": "failCommand",
"mode": {
Expand Down Expand Up @@ -858,7 +858,7 @@
]
},
{
"description": "EstimatedDocumentCount fails after NotMaster when retryReads is false",
"description": "EstimatedDocumentCount fails after NotWritablePrimary when retryReads is false",
"clientOptions": {
"retryReads": false
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ tests:
- *retryable_command_started_event
- *retryable_command_started_event
-
description: "EstimatedDocumentCount succeeds after NotMaster"
description: "EstimatedDocumentCount succeeds after NotWritablePrimary"
failPoint:
<<: *failCommand_failPoint
data: { failCommands: [aggregate], errorCode: 10107 }
Expand All @@ -50,7 +50,7 @@ tests:
- *retryable_command_started_event
- *retryable_command_started_event
-
description: "EstimatedDocumentCount succeeds after NotMasterNoSlaveOk"
description: "EstimatedDocumentCount succeeds after NotPrimaryNoSecondaryOk"
failPoint:
<<: *failCommand_failPoint
data: { failCommands: [aggregate], errorCode: 13435 }
Expand All @@ -59,7 +59,7 @@ tests:
- *retryable_command_started_event
- *retryable_command_started_event
-
description: "EstimatedDocumentCount succeeds after NotMasterOrSecondary"
description: "EstimatedDocumentCount succeeds after NotPrimaryOrSecondary"
failPoint:
<<: *failCommand_failPoint
data: { failCommands: [aggregate], errorCode: 13436 }
Expand Down Expand Up @@ -122,7 +122,7 @@ tests:
- *retryable_command_started_event
- *retryable_command_started_event
-
description: "EstimatedDocumentCount fails after two NotMaster errors"
description: "EstimatedDocumentCount fails after two NotWritablePrimary errors"
failPoint:
<<: *failCommand_failPoint
mode: { times: 2 }
Expand All @@ -135,7 +135,7 @@ tests:
- *retryable_command_started_event
- *retryable_command_started_event
-
description: "EstimatedDocumentCount fails after NotMaster when retryReads is false"
description: "EstimatedDocumentCount fails after NotWritablePrimary when retryReads is false"
clientOptions:
retryReads: false
failPoint:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -110,7 +110,7 @@
]
},
{
"description": "EstimatedDocumentCount succeeds after NotMaster",
"description": "EstimatedDocumentCount succeeds after NotWritablePrimary",
"failPoint": {
"configureFailPoint": "failCommand",
"mode": {
Expand Down Expand Up @@ -150,7 +150,7 @@
]
},
{
"description": "EstimatedDocumentCount succeeds after NotMasterNoSlaveOk",
"description": "EstimatedDocumentCount succeeds after NotPrimaryNoSecondaryOk",
"failPoint": {
"configureFailPoint": "failCommand",
"mode": {
Expand Down Expand Up @@ -190,7 +190,7 @@
]
},
{
"description": "EstimatedDocumentCount succeeds after NotMasterOrSecondary",
"description": "EstimatedDocumentCount succeeds after NotPrimaryOrSecondary",
"failPoint": {
"configureFailPoint": "failCommand",
"mode": {
Expand Down Expand Up @@ -470,7 +470,7 @@
]
},
{
"description": "EstimatedDocumentCount fails after two NotMaster errors",
"description": "EstimatedDocumentCount fails after two NotWritablePrimary errors",
"failPoint": {
"configureFailPoint": "failCommand",
"mode": {
Expand Down Expand Up @@ -510,7 +510,7 @@
]
},
{
"description": "EstimatedDocumentCount fails after NotMaster when retryReads is false",
"description": "EstimatedDocumentCount fails after NotWritablePrimary when retryReads is false",
"clientOptions": {
"retryReads": false
},
Expand Down
Loading