-
Notifications
You must be signed in to change notification settings - Fork 246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRIVERS-2972: add message requirement to ServerDescription.error #1729
Conversation
the `error` field of the new `ServerDescription` object MUST include a descriptive error explaining that it was | ||
invalidated because the primary was determined to be stale. Drivers MAY additionally specify whether this was due to an | ||
electionId or setVersion mismatch. A multi-threaded client MUST | ||
[request an immediate check](server-monitoring.md#requesting-an-immediate-check) for that server as soon as possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we expand the scope of this ticket to include more info in other cases where we reset a server to "unknown" besides stale primary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think that makes sense to do here. Seems like the only other place we do that is in the handleError
function, so I'll take a look at that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there are two places we do this in our handleError
logic, but in both cases, the handleError
function takes in an error as a parameter. Do we still want to be testing against the errors here?
should be none or one) and replaces its description with a default ServerDescription of type "Unknown". Additionally, | ||
the `error` field of the new `ServerDescription` object MUST include a descriptive error explaining that it was | ||
invalidated because the primary was determined to be stale. Drivers MAY additionally specify whether this was due to an | ||
electionId or setVersion mismatch. A multi-threaded client MUST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's get more specific about what the error should look like so that we can add tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I purposely left this a little more open-ended to give drivers leeway to use the Error/Exception API most native to their language. I can say that more explicitly in the ServerDescription.error
section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to cross-reference the ServerDescription.error
section
- (=) `error`: information about the last error related to this server. Default null. | ||
- (=) `error`: information about the last error related to this server. Default null. MUST contain or be able to produce | ||
a string describing the error. The name of the field containing the string describing the error SHOULD be what is | ||
most idiomatic for each driver. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the motivation for adding "The name of the field containing the string describing the error SHOULD be what is most idiomatic for each driver."?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that the error
field would be an object representing an error and since different languages have different ways of doing that, I think it's better to defer to driver engineers on what is the best way to communicate that to users. In Node we'd likely do that with a new MongoError
subclass (which itself subclasses the native Node Error
type) that uses a message
field to hold human readable information about the error. We currently have it typed as a generic MongoError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I think we can remove that sentence since it generally always true of any spec prescribed api.
| RSPrimary with a stale electionId is discovered | `'primary marked stale due to electionId mismatch'` | | ||
| RSPrimary with a stale setVersion is discovered | `'primary marked stale due to setVersion mismatch'` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we combine these two cases and include both the stale and max election tuples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's fair
| -------------------------------------------------------------------- | -------------------------------------------------------- | | ||
| RSPrimary with a stale electionId is discovered | `'primary marked stale due to electionId mismatch'` | | ||
| RSPrimary with a stale setVersion is discovered | `'primary marked stale due to setVersion mismatch'` | | ||
| A more current RSPrimary is discovered alongside an existing primary | `'primary marked stale due to discovery of new primary'` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we combine this case as well? IIUC the case is the same as the above but just from the other server's point of view.
Node POC: mongodb/node-mongodb-native#4340 |
source/server-discovery-and-monitoring/server-discovery-and-monitoring.md
Outdated
Show resolved
Hide resolved
else: | ||
# Stale primary. | ||
# replace serverDescription with a default ServerDescription of type "Unknown" | ||
if serverDescription.electionId < topologyDescription.maxElectionId: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see we the pseudocode is a bit more verbose now that the logic needs to distinguish between the electionId vs setVersion stale cases. What do you think about one error message that include the two tuples:
"primary marked stale due to electionId/setVersion mismatch, <stale tuple> is stale compared to <max tuple>"
That way we don't need to refactor this code and more importantly drivers don't need to refactor their comparison logic.
source/server-discovery-and-monitoring/server-discovery-and-monitoring.md
Outdated
Show resolved
Hide resolved
@@ -906,7 +915,8 @@ for each server in topologyDescription.servers: | |||
if server.address != serverDescription.address: | |||
if server.type is RSPrimary: | |||
# See note below about invalidating an old primary. | |||
replace the server with a default ServerDescription of type "Unknown" | |||
# replace the server with a default ServerDescription of type "Unknown" | |||
# and an error field with a message containing the subsring "primary marked stale due to discovery of newer primary" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks a bit off to have an if-statement code block with only comments. How about:
if server.type is RSPrimary:
# See note below about invalidating an old primary.
# The error field MUST include the substring "primary marked stale due to discovery of newer primary"
replace the server with a default ServerDescription of type "Unknown"
@@ -71,6 +71,8 @@ following keys: | |||
|
|||
- type: A ServerType name, like "RSSecondary". See [ServerType](../server-discovery-and-monitoring.md#servertype) for | |||
details pertaining to async and multi-threaded drivers. | |||
- error: An optional object with a with a string field containing a string that must be a substring of the message on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An optional object with a with a string field containing a string that must be...
->
An optional string that must be...
|
||
- 2024-11-04: Make the description of `TopologyDescription.servers` consistent with the spec tests. | ||
|
||
- 2025-01-22: Add error messages when a new primary is elected or a primary with a stale electionID or setVersion is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
electionID -> electionId
client MUST [request an immediate check](server-monitoring.md#requesting-an-immediate-check) for that server as soon as | ||
should be none or one) and replaces its description with a default ServerDescription of type "Unknown". Additionally, | ||
the `error` field of the new `ServerDescription` object MUST include a descriptive error explaining that it was | ||
invalidated because the primary was determined to be stale. Drivers MAY additionally specify whether this was due to an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This paragraph is "a note on invalidating the old primary" so I don't think it applies to all the "electionId/setVersion mismatch" case and this sentence should be moved:
Drivers MAY additionally specify whether this was due to an electionId or setVersion mismatch as described in the ServerDescripion.error section.
It should be moved to the paragraph below that starts with "If the server is primary with an obsolete electionId or setVersion,"
|
||
| circumstance | error substring | | ||
| ---------------------------------------------------------- | -------------------------------------------------------------- | | ||
| RSPrimary with a stale electionId/setVersion is discovered | `'primary marked stale due to electionId/setVersion mismatch'` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this comment #1729 (comment) I had suggested adding new and old election tuples to the error message. Did you decide not to do that? Or could you add an example of the suggested error message here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, just missed that. Here's what the message looks like in the Node driver currently (implemented before we finalized this):
MongoStalePrimaryError: primary marked stale due to electionId/setVersion mismatch: server setVersion: 1, server electionId: 000000000000000000000001, topology setVersion: 1, topology electionId: 000000000000000000000002
After implementing this it could look like
MongoStalePrimaryError: primary marked stale due to electionId/setVersion mismatch: stale electionId/setVersion: (000000000000000000000001,1), new electionId/setVersion: (000000000000000000000002, 1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just implemented this in Python and found that some of these tests are expecting an incorrect error message:
$ pytest -s -v -k 'new_primary or electionid or setversion' -- test/test_discovery_and_monitoring.py
test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_electionId_precedence_setVersion PASSED
test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_equal_electionids PASSED
test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_new_primary PASSED
test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_new_primary_new_electionid FAILED
test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_new_primary_new_setversion FAILED
test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_new_primary_wrong_set_name PASSED
test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_primary_disconnect_electionid FAILED
test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_primary_disconnect_setversion FAILED
test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_setversion_equal_max_without_electionid PASSED
test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_setversion_greaterthan_max_without_electionid FAILED
test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_setversion_without_electionid PASSED
test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_setversion_without_electionid-pre-6.0 FAILED
test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_use_setversion_without_electionid PASSED
test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_use_setversion_without_electionid-pre-6.0 FAILED
FAILED test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_new_primary_new_electionid - AssertionError: phase: 1: 'primary marked stale due to electionId/setVersion mismatch' not found in 'primary marked stale due to discovery of newer...
FAILED test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_new_primary_new_setversion - AssertionError: phase: 1: 'primary marked stale due to electionId/setVersion mismatch' not found in 'primary marked stale due to discovery of newer...
FAILED test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_primary_disconnect_electionid - AssertionError: phase: 0: 'primary marked stale due to electionId/setVersion mismatch' not found in 'primary marked stale due to discovery of newer...
FAILED test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_primary_disconnect_setversion - AssertionError: phase: 0: 'primary marked stale due to electionId/setVersion mismatch' not found in 'primary marked stale due to discovery of newer...
FAILED test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_setversion_greaterthan_max_without_electionid - AssertionError: phase: 1: 'primary marked stale due to electionId/setVersion mismatch' not found in 'primary marked stale due to discovery of newer...
FAILED test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_setversion_without_electionid-pre-6.0 - AssertionError: phase: 1: 'primary marked stale due to electionId/setVersion mismatch' not found in 'primary marked stale due to discovery of newer...
FAILED test/test_discovery_and_monitoring.py::TestAllScenarios::test_rs_use_setversion_without_electionid-pre-6.0 - AssertionError: phase: 1: 'primary marked stale due to electionId/setVersion mismatch' not found in 'primary marked stale due to discovery of newer...
===================================================== 7 failed, 7 passed, 219 deselected in 0.48s ======================================================
@ShaneHarvey updated Node POC and tests mongodb/node-mongodb-native#4397 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, tests pass in Python.
Please complete the following before merging:
clusters, and serverless).