FaultInjection-Direct #33329
Merged
ghost added the Cosmos label Feb 6, 2023
xinlian12 force-pushed the faultInjection branch from dcfd64f to c09ccd3 (February 7, 2023 17:56)
xinlian12 force-pushed the faultInjection branch from e243707 to 8844eb3 (February 28, 2023 15:46)
xinlian12 requested review from kushagraThapar, FabianMeiswinkel, kirankumarkolli, milismsft, aayush3011, simorenoh, jeet1995 and Pilchie as code owners (February 28, 2023 17:16)
xinlian12 commented Feb 28, 2023
.../main/java/com/azure/cosmos/implementation/directconnectivity/rntbd/RntbdRequestManager.java (Outdated)
xinlian12 requested review from alzimmermsft, samvaity, g2vinay and JimSuplizio as code owners (March 7, 2023 16:55)
xinlian12 force-pushed the faultInjection branch from 6b2a23a to 80a89f5 (March 7, 2023 17:10)
xinlian12 commented Mar 7, 2023
xinlian12 commented Mar 7, 2023
xinlian12 commented Mar 7, 2023
...mos-test/src/main/java/com/azure/cosmos/test/implementation/ImplementationBridgeHelpers.java
xinlian12 force-pushed the faultInjection branch from 7daefa1 to ccfe273 (March 7, 2023 21:07)
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
xinlian12 force-pushed the faultInjection branch from 3d63260 to d3582b4 (March 7, 2023 23:47)
eng/code-quality-reports/src/main/resources/spotbugs/spotbugs-exclude.xml
...smos-test/src/main/java/com/azure/cosmos/test/faultinjection/CosmosFaultInjectionHelper.java
...mos-test/src/main/java/com/azure/cosmos/test/faultinjection/FaultInjectionOperationType.java
xinlian12 force-pushed the faultInjection branch from f293c12 to 60a8cd5 (March 8, 2023 01:58)
FabianMeiswinkel approved these changes Mar 8, 2023
Nice - looks amazing now - thanks!
xinlian12 force-pushed the faultInjection branch from 60a8cd5 to c6d42bc (March 8, 2023 02:37)
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
clientTelemetryWithStageJunoEndpoint: Known failed test case
/check-enforcer override
Description
Added capability for fault injection in rntbd layer
Design
High level
Customers can configure fault injection behavior by creating a FaultInjectionRule. Each rule contains three major components:
- FaultInjectionServerErrorResult: Supported server error type:
  - times
  - delay: SERVER_RESPONSE_DELAY, SERVER_CONNECTION_DELAY
- FaultInjectionConnectionErrorResult: Supported connection error type:
  - interval
  - threshold
- OperationType: Null. Optional. For FaultInjectionServerErrorResult, for SERVER_GONE and SERVER_CONNECTION_DELAY, ignored after addresses are resolved. For FaultInjectionConnectionErrorResult, ignored after addresses are resolved.
- ConnectionType: FaultInjectionConnectionType.Direct
- Region: Null. Optional. If not defined, the rule applies in all available regions.
- Endpoints: Null. Optional. Type of FaultInjectionEndpoint. Use when you want to filter down to a subset of physical addresses.
- id
- duration
- startDelay
- hitLimit
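To make the rule-level settings concrete, here is a minimal Java sketch of how id, duration, startDelay and hitLimit could gate whether a rule applies. The class RuleWindowSketch and all of its methods are hypothetical and are not part of azure-cosmos-test. The sketch assumes that startDelay is measured from the moment the rule is configured, that duration begins once startDelay has elapsed, and that hitLimit caps how many times the rule may be applied; the PR description does not spell out these semantics.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of a rule's lifetime; NOT the azure-cosmos-test implementation.
final class RuleWindowSketch {
    private final String id;
    private final Instant effectiveFrom;   // assumption: configuredAt + startDelay
    private final Instant effectiveUntil;  // assumption: effectiveFrom + duration
    private final long hitLimit;           // assumption: maximum number of applied faults
    private final AtomicLong hitCount = new AtomicLong();
    private volatile boolean enabled = true;

    RuleWindowSketch(String id, Instant configuredAt, Duration startDelay,
                     Duration duration, long hitLimit) {
        this.id = id;
        this.effectiveFrom = configuredAt.plus(startDelay);
        this.effectiveUntil = this.effectiveFrom.plus(duration);
        this.hitLimit = hitLimit;
    }

    // Returns true (and records a hit) only if the rule is enabled, inside its
    // time window, and still under its hit limit.
    boolean tryApply(Instant now) {
        if (!enabled || now.isBefore(effectiveFrom) || !now.isBefore(effectiveUntil)) {
            return false;
        }
        while (true) {
            long current = hitCount.get();
            if (current >= hitLimit) {
                return false;
            }
            if (hitCount.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    void disable() {
        enabled = false;
    }

    long getHitCount() {
        return hitCount.get();
    }

    String getId() {
        return id;
    }
}
```

With a 10 second startDelay, a 60 second duration and a hitLimit of 2, a call at 5 seconds is rejected, the next two calls inside the window are applied, and any later call is rejected because the limit has been reached.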
Add fault injection rules
After the fault injection rule is configured successfully, its regionEndpoint and addresses details can be obtained by:
Disable fault injection rule
Get hit count of the fault injection rule
CosmosDiagnostics
New faultInjectionRuleId:
New lastFaultInjectionRuleId, lastFaultInjectionTimestamp in serviceEndpointStatistics:
Testing scenario examples
High channel acquisition/Connection timeout scenario
Broken connections scenario
Server returns gone exception scenario
Random connection closing/reset scenario
Implementation
azure-cosmos-test module
All the public APIs and new models mentioned above will be added into their own azure-cosmos-test module. If customers want to use fault injection, they will need to add the following to their pom file:
How ServerError is injected
RntbdTransportClient -> RntbdServiceEndpoint.Provider -> [0, *) RntbdServiceEndpoint -> [0, maxChannelsPerEndpoint] Connections/Channels -> RntbdRequestManager channelHandler -> RntbdServerErrorInjector.
- SERVER_CONNECTION_DELAY - Injected during the new connection establishment stage. Instead of opening connections right away, add the delay and then reduce connectionTimeout based on the delay
- SERVER_RESPONSE_DELAY - Inject delay after getting server responses - TBD
How ConnectionError is injected
RntbdTransportClient -> RntbdConnectionErrorInjector, which will schedule a side task to create chaos (close/reset) based on the interval and threshold defined in the rule
To Be Discussed
Parent feature #33425
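As an illustration of the two injection paths described under "How ServerError is injected" and "How ConnectionError is injected", the following Java sketch shows one way the arithmetic and scheduling could look. It is not the code in RntbdServerErrorInjector or RntbdConnectionErrorInjector: the class InjectionSketch and its three methods are hypothetical. It assumes that "reduce connectionTimeout based on the delay" means subtracting the injected delay (floored at zero), and that threshold is a fraction in (0, 1] of the currently open channels to close on each interval.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helpers for illustration only; NOT the SDK's injector classes.
final class InjectionSketch {
    private InjectionSketch() {
    }

    // SERVER_CONNECTION_DELAY (assumed semantics): once the injected delay has
    // been spent, only the remainder of the connection timeout is left.
    static Duration remainingConnectTimeout(Duration connectionTimeout, Duration injectedDelay) {
        Duration remaining = connectionTimeout.minus(injectedDelay);
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }

    // Connection error (assumed semantics): choose a random subset of open
    // channels whose size is ceil(size * threshold).
    static <T> List<T> pickChannelsToClose(List<T> openChannels, double threshold, Random random) {
        if (threshold <= 0.0 || threshold > 1.0) {
            throw new IllegalArgumentException("threshold must be in (0, 1]");
        }
        List<T> shuffled = new ArrayList<>(openChannels);
        Collections.shuffle(shuffled, random);
        int count = (int) Math.ceil(shuffled.size() * threshold);
        return new ArrayList<>(shuffled.subList(0, count));
    }

    // Schedules the side task to run once per interval; the interval must be
    // positive, otherwise scheduleAtFixedRate throws IllegalArgumentException.
    static ScheduledFuture<?> scheduleChaos(ScheduledExecutorService scheduler,
                                            Duration interval, Runnable chaosTask) {
        long millis = interval.toMillis();
        return scheduler.scheduleAtFixedRate(chaosTask, millis, millis, TimeUnit.MILLISECONDS);
    }
}
```

A caller would pass a Runnable to scheduleChaos that invokes pickChannelsToClose on the current channel list and closes or resets each returned channel, then cancel the returned ScheduledFuture when the rule is disabled or its duration ends.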