Release for azure-cosmos 4.30.1 and adding reason to rntbd channel health check failures (#29174)

* Release for azure-cosmos 4.30.1 and adding reason to rntbd channel health check failures

* Update CHANGELOG.md

* Update RntbdRequestManager.java

* Reverting release preparation for spring-data-cosmos (API review pending)
FabianMeiswinkel authored Jun 2, 2022
1 parent 43a7777 commit 9129dc0
Showing 21 changed files with 190 additions and 68 deletions.
4 changes: 2 additions & 2 deletions eng/jacoco-test-coverage/pom.xml
@@ -178,12 +178,12 @@
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos</artifactId>
<version>4.31.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
<version>4.30.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
</dependency>
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos-encryption</artifactId>
<version>1.3.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos-encryption;current} -->
<version>1.2.1</version> <!-- {x-version-update;com.azure:azure-cosmos-encryption;current} -->
</dependency>
<dependency>
<groupId>com.azure</groupId>
8 changes: 4 additions & 4 deletions eng/versioning/version_client.txt
@@ -81,13 +81,13 @@ com.azure:azure-core-serializer-json-gson;1.1.15;1.2.0-beta.1
com.azure:azure-core-serializer-json-jackson;1.2.16;1.3.0-beta.1
com.azure:azure-core-test;1.8.0;1.9.0-beta.1
com.azure:azure-core-tracing-opentelemetry;1.0.0-beta.23;1.0.0-beta.24
com.azure:azure-cosmos;4.30.0;4.31.0-beta.1
com.azure:azure-cosmos;4.30.0;4.30.1
com.azure:azure-cosmos-benchmark;4.0.1-beta.1;4.0.1-beta.1
com.azure:azure-cosmos-dotnet-benchmark;4.0.1-beta.1;4.0.1-beta.1
com.azure.cosmos.spark:azure-cosmos-spark_3_2-12;1.0.0-beta.1;1.0.0-beta.1
com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12;4.10.0;4.11.0-beta.1
com.azure.cosmos.spark:azure-cosmos-spark_3-2_2-12;4.10.0;4.11.0-beta.1
com.azure:azure-cosmos-encryption;1.2.0;1.3.0-beta.1
com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12;4.10.0;4.10.1
com.azure.cosmos.spark:azure-cosmos-spark_3-2_2-12;4.10.0;4.10.1
com.azure:azure-cosmos-encryption;1.2.0;1.2.1
com.azure:azure-data-appconfiguration;1.3.3;1.4.0-beta.1
com.azure:azure-data-appconfiguration-perf;1.0.0-beta.1;1.0.0-beta.1
com.azure:azure-data-schemaregistry;1.2.0;1.3.0-beta.1
4 changes: 2 additions & 2 deletions sdk/cosmos/azure-cosmos-benchmark/pom.xml
@@ -51,13 +51,13 @@ Licensed under the MIT License.
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos</artifactId>
<version>4.31.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
<version>4.30.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
</dependency>

<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos-encryption</artifactId>
<version>1.3.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos-encryption;current} -->
<version>1.2.1</version> <!-- {x-version-update;com.azure:azure-cosmos-encryption;current} -->
</dependency>

<dependency>
2 changes: 1 addition & 1 deletion sdk/cosmos/azure-cosmos-dotnet-benchmark/pom.xml
@@ -50,7 +50,7 @@ Licensed under the MIT License.
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos</artifactId>
<version>4.31.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
<version>4.30.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
</dependency>

<dependency>
10 changes: 2 additions & 8 deletions sdk/cosmos/azure-cosmos-encryption/CHANGELOG.md
@@ -1,14 +1,8 @@
## Release History

### 1.3.0-beta.1 (Unreleased)

#### Features Added

#### Breaking Changes

#### Bugs Fixed

### 1.2.1 (2022-06-01)
#### Other Changes
* Updated `azure-cosmos` to version `4.30.1`.

### 1.2.0 (2022-05-20)
#### Other Changes
2 changes: 1 addition & 1 deletion sdk/cosmos/azure-cosmos-encryption/README.md
@@ -12,7 +12,7 @@ The Azure Cosmos Encryption Plugin is used for encrypting data with a user-provi
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos-encryption</artifactId>
<version>1.2.0</version>
<version>1.2.1</version>
</dependency>
```
[//]: # ({x-version-update-end})
4 changes: 2 additions & 2 deletions sdk/cosmos/azure-cosmos-encryption/pom.xml
@@ -13,7 +13,7 @@ Licensed under the MIT License.

<groupId>com.azure</groupId>
<artifactId>azure-cosmos-encryption</artifactId>
<version>1.3.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos-encryption;current} -->
<version>1.2.1</version> <!-- {x-version-update;com.azure:azure-cosmos-encryption;current} -->
<name>Encryption Plugin for Azure Cosmos DB SDK</name>
<description>This Package contains Encryption Plugin for Microsoft Azure Cosmos SDK</description>
<packaging>jar</packaging>
@@ -56,7 +56,7 @@ Licensed under the MIT License.
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos</artifactId>
<version>4.31.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
<version>4.30.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
</dependency>

<dependency>
6 changes: 1 addition & 5 deletions sdk/cosmos/azure-cosmos-spark_3-1_2-12/CHANGELOG.md
@@ -1,19 +1,15 @@
## Release History

### 4.11.0-beta.1 (Unreleased)
### 4.10.1 (2022-06-01)

#### Features Added
* Added ability to disable endpoint rediscovery when using custom domain names in combination with private endpoints from a custom (on-premise) Spark environment (neither Databricks nor Synapse). - See [PR 29027](https://github.com/Azure/azure-sdk-for-java/pull/29027)
* Added a config option `spark.cosmos.serialization.dateTimeConversionMode` to allow changing date/time conversion to fall back to converting `java.sql.Date` and `java.sql.Timestamp` into Epoch Milliseconds like in the Cosmos DB Connector for Spark 2.4 - See [PR 29081](https://github.com/Azure/azure-sdk-for-java/pull/29081)

#### Breaking Changes

#### Bugs Fixed
* Fixed possible perf issue when Split results in 410 when trying to get latest LSN in Spark partitioner that could result in reprocessing change feed events (causing "hot partition") - See [PR 29152](https://github.com/Azure/azure-sdk-for-java/pull/29152)
* Fixed a bug resulting in ChangeFeed requests using the account's default consistency model instead of falling back to eventual if `spark.cosmos.read.forceEventualConsistency` is `true` (the default config). - See [PR 29152](https://github.com/Azure/azure-sdk-for-java/pull/29152)

#### Other Changes

### 4.10.0 (2022-05-20)
#### Features Added
* Added the ability to change the target throughput control (`spark.cosmos.throughputControl.targetThroughputThreshold` or `spark.cosmos.throughputControl.targetThroughput`) when throughput control is enabled without having to also change the throughput control group name (`spark.cosmos.throughputControl.name`). - See [PR 28969](https://github.com/Azure/azure-sdk-for-java/pull/28969)
4 changes: 2 additions & 2 deletions sdk/cosmos/azure-cosmos-spark_3-1_2-12/pom.xml
@@ -11,7 +11,7 @@
</parent>
<groupId>com.azure.cosmos.spark</groupId>
<artifactId>azure-cosmos-spark_3-1_2-12</artifactId>
<version>4.11.0-beta.1</version> <!-- {x-version-update;com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12;current} -->
<version>4.10.1</version> <!-- {x-version-update;com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12;current} -->
<packaging>jar</packaging>
<url>https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/cosmos/azure-cosmos-spark_3-1_2-12</url>
<name>OLTP Spark 3.1 Connector for Azure Cosmos DB SQL API</name>
@@ -106,7 +106,7 @@
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos</artifactId>
<version>4.31.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
<version>4.30.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
6 changes: 1 addition & 5 deletions sdk/cosmos/azure-cosmos-spark_3-2_2-12/CHANGELOG.md
@@ -1,19 +1,15 @@
## Release History

### 4.11.0-beta.1 (Unreleased)
### 4.10.1 (2022-06-01)

#### Features Added
* Added ability to disable endpoint rediscovery when using custom domain names in combination with private endpoints from a custom (on-premise) Spark environment (neither Databricks nor Synapse). - See [PR 29027](https://github.com/Azure/azure-sdk-for-java/pull/29027)
* Added a config option `spark.cosmos.serialization.dateTimeConversionMode` to allow changing date/time conversion to fall back to converting `java.sql.Date` and `java.sql.Timestamp` into Epoch Milliseconds like in the Cosmos DB Connector for Spark 2.4 - See [PR 29081](https://github.com/Azure/azure-sdk-for-java/pull/29081)

#### Breaking Changes

#### Bugs Fixed
* Fixed possible perf issue when Split results in 410 when trying to get latest LSN in Spark partitioner that could result in reprocessing change feed events (causing "hot partition") - See [PR 29152](https://github.com/Azure/azure-sdk-for-java/pull/29152)
* Fixed a bug resulting in ChangeFeed requests using the account's default consistency model instead of falling back to eventual if `spark.cosmos.read.forceEventualConsistency` is `true` (the default config). - See [PR 29152](https://github.com/Azure/azure-sdk-for-java/pull/29152)

#### Other Changes

### 4.10.0 (2022-05-20)
#### Features Added
* Added the ability to change the target throughput control (`spark.cosmos.throughputControl.targetThroughputThreshold` or `spark.cosmos.throughputControl.targetThroughput`) when throughput control is enabled without having to also change the throughput control group name (`spark.cosmos.throughputControl.name`). - See [PR 28969](https://github.com/Azure/azure-sdk-for-java/pull/28969)
4 changes: 2 additions & 2 deletions sdk/cosmos/azure-cosmos-spark_3-2_2-12/pom.xml
@@ -11,7 +11,7 @@
</parent>
<groupId>com.azure.cosmos.spark</groupId>
<artifactId>azure-cosmos-spark_3-2_2-12</artifactId>
<version>4.11.0-beta.1</version> <!-- {x-version-update;com.azure.cosmos.spark:azure-cosmos-spark_3-2_2-12;current} -->
<version>4.10.1</version> <!-- {x-version-update;com.azure.cosmos.spark:azure-cosmos-spark_3-2_2-12;current} -->
<packaging>jar</packaging>
<url>https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/cosmos/azure-cosmos-spark_3-2_2-12</url>
<name>OLTP Spark 3.2 Connector for Azure Cosmos DB SQL API</name>
@@ -108,7 +108,7 @@
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos</artifactId>
<version>4.31.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
<version>4.30.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
2 changes: 1 addition & 1 deletion sdk/cosmos/azure-cosmos-spark_3_2-12/pom.xml
@@ -63,7 +63,7 @@
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos</artifactId>
<version>4.31.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
<version>4.30.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
</dependency>
<dependency>
<groupId>org.scala-lang.modules</groupId>
12 changes: 4 additions & 8 deletions sdk/cosmos/azure-cosmos/CHANGELOG.md
@@ -1,15 +1,11 @@
## Release History

### 4.31.0-beta.1 (Unreleased)

#### Features Added

#### Breaking Changes

#### Bugs Fixed
### 4.30.1 (2022-06-01)

#### Other Changes
* Making CosmosPatchOperations thread-safe. Usually there is no reason to modify a CosmosPatchOperations instance concurrently form multiple threads - but making it thread-safe acts as protection in case this is done anyway - See [PR 29143](https://github.com/Azure/azure-sdk-for-java/pull/29143)
* Made CosmosPatchOperations thread-safe. Usually there is no reason to modify a CosmosPatchOperations instance concurrently from multiple threads - but making it thread-safe acts as protection in case this is done anyway - See [PR 29143](https://github.com/Azure/azure-sdk-for-java/pull/29143)
* Added system property to allow overriding proxy setting for client telemetry endpoint. - See [PR 29022](https://github.com/Azure/azure-sdk-for-java/pull/29022)
* Added additional information about the reason on Rntbd channel health check failures. - See [PR 29022](https://github.com/Azure/azure-sdk-for-java/pull/29022)

### 4.30.0 (2022-05-20)
#### Bugs Fixed
2 changes: 1 addition & 1 deletion sdk/cosmos/azure-cosmos/README.md
@@ -45,7 +45,7 @@ add the direct dependency to your project as follows.
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos</artifactId>
<version>4.30.0</version>
<version>4.30.1</version>
</dependency>
```
[//]: # ({x-version-update-end})
2 changes: 1 addition & 1 deletion sdk/cosmos/azure-cosmos/pom.xml
@@ -13,7 +13,7 @@ Licensed under the MIT License.

<groupId>com.azure</groupId>
<artifactId>azure-cosmos</artifactId>
<version>4.31.0-beta.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
<version>4.30.1</version> <!-- {x-version-update;com.azure:azure-cosmos;current} -->
<name>Microsoft Azure SDK for SQL API of Azure Cosmos DB Service</name>
<description>This Package contains Microsoft Azure Cosmos SDK (with Reactive Extension Reactor support) for Azure Cosmos DB SQL API</description>
<packaging>jar</packaging>
@@ -12,6 +12,7 @@
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.text.MessageFormat;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLongFieldUpdater;

@@ -126,7 +127,6 @@ public long writeDelayLimitInNanos() {
* @return A future with a result of {@code true} if the channel is healthy, or {@code false} otherwise.
*/
public Future<Boolean> isHealthy(final Channel channel) {

checkNotNull(channel, "expected non-null channel");

final RntbdRequestManager requestManager = channel.pipeline().get(RntbdRequestManager.class);
@@ -181,7 +181,7 @@ public Future<Boolean> isHealthy(final Channel channel) {
final int pendingRequestCount = requestManager.pendingRequestCount();

logger.warn("{} health check failed due to nonresponding read: {lastChannelWrite: {}, lastChannelRead: {}, "
+ "readDelay: {}, readDelayLimit: {}, rntbdContext: {}, pendingRequestCount: {}}", channel,
+ "readDelay: {}, readDelayLimit: {}, rntbdContext: {}, pendingRequestCount: {}}", channel,
timestamps.lastChannelWriteNanoTime(), timestamps.lastChannelReadNanoTime(), readDelay,
this.readDelayLimitInNanos, rntbdContext, pendingRequestCount);

@@ -206,6 +206,122 @@ public Future<Boolean> isHealthy(final Channel channel) {
return promise;
}

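/**
* Determines whether a specified channel is healthy and, if it is not, reports the reason.
*
* @param channel A channel whose health is to be checked.
* @return A future with a result of {@link RntbdConstants.RntbdHealthCheckResults#SuccessValue} if the channel is
* healthy, or a message describing the reason for the health check failure otherwise.
*/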
public Future<String> isHealthyWithFailureReason(final Channel channel) {

checkNotNull(channel, "expected non-null channel");

final RntbdRequestManager requestManager = channel.pipeline().get(RntbdRequestManager.class);
final Promise<String> promise = channel.eventLoop().newPromise();

if (requestManager == null) {
reportIssueUnless(logger, !channel.isActive(), channel, "active with no request manager");
return promise.setSuccess("active with no request manager");
}

final Timestamps timestamps = requestManager.snapshotTimestamps();
final long currentTime = System.nanoTime();

if (currentTime - timestamps.lastChannelReadNanoTime() < recentReadWindowInNanos) {
// because we recently received data
return promise.setSuccess(RntbdConstants.RntbdHealthCheckResults.SuccessValue);
}

// Black hole detection, part 1:
// Treat the channel as unhealthy if the gap between the last attempted write and the last successful write
// grew beyond acceptable limits, unless a write was attempted recently. This is a sign of a nonresponding write.

final long writeDelayInNanos =
timestamps.lastChannelWriteAttemptNanoTime() - timestamps.lastChannelWriteNanoTime();

final long writeHangDurationInNanos =
currentTime - timestamps.lastChannelWriteAttemptNanoTime();

if (writeDelayInNanos > this.writeDelayLimitInNanos && writeHangDurationInNanos > writeHangGracePeriodInNanos) {

final Optional<RntbdContext> rntbdContext = requestManager.rntbdContext();
final int pendingRequestCount = requestManager.pendingRequestCount();

logger.warn("{} health check failed due to nonresponding write: {lastChannelWriteAttemptNanoTime: {}, " +
"lastChannelWriteNanoTime: {}, writeDelayInNanos: {}, writeDelayLimitInNanos: {}, " +
"rntbdContext: {}, pendingRequestCount: {}}",
channel, timestamps.lastChannelWriteAttemptNanoTime(), timestamps.lastChannelWriteNanoTime(),
writeDelayInNanos, this.writeDelayLimitInNanos, rntbdContext, pendingRequestCount);

String msg = MessageFormat.format(
"{0} health check failed due to nonresponding write: (lastChannelWriteAttemptNanoTime: {1}, " +
"lastChannelWriteNanoTime: {2}, writeDelayInNanos: {3}, writeDelayLimitInNanos: {4}, " +
"rntbdContext: {5}, pendingRequestCount: {6})",
channel, timestamps.lastChannelWriteAttemptNanoTime(), timestamps.lastChannelWriteNanoTime(),
writeDelayInNanos, this.writeDelayLimitInNanos, rntbdContext, pendingRequestCount
);

return promise.setSuccess(msg);
}

// Black hole detection, part 2:
// Treat the connection as unhealthy if the gap between the last successful write and the last successful read
// grew beyond acceptable limits, unless a write succeeded recently. This is a sign of a nonresponding read.

final long readDelay = timestamps.lastChannelWriteNanoTime() - timestamps.lastChannelReadNanoTime();
final long readHangDuration = currentTime - timestamps.lastChannelWriteNanoTime();

if (readDelay > this.readDelayLimitInNanos && readHangDuration > readHangGracePeriodInNanos) {

final Optional<RntbdContext> rntbdContext = requestManager.rntbdContext();
final int pendingRequestCount = requestManager.pendingRequestCount();

logger.warn("{} health check failed due to nonresponding read: {lastChannelWrite: {}, lastChannelRead: {}, "
+ "readDelay: {}, readDelayLimit: {}, rntbdContext: {}, pendingRequestCount: {}}", channel,
timestamps.lastChannelWriteNanoTime(), timestamps.lastChannelReadNanoTime(), readDelay,
this.readDelayLimitInNanos, rntbdContext, pendingRequestCount);

String msg = MessageFormat.format(
"{0} health check failed due to nonresponding read: (lastChannelWrite: {1}, lastChannelRead: {2}, "
+ "readDelay: {3}, readDelayLimit: {4}, rntbdContext: {5}, pendingRequestCount: {6})", channel,
timestamps.lastChannelWriteNanoTime(), timestamps.lastChannelReadNanoTime(), readDelay,
this.readDelayLimitInNanos, rntbdContext, pendingRequestCount
);

return promise.setSuccess(msg);
}

if (this.idleConnectionTimeoutInNanos > 0L) {
if (currentTime - timestamps.lastChannelReadNanoTime() > this.idleConnectionTimeoutInNanos) {
String msg = MessageFormat.format(
"{0} health check failed due to idle connection timeout: (lastChannelWrite: {1}, lastChannelRead: {2}, "
+ "idleConnectionTimeout: {3}, currentTime: {4})", channel,
timestamps.lastChannelWriteNanoTime(), timestamps.lastChannelReadNanoTime(),
idleConnectionTimeoutInNanos, currentTime
);
return promise.setSuccess(msg);
}
}

channel.writeAndFlush(RntbdHealthCheckRequest.MESSAGE).addListener(completed -> {
if (completed.isSuccess()) {
promise.setSuccess(RntbdConstants.RntbdHealthCheckResults.SuccessValue);
} else {
logger.warn("{} health check request failed due to:", channel, completed.cause());

String msg = MessageFormat.format(
"{0} health check request failed due to: {1}",
channel,
completed.cause().toString()
);

promise.setSuccess(msg);
}
});

return promise;
}

@Override
public String toString() {
return RntbdObjectMapper.toString(this);
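
Note on the hunk above: the failure messages are built with `java.text.MessageFormat`, whose pattern rules differ from the SLF4J `{}` placeholders used in the `logger.warn` calls. The following is a minimal standalone sketch of two of those differences, not code from this commit. The class name `MessageFormatSketch`, the helper `format`, and the sample value are hypothetical, and `Locale.US` is pinned only to make the output deterministic.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical illustration class; it is not part of azure-cosmos.
class MessageFormatSketch {

    // Formats a pattern with an explicit locale so the result does not
    // depend on the JVM's default locale.
    static String format(String pattern, Object... args) {
        return new MessageFormat(pattern, Locale.US).format(args);
    }

    public static void main(String[] args) {
        long readDelay = 1234567L; // made-up nanosecond value

        // A bare {0} placeholder applies locale-aware number formatting,
        // so long values come out with digit grouping.
        System.out.println(format("readDelay: {0}", readDelay));

        // An explicit number pattern suppresses the grouping.
        System.out.println(format("readDelay: {0,number,#}", readDelay));

        // Literal braces must be quoted with single quotes in a MessageFormat
        // pattern; using parentheses instead avoids the quoting entirely.
        System.out.println(format("'{'readDelay: {0,number,#}'}'", readDelay));
    }
}
```

With `Locale.US` this prints `readDelay: 1,234,567`, then `readDelay: 1234567`, then `{readDelay: 1234567}`. This suggests that the nanosecond timestamps in the returned failure messages would carry grouping separators, while the same values in the log output would not.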
@@ -22,6 +22,10 @@ public final class RntbdConstants {
private RntbdConstants() {
}

public static class RntbdHealthCheckResults {
public static final String SuccessValue = "Success";
}

public enum RntbdConsistencyLevel {

Strong((byte) 0x00),
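
The two hunks above replace a boolean health result with a string that is either the `SuccessValue` sentinel or a failure reason. A rough sketch of how a caller might interpret such a result follows. It is an assumption-laden illustration rather than the SDK's code: `java.util.concurrent.CompletableFuture` stands in for the Netty `Future` used in the diff, and `HealthCheckReasonSketch`, `checkHealth`, and `isHealthy` are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration class; it is not part of azure-cosmos.
class HealthCheckReasonSketch {

    // Mirrors the sentinel value introduced in the diff
    // (RntbdConstants.RntbdHealthCheckResults.SuccessValue).
    static final String SUCCESS_VALUE = "Success";

    // Simplified stand-in for a health check: completes with the sentinel
    // when healthy, or with a human-readable reason when the read delay
    // exceeds its limit.
    static CompletableFuture<String> checkHealth(long readDelayNanos, long readDelayLimitNanos) {
        if (readDelayNanos > readDelayLimitNanos) {
            return CompletableFuture.completedFuture(
                "health check failed due to nonresponding read: (readDelay: " + readDelayNanos
                    + ", readDelayLimit: " + readDelayLimitNanos + ")");
        }
        return CompletableFuture.completedFuture(SUCCESS_VALUE);
    }

    // A result counts as healthy only if it equals the sentinel exactly.
    static boolean isHealthy(String result) {
        return SUCCESS_VALUE.equals(result);
    }

    public static void main(String[] args) {
        String ok = checkHealth(10L, 100L).join();
        String bad = checkHealth(500L, 100L).join();

        System.out.println(isHealthy(ok));                 // healthy case
        System.out.println(isHealthy(bad) + " -> " + bad); // failure case with its reason
    }
}
```

One consequence of this design is that the future always completes successfully, so callers have to compare against the sentinel rather than rely on the future's failure state.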