From ba142d358747d90250a77cb56f11e94095d31b37 Mon Sep 17 00:00:00 2001 From: Remi Dettai Date: Wed, 8 Jan 2025 14:39:41 +0100 Subject: [PATCH 1/5] Document commit timeout and shutdown --- docs/configuration/index-config.md | 8 ++++++++ docs/configuration/node-config.md | 2 +- docs/deployment/deployment-modes.md | 2 +- docs/distributed-tracing/otel-service.md | 2 +- docs/log-management/otel-service.md | 2 +- 5 files changed, 12 insertions(+), 4 deletions(-) diff --git a/docs/configuration/index-config.md b/docs/configuration/index-config.md index 5dbc7db8ade..270bddcc482 100644 --- a/docs/configuration/index-config.md +++ b/docs/configuration/index-config.md @@ -584,6 +584,14 @@ This section describes indexing settings for a given index. | `docstore_compression_level` | Level of compression used by zstd for the docstore. Lower values may increase ingest speed, at the cost of index size | `8` | | `docstore_blocksize` | Size of blocks in the docstore, in bytes. Lower values may improve doc retrieval speed, at the cost of index size | `1000000` | +:::note + +Choosing an appropriate commit timeout is critical. With a shorter commit timeout, ingested data is more quickly queryable. But the published splits will be smaller, increasing the overhead associated with [merges](#merge-policies). + +When decommissioning definitively a indexer node that received data through the ingest API (including the [Elastic bulk API](/docs/reference/es_compatible_api) and the OTEL [log](/docs/log-management/otel-service.md) and [trace](/docs/distributed-tracing/otel-service.md) services), we need to make sure that all the data that was staged locally (Write Ahead Log) is indexed. After receiving the termination signal, the Quickwit process waits for the local indexing pipelines to complete. This can take as long as the longest commit timeout of all indexes. 
Make sure that the termination grace period of the infrastructure supporting the Quickwit indexer nodes is long enough (e.g [`terminationGracePeriodSeconds`](https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/) in Kubernetes or [`stopTimeout`](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html) on AWS ECS). + +::: + ### Merge policies Quickwit makes it possible to define the strategy used to decide which splits should be merged together and when. diff --git a/docs/configuration/node-config.md b/docs/configuration/node-config.md index d736f03c585..6a74db81fe5 100644 --- a/docs/configuration/node-config.md +++ b/docs/configuration/node-config.md @@ -84,7 +84,7 @@ grpc: :::warning We advise changing the default value of 20 MiB only if you encounter the following error: -`Error, message length too large: found 24732228 bytes, the limit is: 20971520 bytes.` In that case, increase `max_message_size` by increments of 10 MiB until the issue disappears. This is a temporary fix: the next version of Quickwit, 0.8, will rely exclusively on gRPC streaming endpoints and handle messages of any length. +`Error, message length too large: found 24732228 bytes, the limit is: 20971520 bytes.` In that case, increase `max_message_size` by increments of 10 MiB until the issue disappears. This is a temporary fix: the next version of Quickwit will rely exclusively on gRPC streaming endpoints and handle messages of any length. 
::: ## Storage configuration diff --git a/docs/deployment/deployment-modes.md b/docs/deployment/deployment-modes.md index 52c82f12848..e2407479914 100644 --- a/docs/deployment/deployment-modes.md +++ b/docs/deployment/deployment-modes.md @@ -28,7 +28,7 @@ One indexer running on a small instance (4 vCPUs) can ingest documents at a thro ## Multiple indexers, multiple searchers Indexing a single [data source](../configuration/source-config.md) on several indexers is only possible with a [Kafka source](../configuration/source-config.md#kafka-source). -Support for native distributed indexing is planned for Quickwit 0.8 (Q2). Stay tuned! +Support for native distributed indexing was added with Quickwit 0.9. ## File-backed metastore limitations diff --git a/docs/distributed-tracing/otel-service.md b/docs/distributed-tracing/otel-service.md index 428b6523842..b31f25163df 100644 --- a/docs/distributed-tracing/otel-service.md +++ b/docs/distributed-tracing/otel-service.md @@ -145,7 +145,7 @@ search_settings: ## Known limitations There are a few limitations on the current distributed tracing setup in Quickwit 0.7: -- The OTLP gRPC service does not provide High-Availability and High-Durability, This will be fixed in 0.8. +- The OTLP gRPC service does not provide High-Availability and High-Durability. This will be fixed in 0.9. - OTLP HTTP is only available with the Binary Protobuf Encoding. OTLP HTTP with JSON encoding is not planned yet, but this can be easily fixed in the next version. Please open an issue if you need this feature. If you are interested in new features or discovered other limitations, please open an issue on [GitHub](https://github.com/quickwit-oss/quickwit). 
diff --git a/docs/log-management/otel-service.md b/docs/log-management/otel-service.md index 6a0938fc924..d0ff21b2e8d 100644 --- a/docs/log-management/otel-service.md +++ b/docs/log-management/otel-service.md @@ -119,7 +119,7 @@ You can also send traces to Quickwit that you can visualize in Jaeger UI, as exp ## Known limitations There are a few limitations on the log management setup in Quickwit 0.7: -- The ingest API does not provide High-Availability and High-Durability, this will be fixed in 0.8. +- The ingest API does not provide High-Availability and High-Durability. This will be fixed in 0.9. - OTLP HTTP is only available with the Binary Protobuf Encoding. OTLP HTTP with JSON encoding is not planned yet, but this can be easily fixed in the next version. Please open an issue if you need this feature. If you are interested in new features or discover other limitations, please open an issue on [GitHub](https://github.com/quickwit-oss/quickwit). From fb930dc4cab6ba0a8d01c08c1d53e32f2c95dd5f Mon Sep 17 00:00:00 2001 From: Remi Dettai Date: Wed, 8 Jan 2025 14:39:41 +0100 Subject: [PATCH 2/5] Increase the grace period on ECS --- distribution/ecs/quickwit/quickwit-indexer.tf | 3 +++ distribution/ecs/quickwit/service/ecs.tf | 2 ++ distribution/ecs/quickwit/service/variables.tf | 5 +++++ 3 files changed, 10 insertions(+) diff --git a/distribution/ecs/quickwit/quickwit-indexer.tf b/distribution/ecs/quickwit/quickwit-indexer.tf index 441a1c7a8f8..ab3ca76a199 100644 --- a/distribution/ecs/quickwit/quickwit-indexer.tf +++ b/distribution/ecs/quickwit/quickwit-indexer.tf @@ -20,6 +20,9 @@ module "quickwit_indexer" { enable_cloudwatch_logging = var.enable_cloudwatch_logging service_config = var.quickwit_indexer quickwit_index_s3_prefix = local.quickwit_index_s3_prefix + # Longer termination grace period for indexers because their ingest services + # need to commit their WALs. Should be larger than the largest commit timeout. 
+ stop_timeout = 120 } resource "aws_service_discovery_service" "indexer" { diff --git a/distribution/ecs/quickwit/service/ecs.tf b/distribution/ecs/quickwit/service/ecs.tf index 5b862271f77..03e299f4dcd 100644 --- a/distribution/ecs/quickwit/service/ecs.tf +++ b/distribution/ecs/quickwit/service/ecs.tf @@ -63,6 +63,8 @@ module "quickwit_service" { } ] + stopTimeout = var.stop_timeout + dependencies = var.sidecar_container_dependencies } }) diff --git a/distribution/ecs/quickwit/service/variables.tf b/distribution/ecs/quickwit/service/variables.tf index 09de61ff3ee..61404efd1bb 100644 --- a/distribution/ecs/quickwit/service/variables.tf +++ b/distribution/ecs/quickwit/service/variables.tf @@ -61,3 +61,8 @@ variable "task_execution_policy_arn" {} variable "quickwit_cpu_architecture" {} variable "module_id" {} + +variable "stop_timeout" { + # between 1s and 120s on Fargate, 30s is the ECS default + default = 30 +} From 987f0c3e7f6c85f746c8da13945032d2cde1db7b Mon Sep 17 00:00:00 2001 From: Remi Dettai Date: Wed, 8 Jan 2025 14:39:41 +0100 Subject: [PATCH 3/5] Clarify what the shutdown needs to wait for --- distribution/ecs/quickwit/quickwit-indexer.tf | 5 +++-- docs/configuration/index-config.md | 4 ++-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/distribution/ecs/quickwit/quickwit-indexer.tf b/distribution/ecs/quickwit/quickwit-indexer.tf index ab3ca76a199..aea74b30aaa 100644 --- a/distribution/ecs/quickwit/quickwit-indexer.tf +++ b/distribution/ecs/quickwit/quickwit-indexer.tf @@ -20,8 +20,9 @@ module "quickwit_indexer" { enable_cloudwatch_logging = var.enable_cloudwatch_logging service_config = var.quickwit_indexer quickwit_index_s3_prefix = local.quickwit_index_s3_prefix - # Longer termination grace period for indexers because their ingest services - # need to commit their WALs. Should be larger than the largest commit timeout. 
+ # Longer termination grace period for indexers because we are waiting for the + # data persisted in the ingesters to be indexed and committed. Should be + # larger than the largest commit timeout. stop_timeout = 120 } diff --git a/docs/configuration/index-config.md b/docs/configuration/index-config.md index 270bddcc482..e7376aaf4bc 100644 --- a/docs/configuration/index-config.md +++ b/docs/configuration/index-config.md @@ -586,9 +586,9 @@ This section describes indexing settings for a given index. :::note -Choosing an appropriate commit timeout is critical. With a shorter commit timeout, ingested data is more quickly queryable. But the published splits will be smaller, increasing the overhead associated with [merges](#merge-policies). +Choosing an appropriate commit timeout is critical. A shorter commit timeout makes ingested data queryable sooner, but the published splits are smaller, which increases the overhead associated with [merges](#merge-policies). -When decommissioning definitively a indexer node that received data through the ingest API (including the [Elastic bulk API](/docs/reference/es_compatible_api) and the OTEL [log](/docs/log-management/otel-service.md) and [trace](/docs/distributed-tracing/otel-service.md) services), we need to make sure that all the data that was staged locally (Write Ahead Log) is indexed. After receiving the termination signal, the Quickwit process waits for the local indexing pipelines to complete. This can take as long as the longest commit timeout of all indexes. Make sure that the termination grace period of the infrastructure supporting the Quickwit indexer nodes is long enough (e.g [`terminationGracePeriodSeconds`](https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/) in Kubernetes or [`stopTimeout`](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html) on AWS ECS).
+When permanently decommissioning an indexer node that received data through the ingest API (including the [Elastic bulk API](/docs/reference/es_compatible_api) and the OTEL [log](/docs/log-management/otel-service.md) and [trace](/docs/distributed-tracing/otel-service.md) services), all the data persisted locally in its write-ahead log (WAL) must be indexed and committed before the node stops. After receiving the termination signal, the Quickwit process waits for the indexing pipelines to finish processing this local data, which can take as long as the longest commit timeout across all indexes. Make sure that the termination grace period of the infrastructure running the Quickwit indexer nodes is long enough (e.g. [`terminationGracePeriodSeconds`](https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/) in Kubernetes or [`stopTimeout`](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html) on AWS ECS). ::: From 9e9bcc79e0c091ef364f71bbddeb0209749820dd Mon Sep 17 00:00:00 2001 From: Remi Dettai Date: Wed, 8 Jan 2025 14:39:41 +0100 Subject: [PATCH 4/5] Fix version numbers --- docs/distributed-tracing/otel-service.md | 4 ++-- docs/log-management/otel-service.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/distributed-tracing/otel-service.md b/docs/distributed-tracing/otel-service.md index b31f25163df..169429f2f7a 100644 --- a/docs/distributed-tracing/otel-service.md +++ b/docs/distributed-tracing/otel-service.md @@ -144,8 +144,8 @@ search_settings: ## Known limitations -There are a few limitations on the current distributed tracing setup in Quickwit 0.7: -- The OTLP gRPC service does not provide High-Availability and High-Durability. This will be fixed in 0.9. +There are a few limitations on the current distributed tracing setup in Quickwit 0.9: +- The OTLP gRPC service does not provide High-Availability and High-Durability. This will be fixed in 0.10.
- OTLP HTTP is only available with the Binary Protobuf Encoding. OTLP HTTP with JSON encoding is not planned yet, but this can be easily fixed in the next version. Please open an issue if you need this feature. If you are interested in new features or discovered other limitations, please open an issue on [GitHub](https://github.com/quickwit-oss/quickwit). diff --git a/docs/log-management/otel-service.md b/docs/log-management/otel-service.md index d0ff21b2e8d..f7fc928f81c 100644 --- a/docs/log-management/otel-service.md +++ b/docs/log-management/otel-service.md @@ -118,8 +118,8 @@ You can also send traces to Quickwit that you can visualize in Jaeger UI, as exp ## Known limitations -There are a few limitations on the log management setup in Quickwit 0.7: -- The ingest API does not provide High-Availability and High-Durability. This will be fixed in 0.9. +There are a few limitations on the log management setup in Quickwit 0.9: +- The ingest API does not provide High-Availability and High-Durability. This will be fixed in 0.10. - OTLP HTTP is only available with the Binary Protobuf Encoding. OTLP HTTP with JSON encoding is not planned yet, but this can be easily fixed in the next version. Please open an issue if you need this feature. If you are interested in new features or discover other limitations, please open an issue on [GitHub](https://github.com/quickwit-oss/quickwit). 
From 2907038366299199b65f83dbb2dd92ec743d5676 Mon Sep 17 00:00:00 2001 From: Remi Dettai Date: Wed, 8 Jan 2025 14:39:42 +0100 Subject: [PATCH 5/5] Remove mention of HA --- docs/distributed-tracing/otel-service.md | 4 ++-- docs/log-management/otel-service.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/distributed-tracing/otel-service.md b/docs/distributed-tracing/otel-service.md index 169429f2f7a..3d97230e8c4 100644 --- a/docs/distributed-tracing/otel-service.md +++ b/docs/distributed-tracing/otel-service.md @@ -145,7 +145,7 @@ search_settings: ## Known limitations There are a few limitations on the current distributed tracing setup in Quickwit 0.9: -- The OTLP gRPC service does not provide High-Availability and High-Durability. This will be fixed in 0.10. -- OTLP HTTP is only available with the Binary Protobuf Encoding. OTLP HTTP with JSON encoding is not planned yet, but this can be easily fixed in the next version. Please open an issue if you need this feature. +- The OTLP gRPC service does not provide High-Durability. This will be fixed in 0.10. +- OTLP HTTP is only available with the Binary Protobuf Encoding. OTLP HTTP with JSON encoding is not planned yet, but this can be easily fixed in the next version. Please open an issue if you need this feature. If you are interested in new features or discovered other limitations, please open an issue on [GitHub](https://github.com/quickwit-oss/quickwit). diff --git a/docs/log-management/otel-service.md b/docs/log-management/otel-service.md index f7fc928f81c..5cd38b852d3 100644 --- a/docs/log-management/otel-service.md +++ b/docs/log-management/otel-service.md @@ -119,7 +119,7 @@ You can also send traces to Quickwit that you can visualize in Jaeger UI, as exp ## Known limitations There are a few limitations on the log management setup in Quickwit 0.9: -- The ingest API does not provide High-Availability and High-Durability. This will be fixed in 0.10.
+- The ingest API does not provide High-Durability. This will be fixed in 0.10. - OTLP HTTP is only available with the Binary Protobuf Encoding. OTLP HTTP with JSON encoding is not planned yet, but this can be easily fixed in the next version. Please open an issue if you need this feature. If you are interested in new features or discover other limitations, please open an issue on [GitHub](https://github.com/quickwit-oss/quickwit).
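Both the shutdown note in `index-config.md` and the Terraform comment in `quickwit-indexer.tf` state the same constraint: the indexer's termination grace period should be larger than the largest commit timeout. The following is a minimal Python sketch of a pre-deployment check for that constraint. It is not part of Quickwit or this patch series, and the function names are hypothetical. The sketch assumes the index configs are already parsed into dicts. It also assumes that indexes which don't set `indexing_settings.commit_timeout_secs` use a 60-second default. The 120-second ceiling is the Fargate limit noted in `variables.tf`.

```python
from typing import Any, Dict, List

# Hypothetical pre-deployment check, not part of Quickwit or this patch series.

# Upper bound for stopTimeout on Fargate, as noted in variables.tf.
FARGATE_MAX_STOP_TIMEOUT_SECS = 120
# Assumption: default value of indexing_settings.commit_timeout_secs.
DEFAULT_COMMIT_TIMEOUT_SECS = 60


def largest_commit_timeout(index_configs: List[Dict[str, Any]]) -> int:
    """Largest commit_timeout_secs across the given (already parsed) index configs.

    Indexes that do not set the value fall back to the assumed default.
    Returns 0 when there are no indexes, i.e. there is nothing to wait for.
    """
    return max(
        (
            cfg.get("indexing_settings", {}).get(
                "commit_timeout_secs", DEFAULT_COMMIT_TIMEOUT_SECS
            )
            for cfg in index_configs
        ),
        default=0,
    )


def check_stop_timeout(
    stop_timeout_secs: int,
    index_configs: List[Dict[str, Any]],
    fargate: bool = True,
) -> List[str]:
    """Return human-readable problems; an empty list means the setting looks fine."""
    problems = []
    if fargate and stop_timeout_secs > FARGATE_MAX_STOP_TIMEOUT_SECS:
        problems.append(
            f"stop_timeout ({stop_timeout_secs}s) exceeds the Fargate maximum "
            f"({FARGATE_MAX_STOP_TIMEOUT_SECS}s)"
        )
    longest = largest_commit_timeout(index_configs)
    # The grace period must leave time for the WAL to be indexed and committed.
    if stop_timeout_secs <= longest:
        problems.append(
            f"stop_timeout ({stop_timeout_secs}s) should be larger than the "
            f"largest commit timeout ({longest}s)"
        )
    return problems


if __name__ == "__main__":
    configs = [
        {"index_id": "logs", "indexing_settings": {"commit_timeout_secs": 30}},
        {"index_id": "traces"},  # falls back to the assumed 60s default
    ]
    print(check_stop_timeout(120, configs))
    print(check_stop_timeout(30, configs))
```

The patch sets `stop_timeout = 120` for the indexers. With that value, any index whose commit timeout is 120 seconds or more cannot satisfy the constraint on Fargate. A check like this would flag such an index before a deploy rather than during a shutdown.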