Document commit timeout and shutdown #5418

Merged: 5 commits merged on Jan 8, 2025.
4 changes: 4 additions & 0 deletions distribution/ecs/quickwit/quickwit-indexer.tf
@@ -20,6 +20,10 @@ module "quickwit_indexer" {
enable_cloudwatch_logging = var.enable_cloudwatch_logging
service_config = var.quickwit_indexer
quickwit_index_s3_prefix = local.quickwit_index_s3_prefix
+# Longer termination grace period for indexers because we are waiting for the
+# data persisted in the ingesters to be indexed and committed. Should be
+# larger than the largest commit timeout.
+stop_timeout = 120
}

resource "aws_service_discovery_service" "indexer" {
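The comment in this hunk says the indexer's `stop_timeout` should be larger than the largest commit timeout. A minimal sketch of deriving that value instead of hard-coding `120`, assuming you maintain the list of `commit_timeout_secs` values yourself; the local names and the 15 s margin are hypothetical and not part of this PR:

```hcl
locals {
  # Hypothetical: commit timeouts (in seconds) of every index handled by the
  # indexers. Must be kept in sync with the `commit_timeout_secs` settings.
  indexer_commit_timeouts_secs = [30, 60, 90]

  # Hypothetical safety margin added on top of the largest commit timeout.
  indexer_stop_margin_secs = 15

  # Grace period = largest commit timeout + margin.
  indexer_stop_timeout = max(local.indexer_commit_timeouts_secs...) + local.indexer_stop_margin_secs
}
```

The result would then be passed as `stop_timeout = local.indexer_stop_timeout` in the `quickwit_indexer` module block. Since ECS on Fargate accepts at most 120 s, a 15 s margin leaves room for commit timeouts of up to 105 s.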
2 changes: 2 additions & 0 deletions distribution/ecs/quickwit/service/ecs.tf
@@ -63,6 +63,8 @@ module "quickwit_service" {
}
]

+stopTimeout = var.stop_timeout
+
dependencies = var.sidecar_container_dependencies
}
})
5 changes: 5 additions & 0 deletions distribution/ecs/quickwit/service/variables.tf
@@ -61,3 +61,8 @@ variable "task_execution_policy_arn" {}
variable "quickwit_cpu_architecture" {}

variable "module_id" {}
+
+variable "stop_timeout" {
+# between 1s and 120s on Fargate, 30s is the ECS default
+default = 30
+}
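The `stop_timeout` variable only documents its valid range in a comment. A sketch of enforcing that range at plan time with a Terraform variable `validation` block, written as a drop-in replacement for the declaration above and assuming the module is only ever deployed on Fargate; the description and error message wording are illustrative:

```hcl
variable "stop_timeout" {
  # Between 1s and 120s on Fargate, 30s is the ECS default.
  description = "Seconds ECS waits for the container to exit before forcefully killing it."
  type        = number
  default     = 30

  validation {
    # Assumption: tasks run on Fargate, which caps stopTimeout at 120s.
    condition     = var.stop_timeout >= 1 && var.stop_timeout <= 120
    error_message = "The stop_timeout value must be between 1 and 120 seconds on Fargate."
  }
}
```

With this in place, `terraform plan` fails early on an out-of-range value instead of the error surfacing only when ECS registers the task definition.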
8 changes: 8 additions & 0 deletions docs/configuration/index-config.md
@@ -584,6 +584,14 @@ This section describes indexing settings for a given index.
| `docstore_compression_level` | Level of compression used by zstd for the docstore. Lower values may increase ingest speed, at the cost of index size | `8` |
| `docstore_blocksize` | Size of blocks in the docstore, in bytes. Lower values may improve doc retrieval speed, at the cost of index size | `1000000` |

+:::note
+
+Choosing an appropriate commit timeout is critical. With a shorter commit timeout, ingested data becomes queryable sooner, but the published splits are smaller, which increases the overhead associated with [merges](#merge-policies).
+
+When permanently decommissioning an indexer node that received data through the ingest API (including the [Elastic bulk API](/docs/reference/es_compatible_api) and the OTEL [log](/docs/log-management/otel-service.md) and [trace](/docs/distributed-tracing/otel-service.md) services), all the data persisted locally in its write-ahead log (WAL) must be indexed and committed before the node stops. After receiving the termination signal, the Quickwit process waits for the indexing pipelines to finish processing this local data, which can take as long as the longest commit timeout across all indexes. Make sure that the termination grace period of the infrastructure running the Quickwit indexer nodes is long enough to cover it (e.g. [`terminationGracePeriodSeconds`](https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/) in Kubernetes or [`stopTimeout`](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html) on AWS ECS).
+
+:::
+
### Merge policies

Quickwit makes it possible to define the strategy used to decide which splits should be merged together and when.
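The note above also mentions Kubernetes' `terminationGracePeriodSeconds`. A minimal sketch of where that setting goes, written with the Terraform Kubernetes provider (`hashicorp/kubernetes`) to stay in the same language as the ECS files. The resource and label names are hypothetical, `120` simply mirrors the ECS indexer value in this PR, and the container configuration is deliberately incomplete:

```hcl
resource "kubernetes_deployment" "quickwit_indexer" {
  metadata {
    # Hypothetical resource name.
    name = "quickwit-indexer"
  }

  spec {
    replicas = 1

    selector {
      match_labels = {
        app = "quickwit-indexer"
      }
    }

    template {
      metadata {
        labels = {
          app = "quickwit-indexer"
        }
      }

      spec {
        # Should exceed the largest commit timeout of all indexes so the
        # indexer can index and commit its local WAL data before being killed.
        termination_grace_period_seconds = 120

        container {
          name  = "quickwit"
          image = "quickwit/quickwit"
          # Args, ports, volumes and node config are omitted from this sketch.
        }
      }
    }
  }
}
```

If the indexers run as a StatefulSet instead, `kubernetes_stateful_set` takes the same `termination_grace_period_seconds` field in its pod template `spec`.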
2 changes: 1 addition & 1 deletion docs/configuration/node-config.md
@@ -84,7 +84,7 @@ grpc:

:::warning
We advise changing the default value of 20 MiB only if you encounter the following error:
-`Error, message length too large: found 24732228 bytes, the limit is: 20971520 bytes.` In that case, increase `max_message_size` by increments of 10 MiB until the issue disappears. This is a temporary fix: the next version of Quickwit, 0.8, will rely exclusively on gRPC streaming endpoints and handle messages of any length.
+`Error, message length too large: found 24732228 bytes, the limit is: 20971520 bytes.` In that case, increase `max_message_size` by increments of 10 MiB until the issue disappears. This is a temporary fix: the next version of Quickwit will rely exclusively on gRPC streaming endpoints and handle messages of any length.
:::

## Storage configuration
2 changes: 1 addition & 1 deletion docs/deployment/deployment-modes.md
@@ -28,7 +28,7 @@ One indexer running on a small instance (4 vCPUs) can ingest documents at a thro
## Multiple indexers, multiple searchers

Indexing a single [data source](../configuration/source-config.md) on several indexers is only possible with a [Kafka source](../configuration/source-config.md#kafka-source).
-Support for native distributed indexing is planned for Quickwit 0.8 (Q2). Stay tuned!
+Support for native distributed indexing was added with Quickwit 0.9.

## File-backed metastore limitations

6 changes: 3 additions & 3 deletions docs/distributed-tracing/otel-service.md
@@ -144,8 +144,8 @@ search_settings:

## Known limitations

-There are a few limitations on the current distributed tracing setup in Quickwit 0.7:
-- The OTLP gRPC service does not provide High-Availability and High-Durability, This will be fixed in 0.8.
-- OTLP HTTP is only available with the Binary Protobuf Encoding. OTLP HTTP with JSON encoding is not planned yet, but this can be easily fixed in the next version. Please open an issue if you need this feature.
+There are a few limitations on the current distributed tracing setup in Quickwit 0.9:
+- The OTLP gRPC service does not provide High-Durability. This will be fixed in 0.10.
+- OTLP HTTP is only available with the Binary Protobuf Encoding. OTLP HTTP with JSON encoding is not planned yet, but this can be easily fixed in the next version. Please open an issue if you need this feature.

If you are interested in new features or discovered other limitations, please open an issue on [GitHub](https://github.com/quickwit-oss/quickwit).
4 changes: 2 additions & 2 deletions docs/log-management/otel-service.md
@@ -118,8 +118,8 @@ You can also send traces to Quickwit that you can visualize in Jaeger UI, as exp

## Known limitations

-There are a few limitations on the log management setup in Quickwit 0.7:
-- The ingest API does not provide High-Availability and High-Durability, this will be fixed in 0.8.
+There are a few limitations on the log management setup in Quickwit 0.9:
+- The ingest API does not provide High-Durability. This will be fixed in 0.10.
- OTLP HTTP is only available with the Binary Protobuf Encoding. OTLP HTTP with JSON encoding is not planned yet, but this can be easily fixed in the next version. Please open an issue if you need this feature.

If you are interested in new features or discover other limitations, please open an issue on [GitHub](https://github.com/quickwit-oss/quickwit).