Fix panic when max_span_count is reached, add counter metric (#104)
* Fix panic when max_span_count is reached, add counter metric

  Panic seen in `ghcr.io/jaegertracing/jaeger-clickhouse:0.8.0` with `log-level=debug`:

  ```
  panic: undefined type *clickhousespanstore.WriteWorker return from workerHeap

  goroutine 20 [running]:
  github.com/jaegertracing/jaeger-clickhouse/storage/clickhousespanstore.(*WriteWorkerPool).CleanWorkers(0xc00020c300, 0xc00008eefc)
  	github.com/jaegertracing/jaeger-clickhouse/storage/clickhousespanstore/pool.go:95 +0x199
  github.com/jaegertracing/jaeger-clickhouse/storage/clickhousespanstore.(*WriteWorkerPool).Work(0xc00020c300)
  	github.com/jaegertracing/jaeger-clickhouse/storage/clickhousespanstore/pool.go:50 +0x15e
  created by github.com/jaegertracing/jaeger-clickhouse/storage/clickhousespanstore.(*SpanWriter).backgroundWriter
  	github.com/jaegertracing/jaeger-clickhouse/storage/clickhousespanstore/writer.go:89 +0x226
  ```

  Also adds a metric counter and logging to surface when writes are hitting backpressure.

  Signed-off-by: Nick Parker <[email protected]>

* Potential fix for deadlock: Avoid holding mutex while waiting on close

  Signed-off-by: Nick Parker <[email protected]>

* Discard new batches instead of waiting for old batches to finish

  The current limit logic can result in a stall where `worker.Close()` never returns due to errors being returned from ClickHouse. This switches to a simpler system of discarding new work when the limit is reached, ensuring that we don't get backed up indefinitely in the event of a long outage.

  Also moves the count of pending spans to the parent pool:
  - Avoids race conditions where new work can be started before it's added to the count
  - Mutexing around the count is no longer needed

  Signed-off-by: Nick Parker <[email protected]>

* Include arbitrary worker_id in logs to differentiate between retry loops

  Signed-off-by: Nick Parker <[email protected]>

* Fix lint

  Signed-off-by: Nick Parker <[email protected]>
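The deadlock fix described in the commit message (not holding the mutex while waiting on close) follows a common pattern that can be sketched as below. This is a minimal illustration and not the code from this commit: the `pool` and `worker` types, the `done` channel, and the `Add` and `CloseAll` methods are all hypothetical names chosen for the example. The idea is to detach the worker list while holding the lock, release the lock, and only then call the blocking `Close`.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical stand-in for a write worker whose Close
// blocks until its in-flight work has finished.
type worker struct {
	done chan struct{}
}

// Close waits until the worker signals completion.
func (w *worker) Close() { <-w.done }

// pool is a hypothetical pool that guards its worker list with a mutex.
type pool struct {
	mu      sync.Mutex
	workers []*worker
}

// Add registers a worker; it only holds the lock for the append.
func (p *pool) Add(w *worker) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.workers = append(p.workers, w)
}

// CloseAll detaches the worker list under the lock, releases the lock,
// and only then waits on each worker. Other callers are never blocked
// behind a slow Close. It returns how many workers were closed.
func (p *pool) CloseAll() int {
	p.mu.Lock()
	toClose := p.workers
	p.workers = nil
	p.mu.Unlock()

	for _, w := range toClose {
		w.Close() // may block, but the mutex is already released
	}
	return len(toClose)
}

func main() {
	p := &pool{}
	first := &worker{done: make(chan struct{})}
	p.Add(first)

	result := make(chan int)
	go func() { result <- p.CloseAll() }()

	// Add can still acquire the lock while CloseAll is waiting on first.
	finished := &worker{done: make(chan struct{})}
	close(finished.done)
	p.Add(finished)

	close(first.done) // let the pending Close return
	<-result
	fmt.Println("closed without deadlock")
}
```

If `CloseAll` instead held `p.mu` across the loop, the `p.Add(finished)` call in `main` could block until `first` finished, which is the kind of stall the commit message describes.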
Showing 5 changed files with 122 additions and 87 deletions.
@@ -1,35 +1,39 @@
-address: tcp://some-clickhouse-server:9000
-# When empty the embedded scripts from sqlscripts directory are used
-init_sql_scripts_dir:
-# Maximal amount of spans that can be written at the same time. Default 10_000_000
-max_span_count:
-# Batch write size. Default 10_000.
-batch_write_size:
-# Batch flush interval. Default 5s.
-batch_flush_interval:
-# Encoding of stored data. Either json or protobuf. Default json.
-encoding:
-# Path to CA TLS certificate.
-ca_file:
-# Username for connection. Default is "default".
-username:
-# Password for connection.
-password:
-# Database name. The database has to be created manually before Jaeger starts. Default is "default".
-database:
-# Endpoint for scraping prometheus metrics. Default localhost:9090.
-metrics_endpoint: localhost:9090
-# Whether to use sql scripts supporting replication and sharding.
-# Replication can be used only on database with Atomic engine.
-# Default false.
-replication:
-# Table with spans. Default "jaeger_spans_local" or "jaeger_spans" when replication is enabled.
-spans_table:
-# Span index table. Default "jaeger_index_local" or "jaeger_index" when replication is enabled.
-spans_index_table:
-# Operations table. Default "jaeger_operations_local" or "jaeger_operations" when replication is enabled.
-operations_table:
-# TTL for data in tables in days. If 0, no TTL is set. Default 0.
-ttl:
-# The maximum number of spans to fetch per trace. If 0, no limits is set. Default 0.
-max_num_spans:
+address: tcp://some-clickhouse-server:9000
+# When empty the embedded scripts from sqlscripts directory are used
+init_sql_scripts_dir:
+# Maximal amount of spans that can be pending writes at a time.
+# New spans exceeding this limit will be discarded,
+# keeping memory in check if there are issues writing to ClickHouse.
+# Check the "jaeger_clickhouse_discarded_spans" metric to keep track of discards.
+# If 0, no limit is set. Default 10_000_000.
+max_span_count:
+# Batch write size. Default 10_000.
+batch_write_size:
+# Batch flush interval. Default 5s.
+batch_flush_interval:
+# Encoding of stored data. Either json or protobuf. Default json.
+encoding:
+# Path to CA TLS certificate.
+ca_file:
+# Username for connection to ClickHouse. Default is "default".
+username:
+# Password for connection to ClickHouse.
+password:
+# ClickHouse database name. The database must be created manually before Jaeger starts. Default is "default".
+database:
+# Endpoint for serving prometheus metrics. Default localhost:9090.
+metrics_endpoint: localhost:9090
+# Whether to use sql scripts supporting replication and sharding.
+# Replication can be used only on database with Atomic engine.
+# Default false.
+replication:
+# Table with spans. Default "jaeger_spans_local" or "jaeger_spans" when replication is enabled.
+spans_table:
+# Span index table. Default "jaeger_index_local" or "jaeger_index" when replication is enabled.
+spans_index_table:
+# Operations table. Default "jaeger_operations_local" or "jaeger_operations" when replication is enabled.
+operations_table:
+# TTL for data in tables in days. If 0, no TTL is set. Default 0.
+ttl:
+# The maximum number of spans to fetch per trace. If 0, no limit is set. Default 0.
+max_num_spans:
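The new `max_span_count` comments describe a limit on pending spans, with new spans discarded once the limit is exceeded and a value of 0 disabling the limit. A minimal sketch of that behavior follows, assuming a hypothetical `limiter` type with `TryAdd` and `Done` methods; none of these names come from the plugin itself, and a plain in-memory counter stands in for the `jaeger_clickhouse_discarded_spans` metric. The pending count lives in one place and is updated atomically, in the spirit of the commit message's note that mutexing around the count is no longer needed.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// limiter is a hypothetical helper that tracks pending spans for a pool.
// A maxPending of 0 means no limit is enforced.
type limiter struct {
	pending    int64 // spans accepted but not yet written
	discarded  int64 // spans dropped because the limit was exceeded
	maxPending int64
}

// TryAdd reserves room for a batch of n spans. If the reservation would
// push the pending count over the limit, it is rolled back, the batch is
// counted as discarded, and false is returned.
func (l *limiter) TryAdd(n int) bool {
	total := atomic.AddInt64(&l.pending, int64(n))
	if l.maxPending > 0 && total > l.maxPending {
		atomic.AddInt64(&l.pending, -int64(n))
		atomic.AddInt64(&l.discarded, int64(n))
		return false
	}
	return true
}

// Done releases n spans once their batch has been written or abandoned.
func (l *limiter) Done(n int) { atomic.AddInt64(&l.pending, -int64(n)) }

// Pending reports the current number of pending spans.
func (l *limiter) Pending() int64 { return atomic.LoadInt64(&l.pending) }

// Discarded reports how many spans have been dropped so far.
func (l *limiter) Discarded() int64 { return atomic.LoadInt64(&l.discarded) }

func main() {
	l := &limiter{maxPending: 5}

	fmt.Println("first batch accepted:", l.TryAdd(3))
	fmt.Println("second batch accepted:", l.TryAdd(3)) // 3+3 exceeds 5
	l.Done(3)                                          // first batch written
	fmt.Println("third batch accepted:", l.TryAdd(3))
	fmt.Println("pending:", l.Pending(), "discarded:", l.Discarded())
}
```

Because the reservation is made before the check, concurrent callers may briefly see the count above the limit and discard a batch that would have fit a moment later. The accepted total never exceeds the limit, which is the property that keeps memory bounded during a long outage.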