Failure Store: Update go-docappender to respect failure store status #228

1pkg · 2025-01-29T22:13:49Z

Description

This PR updates go-docappender to respect the new failure store response status and to emit corresponding new metrics. The existing "indexed" metrics stay intact; instead, a separate set of failure store labels is exposed.

This PR depends on changes made for BulkIndexerResponseItem in elastic/go-elasticsearch#948 and should only be merged afterwards.

How to test:

Use an Elasticsearch instance that has the failure store feature enabled, and enable the failure store for a data stream via a component template:

POST _component_template/$name
{
  "template": {
    "data_stream_options": {
      "failure_store": {
        "enabled": true
      }
    }
  }
}

Set a custom "fail" ingest pipeline:

PUT _ingest/pipeline/$name
{
  "processors": [
    {
      "fail": {
        "message": "fail"
      }
    }
  ]
}

Ingest some data into the corresponding data stream, then check that the failure store metrics are reported correctly.

To emulate the "failed" status, set the backing index of the data stream to read-only:

PUT /$index/_settings
{
  "index.blocks.read_only": true
}

appender_test.go

axw

Looks good! Just one question

appender_test.go

axw · 2025-01-31T03:51:46Z

One more question, sorry: should we be measuring "not_enabled" too?

1pkg · 2025-01-31T19:45:07Z

One more question, sorry: should we be measuring "not_enabled" too?

Agree, this metric could be useful too. Updated to expose it in the last commit.


marclop

I think this looks good overall, just one small question

marclop · 2025-02-14T01:52:49Z

bulk_indexer.go

+	// FailureStore contains failure store specific stats.
+	FailureStore struct {
+		// Used contains the total number of documents indexed to failure store.
+		Used int64
+		// Failed contains the total number of documents which failed when indexed to failure store.
+		Failed int64
+		// NotEnabled contains the total number of documents which could have been indexed to failure store
+		// if it was enabled.
+		NotEnabled int64
+	}


Is it worth adding the FailureStore struct in the legacy stats metrics?

Correct me if I'm wrong, but as I understand it the legacy stats metrics live in https://github.com/elastic/go-docappender/blob/main/appender.go#L65, and I already reverted the change there in 48f9d85.

This struct, on the other hand, is just a generic bulk response container that is used to pass data to the appender for reporting OTEL metrics, etc.
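To make the role of this container concrete, here is a minimal sketch of how per-document failure store statuses could be tallied into a struct of the same shape. It is not the code from this PR: the `item` type and the `tally` function are hypothetical stand-ins, and the status strings ("used", "failed", "not_enabled", "not_applicable_or_unknown") are taken from this discussion and its commit messages.

```go
package main

import "fmt"

// FailureStoreStats mirrors the shape of the FailureStore struct in the diff above.
type FailureStoreStats struct {
	Used       int64 // documents indexed to the failure store
	Failed     int64 // documents that failed when indexed to the failure store
	NotEnabled int64 // documents that could have used the failure store if it was enabled
}

// item is a hypothetical, simplified stand-in for a bulk response item.
type item struct {
	FailureStore string
}

// tally counts documents per failure store status.
// Empty and "not_applicable_or_unknown" statuses are ignored.
func tally(items []item) FailureStoreStats {
	var s FailureStoreStats
	for _, it := range items {
		switch it.FailureStore {
		case "used":
			s.Used++
		case "failed":
			s.Failed++
		case "not_enabled":
			s.NotEnabled++
		}
	}
	return s
}

func main() {
	items := []item{
		{FailureStore: "used"},
		{FailureStore: "failed"},
		{FailureStore: "not_enabled"},
		{FailureStore: "used"},
		{FailureStore: ""},
		{FailureStore: "not_applicable_or_unknown"},
	}
	fmt.Printf("%+v\n", tally(items))
}
```

Keeping the tally in a plain struct like this means the consumer decides how to report it, which matches the description of the container as a way to hand data to the appender.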

simitt · 2025-02-17T08:18:09Z

appender.go

+		a.addCount(failureStore.Used, nil,
+			a.metrics.docsIndexed,
+			metric.WithAttributes(
+				attribute.String("status", "FailureStore"),


I'm not sure how the status: "FailureStore" is supposed to be used?
E.g. when calculating SLOs and being interested in successful vs failed documents: if I understand correctly, the failure_store: used documents would be counted as success, whereas failure_store: failed and failure_store: not_enabled would be counted as failed?

My idea is that the existing SLOs stay untouched; with the failure store enabled they should, in theory, always be 100% good events. Separately, a new set of SLOs can be created to track the failure store error rate and the number of documents indexed to the failure store vs all documents.

sounds good, thanks
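As a rough illustration of the separate SLOs described above, the two ratios could be derived from the counters along these lines. This is only a sketch: the function names and the exact ratio definitions are assumptions for illustration, not something defined by this PR.

```go
package main

import "fmt"

// ratio returns num/den, or 0 when the denominator is zero.
func ratio(num, den int64) float64 {
	if den == 0 {
		return 0
	}
	return float64(num) / float64(den)
}

// failureStoreErrorRate is a hypothetical SLO input: the fraction of failure
// store write attempts (used + failed) that failed.
func failureStoreErrorRate(used, failed int64) float64 {
	return ratio(failed, used+failed)
}

// failureStoreShare is a hypothetical SLO input: the fraction of all
// documents that ended up in the failure store.
func failureStoreShare(used, total int64) float64 {
	return ratio(used, total)
}

func main() {
	fmt.Println(failureStoreErrorRate(3, 1)) // 1 failed out of 4 attempts
	fmt.Println(failureStoreShare(3, 12))    // 3 of 12 documents went to the failure store
}
```

The existing success/failure SLOs are not involved in either ratio, which is the point of exposing the failure store counts separately.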

simitt · 2025-02-17T08:28:55Z

bulk_indexer.go

+		Failed int64
+		// NotEnabled contains the total number of documents which could have been indexed to failure store
+		// if it was enabled.
+		NotEnabled int64


The already reported stats are not very consistent (some of them contain the term docs, others don't - e.g. Indexed vs RetriedDocs, FailedDocs).
However, when reading FailureStore.Used, FailureStore.Failed and FailureStore.NotEnabled, it's not necessarily clear from the naming whether these count documents or batch requests. Is it too verbose to add Docs?

I see no problem with adding docs to the names. Which makes more sense: FailureStoreDocs.Used, FailureStoreDocs.Failed, FailureStoreDocs.NotEnabled, or alternatively FailureStore.UsedDocs, FailureStore.FailedDocs, FailureStore.NotEnabledDocs?

I'm slightly in favour of FailureStoreDocs.Used, but no strong opinion.

1pkg added 4 commits January 24, 2025 17:57

failure store: update bulk indexer to support fs stats

d3506a0

Merge branch 'main' into response_failure_store_support

2cdab68

test: fix racy test

94581be

failure store: appender stats update

38095d1

1pkg self-assigned this Jan 29, 2025

1pkg added the enhancement New feature or request label Jan 29, 2025

elastic-observability-automation bot added the safe-to-test Automated label for running bench-diff on forked PRs label Jan 29, 2025

1pkg commented Jan 29, 2025

1pkg added 5 commits January 29, 2025 14:31

failure store: update stats too

bc81099

failure store: add to response filter

6e8684c

failure store: ignore implicit not_applicable_or_unknown status

8b691cc

failure store: add relevant tests

2f4f9f3

failure store: add status const

f2332ac

1pkg marked this pull request as ready for review January 31, 2025 01:17

1pkg requested a review from a team as a code owner January 31, 2025 01:17

1pkg mentioned this pull request Jan 31, 2025

esutil: add failure_store key to bulk response item elastic/go-elasticsearch#948

Merged

axw reviewed Jan 31, 2025

1pkg added 2 commits January 30, 2025 17:41

failure store: fix typo

323b9ba

failure store: fix metric label test

ee7973c

1pkg added 3 commits January 31, 2025 11:16

failure store: use main branch from go-elasticsearch

442690a

failure store: track "not_enabled" status

18c7530

failure store: fix formatting

0250223

1pkg commented Jan 31, 2025

1pkg requested review from axw and a team January 31, 2025 19:46

1pkg added 2 commits February 12, 2025 10:08

failure store: use latest elasticsearch client release

9423aa3

Merge branch 'main' into response_failure_store_support

c0d492c

1pkg requested a review from marclop February 12, 2025 18:12

Merge branch 'main' into response_failure_store_support

1f7e409

kruskall requested changes Feb 12, 2025

1pkg added 3 commits February 13, 2025 14:18

Merge branch 'main' into response_failure_store_support

22f8a12

failure store: do not export legacy metrics

48f9d85

Merge branch 'response_failure_store_support' of github.com:1pkg/go-d…

19d6d05

…ocappender into response_failure_store_support

1pkg requested a review from kruskall February 13, 2025 22:34

marclop reviewed Feb 14, 2025

Merge branch 'main' into response_failure_store_support

d2fbedf

1pkg requested a review from marclop February 14, 2025 03:10

1pkg enabled auto-merge (squash) February 14, 2025 03:10

simitt reviewed Feb 17, 2025
