-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Failure Store: Update go-docappender to respect failure store status #228
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just one question
One more question, sorry: should we be measuring "not_enabled" too? |
Agree, this metric could be useful too. Updated to expose it in the last commit. |
…ocappender into response_failure_store_support
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good overall, just one small question
// FailureStore contains failure store specific stats. | ||
FailureStore struct { | ||
// Used contains the total number of documents indexed to failure store. | ||
Used int64 | ||
// Failed contains the total number of documents which failed when indexed to failure store. | ||
Failed int64 | ||
// NotEnabled contains the total number of documents which could have been indexed to failure store | ||
// if it was enabled. | ||
NotEnabled int64 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth adding the FailureStore
struct in the legacy stats metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong, but per my understanding legacy stats metrics are residing in https://github.com/elastic/go-docappender/blob/main/appender.go#L65. Which I already reverted in 48f9d85.
While this is just generic bulk response container which is used to pass data to appender for reporting OTEL metrics, etc.
a.addCount(failureStore.Used, nil, | ||
a.metrics.docsIndexed, | ||
metric.WithAttributes( | ||
attribute.String("status", "FailureStore"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how the status: "FailureStore"
is supposed to be used?
E.g. when calculating SLOs and being interested in success vs failed documents. If I understand it correct then the failure_store: used
documents would be counted as success
whereas failure_store: failed
and failure_store: not_enabled
would be counted as failed
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My idea behind it that the existing SLOs stay untouched, with enabled failure store they should always be 100% good events in theory. Separately a new set of SLOs can be created to track failure store error rate, and the number of documents indexed to FS vs all documents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, thanks
Failed int64 | ||
// NotEnabled contains the total number of documents which could have been indexed to failure store | ||
// if it was enabled. | ||
NotEnabled int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The already reported stats are not very consistent (some of them contain the term docs
, others don't - e.g. Indexed
vs RetriedDocs, FailedDocs
).
However, when reading FailureStore.Used
, FailureStore.Failed
and FailureStore.NotEnabled
, it's not necessarily clear from the naming whether this would count the number of documents or batch requests. Is it too verbose to add Docs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no problem adding docs to the names. What makes more sense FailureStoreDocs.Used
, FailureStoreDocs.Failred
, FailureStoreDocs.NotEnabled
or alternatively FailureStore.UsedDocs
, FailureStore.FailredDocs
, FailureStore.NotEnabledDocs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm slightly in favour of FailureStoreDocs.Used
, but no strong opinion.
Description
This PR updates go-docappender to respect new failure store response status and emit new correspodning metrics accordingly. The old "indexed" metrics stay intact instead a new separate set of failure store labels is exposed.
This PR depends on changes made for
BulkIndexerResponseItem
in elastic/go-elasticsearch#948 and should only be merged afterwards.How to test:
Use an instance of ES that has failure store feature enabled and enable failure store for a data stream via component template with.
Set a custom "fail" ingest pipeline with.
Ingest some data to corresponding data stream, then check that failure store metrics are getting reported correctly.
To emulate "failed" status, set backing data stream index to be read only with.