Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

collector: add tasks API collection #778

Merged
merged 6 commits into from
Mar 14, 2024

Conversation

devoxel
Copy link
Contributor

@devoxel devoxel commented Sep 20, 2023

This commit adds simple aggregation of Elasticsearch Tasks API.
There are 4 new metrics; though 3 are just bookkeeping.

elasticsearch_task_stats_action_total is a gague reporting the total
number of tasks running for a given action. Because there are no stats
endpoints available for this, this change introduces an aggregation step
to group the number of tasks by action name.

This metric is useful for ensuring long running actions of a specific
kind stay within a specific limit. Of particular use to me is
the action: 'indices:data/write/delete/byquery'.

In my usecase, our ES access patterns mean we have a predefined limit
of these actions running on the cluster.

This change also adds two new CLI flags to manage the collection of tasks API:

--es.tasks (to enable task collection)
--es.tasks.actions (to filter tasks by action param)

Issue #525 proposed addition of collection of these tasks.

This commit adds simple aggregation of Elasticsearch Tasks API.
There are 4 new metrics; though 3 are just bookkeeping.

elasticsearch_task_stats_action_total is a gague reporting the total
number of tasks running for a given action. Because there are no stats
endpoints available for this, this change introduces an aggregation step
to group the number of tasks by action name.

This metric is useful for ensuring long running actions of a specific
kind stay within a specific limit. Of particular use to me is
the action: 'indices:data/write/delete/byquery'.

In my usecase, our ES access patterns mean we have a predefined limit
of these actions running on the cluster.

This change also adds two new CLI flags to manage the collection of tasks API:

	--es.tasks (to enable task collection)
	--es.tasks.actions (to filter tasks by action param)

Issue prometheus-community#525 proposed addition of collection of these tasks.

Signed-off-by: Aaron Delaney <[email protected]>
@devoxel
Copy link
Contributor Author

devoxel commented Sep 22, 2023

@sysadmind hey - would it be possible to review this?

Copy link
Contributor

@sysadmind sysadmind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are moving towards the Collector interface for collectors. If you look at cluster_info and cluster_settings, you will see examples of the newer style. If that was the only change, I would say we could fix it up after merging, but with the other review remarks, I think it's worthwhile to do at the same time. Using the Collector interface means --es.tasks is no longer necessary. It will be --collector.tasks automatically. The --es.tasks.actions flag will still be necessary, but I don't think that the "es." prefix is necessary.

Overall, I think this is close and the logic seems to be correct.

collector/tasks.go Outdated Show resolved Hide resolved
collector/tasks.go Outdated Show resolved Hide resolved
collector/tasks.go Outdated Show resolved Hide resolved
collector/tasks_response.go Outdated Show resolved Hide resolved
Action string `json:"action"`
}

type AggregatedTaskStats struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that there is a benefit to having this type. We could just pass the map instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having the struct makes the code a bit more self documenting; it makes it clear we're doing aggregation and by what. If you'd prefer I can make that change though

collector/tasks_test.go Outdated Show resolved Hide resolved
collector/tasks_test.go Outdated Show resolved Hide resolved
@devoxel
Copy link
Contributor Author

devoxel commented Sep 28, 2023

Thanks for the prompt & useful review @sysadmind :)

The --es.tasks.actions flag will still be necessary

I ended up moving the flag into tasks.go directly and using a global; because the collector factory cannot pass params during construction.

The downside of directly gathering flags in the init() are:

  • A package level global mutable variable is never nice; some other function could accidentally change the behaviour of tasks

  • If the package is imported it's flags are infectious. However, registerCollector does this anyway; so I don't think this issue is worth discussion.

I considered some alternatives:

  • Extending Collector interface: this ends up with us either exposing flag implementation details or only supporting string value options

  • A struct tag solution with reflect.TypeOf: this seems nice on paper but implementation is way too complex for such a small gain

If you have a better implementation please suggest.

collector/tasks.go Outdated Show resolved Hide resolved
collector/tasks.go Outdated Show resolved Hide resolved
@devoxel
Copy link
Contributor Author

devoxel commented Feb 9, 2024

Hey - @sysadmind - any update here? I have addressed your most recent set of comments

collector/tasks.go Outdated Show resolved Hide resolved
Copy link
Contributor

@sysadmind sysadmind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good except the metric name. Thanks for your patience.

devoxel added 2 commits March 4, 2024 10:44
Signed-off-by: Aaron Delaney <[email protected]>
Signed-off-by: Aaron Delaney <[email protected]>
@devoxel
Copy link
Contributor Author

devoxel commented Mar 4, 2024

Thanks for your patience.

It's understandable - maintaining OSS is a thankless & endless task and I appreciate the reviews.

Copy link
Contributor

@sysadmind sysadmind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sysadmind sysadmind merged commit 1d5d44b into prometheus-community:master Mar 14, 2024
4 checks passed
jaimeyh pushed a commit to sysdiglabs/elasticsearch_exporter that referenced this pull request Jun 14, 2024
* collector: add tasks API collection

This commit adds simple aggregation of Elasticsearch Tasks API.
There are 4 new metrics; though 3 are just bookkeeping.

elasticsearch_task_stats_action_total is a gague reporting the total
number of tasks running for a given action. Because there are no stats
endpoints available for this, this change introduces an aggregation step
to group the number of tasks by action name.

This metric is useful for ensuring long running actions of a specific
kind stay within a specific limit. Of particular use to me is
the action: 'indices:data/write/delete/byquery'.

In my usecase, our ES access patterns mean we have a predefined limit
of these actions running on the cluster.

This change also adds two new CLI flags to manage the collection of tasks API:

	--es.tasks (to enable task collection)
	--es.tasks.actions (to filter tasks by action param)

Issue prometheus-community#525 proposed addition of collection of these tasks.

Signed-off-by: Aaron Delaney <[email protected]>

* collector: use collector interface for tasks

Signed-off-by: Aaron Delaney <[email protected]>

* all: fix issues reported by golangci-lint

Signed-off-by: Aaron Delaney <[email protected]>

* collector: make task structs private to package

Signed-off-by: Aaron Delaney <[email protected]>

* Fix task stats metric name

Signed-off-by: Aaron Delaney <[email protected]>

* Fix tasks test

Signed-off-by: Aaron Delaney <[email protected]>

---------

Signed-off-by: Aaron Delaney <[email protected]>
jaimeyh added a commit to sysdiglabs/elasticsearch_exporter that referenced this pull request Jun 14, 2024
jaimeyh added a commit to sysdiglabs/elasticsearch_exporter that referenced this pull request Jun 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants