Add Prometheus metrics endpoint #511

Merged 15 commits on Nov 16, 2022

@thinkharderdev (Contributor) commented Nov 10, 2022

Which issue does this PR close?

Closes #504
Closes #507

Depends on #504

Posting this for review now. The functionality is done, but I also want to address #507 here. If I can't get to that before this is approved, it's fine to merge and I can add the user guide in a separate PR.

Rationale for this change

Track the size of the pending task queue in the scheduler and expose it through both Prometheus metrics and the external scaler service.

The number of pending tasks is the number of tasks that can be scheduled but for which there is no executor slot to schedule them on (i.e. it does not count tasks for unresolved stages, etc.).
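
As a rough illustration of that definition (a sketch only, not the scheduler's actual code): `Stage`, `StageState`, and `pending_tasks` below are hypothetical names, and deriving the count by subtracting free executor slots from the schedulable tasks is an assumption made for this example.

```rust
/// Hypothetical stage states for illustration; only resolved stages
/// contribute tasks that can be scheduled.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StageState {
    Unresolved,
    Resolved,
}

/// Hypothetical, simplified view of a stage.
struct Stage {
    state: StageState,
    /// Tasks in this stage that have not been assigned to an executor yet.
    runnable_tasks: usize,
}

/// Counts tasks that could be scheduled right now but have no executor
/// slot available. Tasks in unresolved stages are ignored.
fn pending_tasks(stages: &[Stage], free_slots: usize) -> usize {
    let schedulable: usize = stages
        .iter()
        .filter(|s| s.state == StageState::Resolved)
        .map(|s| s.runnable_tasks)
        .sum();
    // Never go below zero when there are more free slots than tasks.
    schedulable.saturating_sub(free_slots)
}
```

With one resolved stage holding 5 runnable tasks, one unresolved stage holding 7, and 2 free slots, this sketch reports 3 pending tasks: the 7 tasks of the unresolved stage are not counted.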

What changes are included in this PR?

Add a pending_tasks state to the QueryStageScheduler and track the value in response to scheduler events.
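
A minimal sketch of what tracking such a value in response to events could look like. The `DemoEvent` variants and the `PendingTasks` type are hypothetical and are not the event types or fields used by the QueryStageScheduler; the sketch only assumes that some events add schedulable tasks and others assign tasks to executor slots.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical events, invented for this example.
enum DemoEvent {
    /// `n` tasks became schedulable but found no free slot.
    TasksRunnable(usize),
    /// `n` previously pending tasks were assigned to executor slots.
    TasksAssigned(usize),
}

/// Thread-safe counter that can be read by a metrics endpoint.
struct PendingTasks {
    count: AtomicUsize,
}

impl PendingTasks {
    fn new() -> Self {
        PendingTasks { count: AtomicUsize::new(0) }
    }

    /// Updates the counter for a single event.
    fn on_event(&self, event: DemoEvent) {
        match event {
            DemoEvent::TasksRunnable(n) => {
                self.count.fetch_add(n, Ordering::SeqCst);
            }
            DemoEvent::TasksAssigned(n) => {
                // Saturating decrement so the counter never underflows.
                let _ = self.count.fetch_update(
                    Ordering::SeqCst,
                    Ordering::SeqCst,
                    |c| Some(c.saturating_sub(n)),
                );
            }
        }
    }

    /// Current number of pending tasks.
    fn get(&self) -> usize {
        self.count.load(Ordering::SeqCst)
    }
}
```

Keeping the value in an atomic lets the event loop update it while a metrics handler reads it without taking a lock; whether the PR does it this way is not stated here.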

I also added some more bells and whistles to the SchedulerTest utility introduced in the previous PR. We can now submit a job and control when "virtual" executors send their task updates so we have finer-grained control to test different scenarios.

Added a rudimentary user guide.

Are there any user-facing changes?

The external scaler service should now return the actual number of pending tasks instead of a hard-coded value. The pending task queue size should also be available in the Prometheus metrics.

No

@thinkharderdev marked this pull request as ready for review November 10, 2022 23:33
@avantgardnerio (Contributor) commented

🦾 🥳

@thinkharderdev (Contributor, Author) commented

Added #514 for making this more configurable

@andygrove changed the title from "Add pending task metric" to "Add Prometheus metrics endpoint" on Nov 16, 2022
@andygrove (Member) left a comment

Fantastic work @thinkharderdev

I tested this and it works well.

# HELP job_cancelled_total Counter of cancelled jobs
# TYPE job_cancelled_total counter
job_cancelled_total 0
# HELP job_completed_total Counter of completed jobs
# TYPE job_completed_total counter
job_completed_total 3
# HELP job_exec_time_seconds Histogram of successful job execution time in seconds
# TYPE job_exec_time_seconds histogram
job_exec_time_seconds_bucket{le="0.5"} 0
job_exec_time_seconds_bucket{le="1"} 0
job_exec_time_seconds_bucket{le="5"} 0
job_exec_time_seconds_bucket{le="30"} 3
job_exec_time_seconds_bucket{le="60"} 3
job_exec_time_seconds_bucket{le="+Inf"} 3
job_exec_time_seconds_sum 28.743
job_exec_time_seconds_count 3
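
For readers unfamiliar with the format, the bucket lines above are cumulative: each `le` bucket counts every observation less than or equal to its bound, which is why the 30, 60, and +Inf buckets all show 3. The sketch below reproduces that counting with the standard library only. The `cumulative_buckets` helper is hypothetical, and any individual job durations passed to it are made up, since only the sum (28.743) and count (3) appear in the output.

```rust
/// Returns (label, cumulative count) pairs in the style of Prometheus
/// histogram buckets, ending with the implicit "+Inf" bucket.
fn cumulative_buckets(bounds: &[f64], observations: &[f64]) -> Vec<(String, u64)> {
    let mut out: Vec<(String, u64)> = bounds
        .iter()
        .map(|bound| {
            // Each bucket counts all observations <= its upper bound.
            let count = observations.iter().filter(|&&v| v <= *bound).count() as u64;
            (format!("{}", bound), count)
        })
        .collect();
    // The +Inf bucket always equals the total number of observations.
    out.push(("+Inf".to_string(), observations.len() as u64));
    out
}
```

Calling it with the bounds `[0.5, 1.0, 5.0, 30.0, 60.0]` and three invented durations that each fall between 5 and 30 seconds gives counts of 0, 0, 0, 3, 3, 3, matching the shape of the output above.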

@andygrove merged commit 0c22e52 into apache:master on Nov 16, 2022
Successfully merging this pull request may close these issues.

Add user guide section on prometheus metrics