
Identify model metrics that will be part of the PoC. #13

Closed
Tracked by #6
devguyio opened this issue Aug 11, 2023 · 1 comment · Fixed by #31
Assignees
Labels
kind/documentation (Improvements or additions to documentation), priority/high (Important issue that needs to be resolved asap. Releases should not have too many o…)

Comments

@devguyio
Contributor

devguyio commented Aug 11, 2023

User Story

A/C (Acceptance Criteria)

  • Needed metrics for the PoC are documented.
@grdryn
Member

grdryn commented Aug 17, 2023

Apparently, the UI currently displays the following kinds of metrics, where possible:

  • Number of inference requests per unit time
  • Average inference response time
  • Two measures of model bias (SPD and DIR)

The MLFlow serving container that we're going to use for the PoC currently exposes enough metrics to cover the first two, but it seems like TrustyAI is the thing that would provide the bias metrics, and we don't currently plan to include that in the PoC, as far as I know.

So, from that MLFlow container image, the following are the metrics exposed (shown here without labels to make it more concise, but a more full output can be seen here in this gist):

# HELP parallel_request_queue counter of request queue size for workers
# TYPE parallel_request_queue histogram
parallel_request_queue_sum
parallel_request_queue_bucket
parallel_request_queue_count

# HELP rest_server_request_duration_seconds HTTP request duration, in seconds
# TYPE rest_server_request_duration_seconds histogram
rest_server_request_duration_seconds_sum
rest_server_request_duration_seconds_bucket
rest_server_request_duration_seconds_count

# HELP rest_server_requests_in_progress Total HTTP requests currently in progress
# TYPE rest_server_requests_in_progress gauge
rest_server_requests_in_progress

# HELP rest_server_requests_total Total HTTP requests
# TYPE rest_server_requests_total counter
rest_server_requests_total

Given that there are not that many, and the cardinality should remain pretty stable (there are only 3 endpoints that I know of: /ping, /version, and /invocations), I suggest we just scrape and keep all of the metrics for now, unless we have a good reason not to.
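For reference, the first two UI metrics above could be derived from these exposed metrics with PromQL queries along these lines (a sketch, not tested against the PoC setup; the `endpoint` label selector is an assumption based on the labeled output in the gist):

```promql
# Inference requests per second, averaged over a 5m window
rate(rest_server_requests_total{endpoint="/invocations"}[5m])

# Average inference response time in seconds, from the duration histogram
rate(rest_server_request_duration_seconds_sum[5m])
  / rate(rest_server_request_duration_seconds_count[5m])
```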

In the gist I linked above (here again), I also included an example of the OpenVINO metrics for comparison.

BTW, to enable the REST port and enable the metrics to be served over it, something akin to the following run command is needed:

podman run --rm -ti -p 9000:9000 -p 9001:9001 ovms_face_detection:latest --metrics_enable --rest_port 9001
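Once the REST port is up, the output of `/metrics` can be sanity-checked by eye, or with a small script. A minimal sketch (not part of the PoC code) that pulls the metric names out of Prometheus text exposition format; the sample text is inlined here rather than fetched live, and in practice you'd fetch it with something like `requests.get("http://localhost:9001/metrics").text`:

```python
# Sample exposition text, abbreviated from the metrics listed above.
SAMPLE = """\
# HELP rest_server_requests_total Total HTTP requests
# TYPE rest_server_requests_total counter
rest_server_requests_total{endpoint="/invocations"} 42
# HELP rest_server_requests_in_progress Total HTTP requests currently in progress
# TYPE rest_server_requests_in_progress gauge
rest_server_requests_in_progress 0
"""

def metric_names(exposition_text):
    """Return the set of metric names declared via '# TYPE' lines."""
    names = set()
    for line in exposition_text.splitlines():
        if line.startswith("# TYPE "):
            # Line format: "# TYPE <name> <type>"
            names.add(line.split()[2])
    return names

print(sorted(metric_names(SAMPLE)))
# ['rest_server_requests_in_progress', 'rest_server_requests_total']
```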
