Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] add deployment_stats to trained model stats #80531

Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
127 changes: 126 additions & 1 deletion docs/reference/ml/df-analytics/apis/get-trained-models-stats.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ Retrieves usage information for trained models.
[[ml-get-trained-models-stats-prereq]]
== {api-prereq-title}

Requires the `monitor_ml` cluster privilege. This privilege is included in the
Requires the `monitor_ml` cluster privilege. This privilege is included in the
`machine_learning_user` built-in role.


Expand Down Expand Up @@ -86,6 +86,131 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-id]
(integer)
The number of ingest pipelines that currently refer to the model.

`deployment_stats`:::
(list)
A collection of deployment stats if one of the provided `model_id` values
is deployed
+
.Properties of deployment stats
[%collapsible%open]
=====
`model_id`:::
(string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-id]

`model_size`:::
(<<byte-units,byte value>>)
The size of the loaded model in bytes.

`start_time`:::
(long)
The epoch timestamp when the deployment started.

`state`:::
(string)
The overall state of the deployment. The values may be:
+
--
* `starting`: The deployment has recently started but is not yet usable as the model is not allocated on any nodes.
* `started`: The deployment is usable as at least one node has the model allocated.
* `stopping`: The deployment is preparing to stop and un-allocate the model from the relevant nodes.
--

`allocation_status`:::
(object)
The detailed allocation status given the deployment configuration.
+
.Properties of allocation stats
[%collapsible%open]
======
`allocation_count`:::
(integer)
The current number of nodes where the model is allocated.

`target_allocation_count`:::
(integer)
The desired number of nodes for model allocation.

`state`:::
(string)
The detailed allocation state related to the nodes.
+
--
* `starting`: Allocations are being attempted but no node currently has the model allocated.
* `started`: At least one node has the model allocated.
* `fully_allocated`: The deployment is fully allocated and satisfies the `target_allocation_count`.
--
======

`nodes`:::
(array of objects)
The deployment stats for each node that currently has the model allocated.
+
.Properties of node stats
[%collapsible%open]
======
`average_inference_time_ms`:::
(double)
The average time for each inference call to complete on this node.

`inference_count`:::
(integer)
The total number of inference calls made against this node for this model.

`last_access`:::
(long)
The epoch time stamp of the last inference call for the model on this node.

`node`:::
(object)
Information pertaining to the node.
+
.Properties of node
[%collapsible%open]
========
`attributes`:::
(object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=node-attributes]

`ephemeral_id`:::
(string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=node-ephemeral-id]

`id`:::
(string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=node-id]

`name`:::
(string) The node name.

`transport_address`:::
(string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=node-transport-address]
========

`routing_state`:::
(object)
The current routing state and reason for the current routing state for this allocation.
+
--
* `starting`: The model is attempting to allocate on this model, inference calls are not yet accepted.
* `started`: The model is allocated and ready to accept inference requests.
* `stopping`: The model is being de-allocated from this node.
* `stopped`: The model is fully de-allocated from this node.
* `failed`: The allocation attempt failed, see `reason` field for the potential cause.
--

`reason`:::
(string)
The reason for the current state. Usually only populated when the `routing_state` is `failed`.

`start_time`:::
(long)
The epoch timestamp when the allocation started.

======
=====

`inference_stats`:::
(object)
A collection of inference stats fields.
Expand Down
1 change: 0 additions & 1 deletion docs/reference/ml/df-analytics/apis/index.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,6 @@ include::explain-dfanalytics.asciidoc[leveloffset=+2]
include::get-dfanalytics.asciidoc[leveloffset=+2]
include::get-dfanalytics-stats.asciidoc[leveloffset=+2]
include::get-trained-models.asciidoc[leveloffset=+2]
include::get-trained-model-deployment-stats.asciidoc[leveloffset=+2]
include::get-trained-models-stats.asciidoc[leveloffset=+2]
//INFER
include::infer-trained-model-deployment.asciidoc[leveloffset=+2]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,6 @@ You can use the following APIs to perform {infer} operations:
* <<delete-trained-models-aliases>>
* <<get-trained-models>>
* <<get-trained-models-stats>>
* <<get-trained-model-deployment-stats>>

You can deploy a trained model to make predictions in an ingest pipeline or in
an aggregation. Refer to the following documentation to learn more:
Expand Down

This file was deleted.

Loading