[ML] Add new include flag to GET inference/<model_id> API for model training metadata #61922
Conversation
Force-pushed ece8df8 to eacc213
Force-pushed 082aff6 to 19b0eb6
Force-pushed 19b0eb6 to bfb390e
Resolved review threads (outdated):
- docs/java-rest/high-level/ml/get-trained-models-metadata.asciidoc (5 threads)
- docs/reference/ml/df-analytics/apis/get-inference-trained-model-metadata.asciidoc (5 threads)
Co-authored-by: Lisa Cawley <[email protected]>
@elasticmachine update branch
Added three more suggestions, otherwise LGTM
`from`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=from]
Sorry I didn't notice this sooner, but these descriptions have the same problem--they're job-specific. I've created model-specific descriptions in #62128
Suggested change:
-include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=from]
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=from-models]
`size`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=size]
Ditto re needing a model-specific description:
Suggested change:
-include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=size]
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=size-models]
}
},
{
"feature_name" : "FlightTimeMin",
This is a very long example, so it's worth considering putting "..." in a couple of places where the info is redundant.
Docs LGTM
Force-pushed 18f2a16 to 4e4a198
Pinging @elastic/ml-core (:ml)
This PR has changed a lot since my last review, so I've added more suggestions.
<4> Indicate if the total feature importance for the features used in training
should be included in the model `metadata` field.
<5> Should the definition be fully decompressed on GET
<6> Allow empty response if no Trained Models match the provided ID patterns.
Suggested change:
-<6> Allow empty response if no Trained Models match the provided ID patterns.
+<6> Allow empty response if no trained models match the provided ID patterns.
If false, an error will be thrown if no Trained Models match the
ID patterns.
-<6> An optional list of tags used to narrow the model search. A Trained Model
+<7> An optional list of tags used to narrow the model search. A Trained Model
Suggested change:
-<7> An optional list of tags used to narrow the model search. A Trained Model
+<7> An optional list of tags used to narrow the model search. A trained model
=====
`total_feature_importance`:::
(array)
An array of the total feature importance for each training feature used from
An array of the total feature importance for each training feature used from
AFAIK we've never defined a "training feature". Can we just say feature? If not, this term needs to be explained.
@@ -785,6 +785,23 @@ prediction. Defaults to the `results_field` value of the {dfanalytics-job} that
used to train the model, which defaults to `<dependent_variable>_prediction`.
end::inference-config-results-field-processor[]

tag::inference-metadata-feature-importance-feature-name[]
The training feature name for which this importance was calculated.
Ditto earlier comment about "training feature".
Suggested change:
-The training feature name for which this importance was calculated.
+The feature for which this importance was calculated.
I'll fix this in #62643 too
(Optional, string)
A comma delimited string of optional fields to include in the response body.
Valid options are:
- definition: to include the model definition
Suggested change:
-- definition: to include the model definition
+- `definition`: Includes the model definition
A comma delimited string of optional fields to include in the response body.
Valid options are:
- definition: to include the model definition
- total_feature_importance: to include the total feature importance for the
Suggested change:
-- total_feature_importance: to include the total feature importance for the
+- `total_feature_importance`: Includes the total feature importance for the
Valid options are:
- definition: to include the model definition
- total_feature_importance: to include the total feature importance for the
training feature sets. This field will be available in the `metadata` field.
Suggested change:
-training feature sets. This field will be available in the `metadata` field.
+training data sets. This field is available in the `metadata` field in the response body.
LGTM
can have many tags or none. The trained models in the response will
contain all the provided tags.
-<7> Optional boolean value indicating if certain fields should be removed on
+<8> Optional boolean value indicating if certain fields should be removed on
Swap these two sentences around; the fact that certain fields are removed is the detail.
<8> Optional boolean value for requesting the trained model in a format that can then be put into another cluster. Certain fields that can only be set when the model is imported are removed.
@@ -408,7 +412,7 @@ public Builder(TrainedModelConfig config) {
this.definition = config.definition == null ? null : new LazyModelDefinition(config.definition);
this.description = config.getDescription();
this.tags = config.getTags();
-this.metadata = config.getMetadata();
+this.metadata = config.getMetadata() == null ? null : new HashMap<>(config.getMetadata());
The package private ctor also wraps this map on line 151:
this.metadata = metadata == null ? null : Collections.unmodifiableMap(metadata);
Make the ctor argument an unmodifiable map and have getMetadata() return an unmodifiable map; I think that avoids these maps being repeatedly re-wrapped.
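The pattern the reviewer is suggesting can be sketched in isolation like this. ConfigSketch and its members are hypothetical stand-ins for TrainedModelConfig, shown only to illustrate wrap-once / copy-on-mutate, not the actual Elasticsearch class:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ConfigSketch {
    private final Map<String, Object> metadata;

    ConfigSketch(Map<String, Object> metadata) {
        // Wrap once in the constructor; all readers then share one
        // unmodifiable view instead of re-wrapping at every access.
        this.metadata = metadata == null ? null : Collections.unmodifiableMap(metadata);
    }

    Map<String, Object> getMetadata() {
        return metadata; // already unmodifiable, no further wrapping needed
    }

    static Map<String, Object> mutableCopy(Map<String, Object> source) {
        // Copy before mutating (e.g. in a builder), since the stored
        // map rejects writes with UnsupportedOperationException.
        return source == null ? new HashMap<>() : new HashMap<>(source);
    }
}
```

The builder's `new HashMap<>(config.getMetadata())` in the diff above plays the `mutableCopy` role: the stored view stays immutable while the builder works on its own copy.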
}
if (this.metadata == null) {
this.metadata = new HashMap<>();
}
Make a copy of the unmodifiable map here
@@ -482,11 +518,11 @@ public void getTrainedModel(final String modelId, final boolean includeDefinitio
try {
builder = handleSearchItem(multiSearchResponse.getResponses()[0], modelId, this::parseInferenceDocLenientlyFromSource);
} catch (ResourceNotFoundException ex) {
-listener.onFailure(new ResourceNotFoundException(
+getTrainedModelListener.onFailure(new ResourceNotFoundException(
nit: It is clearer if `finalListener` is used for these onFailure calls.
if (includeTotalFeatureImportance == false) {
finalListener.onResponse(modelBuilders.stream()
.map(TrainedModelConfig.Builder::build)
.sorted(Comparator.comparing(TrainedModelConfig::getModelId))
Configs are still sorted by the search aren't they? Sorting here shouldn't be necessary
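For illustration, the sort under discussion looks like this in isolation; ModelConfig is a hypothetical stand-in for TrainedModelConfig, and if the search response already returns configs sorted by id, this step is redundant but harmless:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for TrainedModelConfig; only the model id matters here.
class ModelConfig {
    private final String modelId;
    ModelConfig(String modelId) { this.modelId = modelId; }
    String getModelId() { return modelId; }
}

public class SortDemo {
    public static void main(String[] args) {
        // Order configs lexicographically by model id via Comparator.comparing.
        List<String> ids = List.of(new ModelConfig("model-b"), new ModelConfig("model-a"))
            .stream()
            .sorted(Comparator.comparing(ModelConfig::getModelId))
            .map(ModelConfig::getModelId)
            .collect(Collectors.toList());
        System.out.println(ids); // prints [model-a, model-b]
    }
}
```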
.sorted(Comparator.comparing(TrainedModelConfig::getModelId))
.collect(Collectors.toList()));
return;
nit: delete extra blank line
.collect(Collectors.toList())),
failure -> {
// total feature importance is not necessary for a model to be valid
// we shouldn't fail if it is not found
👍 makes sense
Is it the case that newer models after version 7.10 will always have the total feature importance?
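The policy in the diff above, tolerating a failed feature-importance lookup rather than failing the whole request, can be sketched as follows. The names and the always-failing lookup are illustrative, not the actual Elasticsearch ActionListener API:

```java
import java.util.List;
import java.util.function.Consumer;

public class TolerantFetch {
    // Stand-in for the metadata lookup; here it always fails so the
    // fallback path runs.
    static void fetchImportance(List<String> models,
                                Consumer<List<String>> onResponse,
                                Consumer<Exception> onFailure) {
        onFailure.accept(new RuntimeException("importance docs not found"));
    }

    public static void main(String[] args) {
        List<String> models = List.of("model-a", "model-b");
        fetchImportance(models,
            enriched -> System.out.println("enriched: " + enriched),
            failure -> {
                // Total feature importance is optional metadata, so return
                // the models unchanged instead of surfacing the failure.
                System.out.println("fallback: " + models);
            });
    }
}
```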
@@ -142,6 +142,8 @@ yamlRestTest {
'ml/inference_crud/Test put ensemble with tree where tree has out of bounds feature_names index',
'ml/inference_crud/Test put model with empty input.field_names',
'ml/inference_crud/Test PUT model where target type and inference config mismatch',
'ml/inference_metadata/Test get given missing trained model metadata',
I can't find an inference_metadata.yml file. Did you forget to add it?
run elasticsearch-ci/2
created by {dfanalytics} contain `analysis_config` and `input` objects.
.Properties of metadata
This isn't formatting properly, so I think a continuation character is required.
Suggested change:
-.Properties of metadata
++
+.Properties of metadata
I'll fix this in #62643
…raining metadata (elastic#61922)

Adds new flag include to the get trained models API. The flag initially has two valid values: definition, total_feature_importance. Consequently, the old include_model_definition flag is now deprecated. When total_feature_importance is included, the total_feature_importance field is included in the model metadata object. Including definition is the same as previously setting include_model_definition=true.
…odel training metadata (#61922) (#62620)

* [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922)

Adds new flag include to the get trained models API. The flag initially has two valid values: definition, total_feature_importance. Consequently, the old include_model_definition flag is now deprecated. When total_feature_importance is included, the total_feature_importance field is included in the model metadata object. Including definition is the same as previously setting include_model_definition=true.

* fixing test

* Update x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ml/action/GetTrainedModelsRequestTests.java