[ML] Add new include flag to GET inference/<model_id> API for model training metadata #61922
Conversation
Force-pushed ece8df8 to eacc213
Force-pushed 082aff6 to 19b0eb6
Force-pushed 19b0eb6 to bfb390e
Resolved review threads (outdated):
- docs/java-rest/high-level/ml/get-trained-models-metadata.asciidoc (5 threads)
- docs/reference/ml/df-analytics/apis/get-inference-trained-model-metadata.asciidoc (5 threads)
Co-authored-by: Lisa Cawley <[email protected]>
@elasticmachine update branch
Added three more suggestions, otherwise LGTM
`from`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=from]
Sorry I didn't notice this sooner, but these descriptions have the same problem--they're job-specific. I've created model-specific descriptions in #62128
Suggested change:
-include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=from]
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=from-models]
`size`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=size]
Ditto re needing a model-specific description:
Suggested change:
-include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=size]
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=size-models]
}
},
{
"feature_name" : "FlightTimeMin",
This is a very long example, so it's worth considering putting "..." in a couple of places where the info is redundant.
Docs LGTM
Force-pushed 18f2a16 to 4e4a198
Pinging @elastic/ml-core (:ml)
This PR has changed a lot since my last review, so I've added more suggestions.
<4> Indicate if the total feature importance for the features used in training
should be included in the model `metadata` field.
<5> Should the definition be fully decompressed on GET
<6> Allow empty response if no Trained Models match the provided ID patterns.
Suggested change:
-<6> Allow empty response if no Trained Models match the provided ID patterns.
+<6> Allow empty response if no trained models match the provided ID patterns.
If false, an error will be thrown if no Trained Models match the
ID patterns.
-<6> An optional list of tags used to narrow the model search. A Trained Model
+<7> An optional list of tags used to narrow the model search. A Trained Model
Suggested change:
-<7> An optional list of tags used to narrow the model search. A Trained Model
+<7> An optional list of tags used to narrow the model search. A trained model
=====
`total_feature_importance`:::
(array)
An array of the total feature importance for each training feature used from
An array of the total feature importance for each training feature used from
AFAIK we've never defined a "training feature". Can we just say feature? If not, this term needs to be explained.
@@ -785,6 +785,23 @@ prediction. Defaults to the `results_field` value of the {dfanalytics-job} that
used to train the model, which defaults to `<dependent_variable>_prediction`.
end::inference-config-results-field-processor[]

tag::inference-metadata-feature-importance-feature-name[]
The training feature name for which this importance was calculated.
Ditto earlier comment about "training feature".
Suggested change:
-The training feature name for which this importance was calculated.
+The feature for which this importance was calculated.
I'll fix this in #62643 too
(Optional, string)
A comma delimited string of optional fields to include in the response body.
Valid options are:
- definition: to include the model definition
Suggested change:
-- definition: to include the model definition
+- `definition`: Includes the model definition
A comma delimited string of optional fields to include in the response body.
Valid options are:
- definition: to include the model definition
- total_feature_importance: to include the total feature importance for the
Suggested change:
-- total_feature_importance: to include the total feature importance for the
+- `total_feature_importance`: Includes the total feature importance for the
Valid options are:
- definition: to include the model definition
- total_feature_importance: to include the total feature importance for the
training feature sets. This field will be available in the `metadata` field.
Suggested change:
-training feature sets. This field will be available in the `metadata` field.
+training data sets. This field is available in the `metadata` field in the response body.
LGTM
can have many tags or none. The trained models in the response will
contain all the provided tags.
-<7> Optional boolean value indicating if certain fields should be removed on
+<8> Optional boolean value indicating if certain fields should be removed on
Swap these two sentences around; the fact that certain fields are removed is the detail.
<8> Optional boolean value for requesting the trained model in a format that can then be put into another cluster. Certain fields that can only be set when the model is imported are removed.
@@ -408,7 +412,7 @@ public Builder(TrainedModelConfig config) {
this.definition = config.definition == null ? null : new LazyModelDefinition(config.definition);
this.description = config.getDescription();
this.tags = config.getTags();
-this.metadata = config.getMetadata();
+this.metadata = config.getMetadata() == null ? null : new HashMap<>(config.getMetadata());
The package private ctor also wraps this map on line 151:
this.metadata = metadata == null ? null : Collections.unmodifiableMap(metadata);
Make the ctor argument an unmodifiable map and have getMetadata() return an unmodifiable map; I think that avoids these maps being repeatedly re-wrapped.
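The pattern the reviewer is suggesting can be sketched in isolation like this. ConfigSketch and its members are hypothetical stand-ins for TrainedModelConfig, shown only to illustrate wrap-once / copy-on-mutate, not the actual Elasticsearch class:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ConfigSketch {
    private final Map<String, Object> metadata;

    ConfigSketch(Map<String, Object> metadata) {
        // Wrap once in the constructor; all readers then share one
        // unmodifiable view instead of re-wrapping at every access.
        this.metadata = metadata == null ? null : Collections.unmodifiableMap(metadata);
    }

    Map<String, Object> getMetadata() {
        return metadata; // already unmodifiable, no further wrapping needed
    }

    static Map<String, Object> mutableCopy(Map<String, Object> source) {
        // Copy before mutating (e.g. in a builder), since the stored
        // map rejects writes with UnsupportedOperationException.
        return source == null ? new HashMap<>() : new HashMap<>(source);
    }
}
```

The builder's `new HashMap<>(config.getMetadata())` in the diff above plays the `mutableCopy` role: the stored view stays immutable while the builder works on its own copy.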
}
if (this.metadata == null) {
this.metadata = new HashMap<>();
}
Make a copy of the unmodifiable map here
@@ -482,11 +518,11 @@ public void getTrainedModel(final String modelId, final boolean includeDefinitio
try {
builder = handleSearchItem(multiSearchResponse.getResponses()[0], modelId, this::parseInferenceDocLenientlyFromSource);
} catch (ResourceNotFoundException ex) {
-listener.onFailure(new ResourceNotFoundException(
+getTrainedModelListener.onFailure(new ResourceNotFoundException(
nit: It is clearer if `finalListener` is used for these onFailure calls.
if (includeTotalFeatureImportance == false) {
finalListener.onResponse(modelBuilders.stream()
.map(TrainedModelConfig.Builder::build)
.sorted(Comparator.comparing(TrainedModelConfig::getModelId))
Configs are still sorted by the search aren't they? Sorting here shouldn't be necessary
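For illustration, the sort under discussion looks like this in isolation; ModelConfig is a hypothetical stand-in for TrainedModelConfig, and if the search response already returns configs sorted by id, this step is redundant but harmless:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for TrainedModelConfig; only the model id matters here.
class ModelConfig {
    private final String modelId;
    ModelConfig(String modelId) { this.modelId = modelId; }
    String getModelId() { return modelId; }
}

public class SortDemo {
    public static void main(String[] args) {
        // Order configs lexicographically by model id via Comparator.comparing.
        List<String> ids = List.of(new ModelConfig("model-b"), new ModelConfig("model-a"))
            .stream()
            .sorted(Comparator.comparing(ModelConfig::getModelId))
            .map(ModelConfig::getModelId)
            .collect(Collectors.toList());
        System.out.println(ids); // prints [model-a, model-b]
    }
}
```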
.sorted(Comparator.comparing(TrainedModelConfig::getModelId))
.collect(Collectors.toList()));
return;
nit: delete extra blank line
.collect(Collectors.toList())),
failure -> {
// total feature importance is not necessary for a model to be valid
// we shouldn't fail if it is not found
👍 makes sense
Is it the case that newer models after version 7.10 will always have the total feature importance?
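The policy in the diff above, tolerating a failed feature-importance lookup rather than failing the whole request, can be sketched as follows. The names and the always-failing lookup are illustrative, not the actual Elasticsearch ActionListener API:

```java
import java.util.List;
import java.util.function.Consumer;

public class TolerantFetch {
    // Stand-in for the metadata lookup; here it always fails so the
    // fallback path runs.
    static void fetchImportance(List<String> models,
                                Consumer<List<String>> onResponse,
                                Consumer<Exception> onFailure) {
        onFailure.accept(new RuntimeException("importance docs not found"));
    }

    public static void main(String[] args) {
        List<String> models = List.of("model-a", "model-b");
        fetchImportance(models,
            enriched -> System.out.println("enriched: " + enriched),
            failure -> {
                // Total feature importance is optional metadata, so return
                // the models unchanged instead of surfacing the failure.
                System.out.println("fallback: " + models);
            });
    }
}
```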
@@ -142,6 +142,8 @@ yamlRestTest {
'ml/inference_crud/Test put ensemble with tree where tree has out of bounds feature_names index',
'ml/inference_crud/Test put model with empty input.field_names',
'ml/inference_crud/Test PUT model where target type and inference config mismatch',
'ml/inference_metadata/Test get given missing trained model metadata',
I can't find an inference_metadata.yml file. Did you forget to add it?
run elasticsearch-ci/2
created by {dfanalytics} contain `analysis_config` and `input` objects.
.Properties of metadata
This isn't formatting properly, so I think a continuation character is required.
Suggested change:
-.Properties of metadata
++
+.Properties of metadata
I'll fix this in #62643
…raining metadata (elastic#61922)

Adds new flag include to the get trained models API. The flag initially has two valid values: definition, total_feature_importance. Consequently, the old include_model_definition flag is now deprecated. When total_feature_importance is included, the total_feature_importance field is included in the model metadata object. Including definition is the same as previously setting include_model_definition=true.
…odel training metadata (#61922) (#62620)

* [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922)

Adds new flag include to the get trained models API. The flag initially has two valid values: definition, total_feature_importance. Consequently, the old include_model_definition flag is now deprecated. When total_feature_importance is included, the total_feature_importance field is included in the model metadata object. Including definition is the same as previously setting include_model_definition=true.

* fixing test

* Update x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ml/action/GetTrainedModelsRequestTests.java