[ML][Inference] adding ensemble model objects #47241

benwtrent · 2019-09-27T18:07:21Z

This adds the ensemble model object to core and HLRC.

Changes include:

* Updating Tree model so that it can support regression and classification target types
* Adding probability calculation for Tree and Ensemble
* Adding Ensemble model (and to the HLRC)
* Adding OutputAggregator models (WeightedSum, WeightedMode)
* Adding classification_labels

elasticmachine · 2019-09-27T18:07:23Z

Pinging @elastic/ml-core

...test/java/org/elasticsearch/client/ml/inference/trainedmodel/ensemble/WeightedModeTests.java

...c/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/ensemble/WeightedSum.java

...ugin/core/src/test/java/org/elasticsearch/xpack/core/ml/inference/utils/StatisticsTests.java

...ck/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/inference/utils/Statistics.java

...ugin/core/src/test/java/org/elasticsearch/xpack/core/ml/inference/utils/StatisticsTests.java

...gin/core/src/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/tree/Tree.java

...h-level/src/test/java/org/elasticsearch/client/ml/inference/trainedmodel/tree/TreeTests.java

valeriy42

Looks good

droberts195 · 2019-10-01T11:09:29Z

...evel/src/main/java/org/elasticsearch/client/ml/inference/trainedmodel/ensemble/Ensemble.java

+        List<TrainedModel> trainedModels;
+        OutputAggregator outputAggregator;
+        TargetType targetType;
+        List<String> classificationLabels;


These variables are not private. It's not a big problem at the moment, as the setters don't do any critical processing. But if a setter did anything extra in the future, there would be a way to bypass it. So unless there's a really good reason not to, I'd make these private.

They should totally be private. Text editing error.
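A minimal sketch of the fix being agreed on, assuming a builder that exposes only `targetType` and `classificationLabels` (the real builder also holds `trainedModels` and `outputAggregator`). The class names `EnsembleBuilderSketch`, `Builder` and `Ensemble` here are hypothetical stand-ins; the `REGRESSION` default mirrors the core-side snippet in the next comment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

final class EnsembleBuilderSketch {

    enum TargetType { REGRESSION, CLASSIFICATION }

    /** Hypothetical stand-in for the built model, holding only two of the fields under review. */
    static final class Ensemble {
        final TargetType targetType;
        final List<String> classificationLabels;

        private Ensemble(TargetType targetType, List<String> classificationLabels) {
            this.targetType = targetType;
            this.classificationLabels = classificationLabels;
        }
    }

    static final class Builder {
        // Private fields: the setters are the only way in, so any processing added to a
        // setter later (validation, copying) cannot be bypassed by direct field access.
        private TargetType targetType = TargetType.REGRESSION;
        private List<String> classificationLabels;

        Builder setTargetType(TargetType targetType) {
            this.targetType = Objects.requireNonNull(targetType, "targetType must not be null");
            return this;
        }

        Builder setClassificationLabels(List<String> classificationLabels) {
            // Defensive, unmodifiable copy: later edits to the caller's list do not leak in.
            this.classificationLabels = classificationLabels == null
                ? null
                : Collections.unmodifiableList(new ArrayList<>(classificationLabels));
            return this;
        }

        Ensemble build() {
            return new Ensemble(targetType, classificationLabels);
        }
    }
}
```

The defensive copy is exactly the kind of "extra" setter work the reviewer mentions: with package-private fields, a caller could assign the list directly and skip it.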

droberts195 · 2019-10-01T11:12:53Z

.../src/main/java/org/elasticsearch/client/ml/inference/trainedmodel/ensemble/WeightedMode.java

+        true,
+        a -> new WeightedMode((List<Double>)a[0]));
+    static {
+        PARSER.declareDoubleArray(ConstructingObjectParser.constructorArg(), WEIGHTS);


The rest of the class assumes weights can be null. If they were null, though, the object couldn't be round-tripped through XContent and back, because this parser requires weights. I think it should be consistent: either enforce weights != null throughout or make the parser tolerate weights not being present.

It's tricky on the client side; we should probably be lenient. I will make it optional.
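A minimal sketch of what an optional `weights` field implies for the aggregation itself, assuming that a missing list means every model's vote counts 1.0 and that ties go to the value seen first; `WeightedModeSketch` and both of those rules are assumptions, not the PR's code. On the parsing side, the lenient option would presumably mean declaring `WEIGHTS` with `ConstructingObjectParser.optionalConstructorArg()` instead of `constructorArg()`, which this sketch does not reproduce.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class WeightedModeSketch {

    // Optional: null means "no weights supplied", so every vote counts 1.0 (assumption).
    private final List<Double> weights;

    WeightedModeSketch(List<Double> weights) {
        this.weights = weights;
    }

    /** Returns the value (e.g. a class index) with the largest total weight. */
    double aggregate(List<Double> values) {
        Objects.requireNonNull(values, "values must not be null");
        if (values.isEmpty()) {
            throw new IllegalArgumentException("values must not be empty");
        }
        if (weights != null && weights.size() != values.size()) {
            throw new IllegalArgumentException("weights and values must have the same size");
        }
        // LinkedHashMap keeps first-seen order, so ties go to the value seen first.
        Map<Double, Double> totals = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            double weight = weights == null ? 1.0 : weights.get(i);
            totals.merge(values.get(i), weight, Double::sum);
        }
        double best = Double.NaN;
        double bestTotal = Double.NEGATIVE_INFINITY;
        for (Map.Entry<Double, Double> entry : totals.entrySet()) {
            if (entry.getValue() > bestTotal) {
                best = entry.getKey();
                bestTotal = entry.getValue();
            }
        }
        return best;
    }
}
```

Because a null `weights` has a well-defined meaning here, an object built without weights can be serialized without the field and parsed back into an equal object, which is the round-trip consistency the reviewer asked for.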

droberts195 · 2019-10-01T11:17:17Z

.../src/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/ensemble/Ensemble.java

+        OutputAggregator outputAggregator;
+        TargetType targetType = TargetType.REGRESSION;
+        List<String> classificationLabels;
+        boolean modelsAreOrdered;


Can these variables be private?

droberts195 · 2019-10-01T11:18:03Z

.../src/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/ensemble/Ensemble.java

+        }
+
+        private void setOutputAggregatorFromParser(List<OutputAggregator> outputAggregators) {
+            if ((outputAggregators.size() == 1) == false) {


outputAggregators.size() != 1?
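The Elasticsearch codebase generally prefers `== false` over `!` for negating a boolean, which is likely where the double comparison came from; for a size check, `!=` needs no negation at all. A minimal sketch of the check in isolation, using a hypothetical generic helper `requireExactlyOne` and a hypothetical field name:

```java
import java.util.List;

final class SingleValueSketch {

    /**
     * Hypothetical helper: the parser hands back a list of named objects,
     * but exactly one aggregator is allowed, so anything else is rejected.
     */
    static <T> T requireExactlyOne(List<T> parsed, String fieldName) {
        // `!=` states the intent directly; `(size == 1) == false` is equivalent but harder to read.
        if (parsed.size() != 1) {
            throw new IllegalArgumentException(
                "[" + fieldName + "] must contain exactly one object but found " + parsed.size());
        }
        return parsed.get(0);
    }
}
```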

przemekwitek · 2019-10-01T11:24:20Z

...gin/core/src/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/tree/Tree.java

+        if (nodes.isEmpty()) {
+            return;
+        }
+        Set<Integer> visited = new HashSet<>();


Should this one also be initialized with nodes.size()?
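For context, a minimal sketch of the kind of walk such a `visited` set supports, assuming nodes reference their children by index into the node list with `-1` for "no child"; `TreeCycleSketch`, `Node` and `hasCycle` are hypothetical names. Since at most `nodes.size()` distinct indices can be added, sizing the set up front avoids repeated growth from the default capacity. With the default 0.75 load factor, a `HashSet` constructed with capacity n can still resize once before it holds n entries.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TreeCycleSketch {

    /** Hypothetical node: indices of the children in the node list, or -1 when absent. */
    static final class Node {
        final int leftChild;
        final int rightChild;

        Node(int leftChild, int rightChild) {
            this.leftChild = leftChild;
            this.rightChild = rightChild;
        }
    }

    /** Returns true if a walk from the root (index 0) reaches any node more than once. */
    static boolean hasCycle(List<Node> nodes) {
        if (nodes.isEmpty()) {
            return false;
        }
        // At most nodes.size() distinct indices can be added, so size the set up front.
        Set<Integer> visited = new HashSet<>(nodes.size());
        Deque<Integer> toVisit = new ArrayDeque<>();
        toVisit.push(0);
        while (toVisit.isEmpty() == false) {
            int index = toVisit.pop();
            if (visited.add(index) == false) {
                return true; // seen before: a cycle, or a node shared by two parents
            }
            // Out-of-range indices and null nodes are assumed to be caught by a separate check.
            Node node = nodes.get(index);
            if (node.leftChild >= 0) {
                toVisit.push(node.leftChild);
            }
            if (node.rightChild >= 0) {
                toVisit.push(node.rightChild);
            }
        }
        return false;
    }
}
```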

przemekwitek · 2019-10-01T11:27:50Z

...gin/core/src/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/tree/Tree.java

+        }
+    }
+
+    private void detectNullOrMissingNode() {


null is a valid value, and you continue at line 274 when you encounter it.
To me, this method name suggests that nulls will also be reported as missing. Let me know if I misunderstood it, or please rename it.

przemekwitek · 2019-10-01T11:31:47Z

...gin/core/src/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/tree/Tree.java

+    }
+
+    private Double maxLeafValue() {
+        return targetType == TargetType.CLASSIFICATION ?


Have you considered introducing some class hierarchy ("RegressionTree", "ClassificationTree", etc.) to avoid explicit checks against targetType? Just leaving this as an idea. You can decide if it makes the code more readable.
I'm just afraid that the more differences there are between regression and classification, the more ifs of this kind we'll need.

I will look into it; this type of thing is a constant issue with OO-style programming: separating the actions from the data.
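A minimal sketch of the reviewer's idea, assuming the target-type-specific behaviour is pulled out behind an interface that a tree holds instead of checking `targetType`. It further assumes that a classification leaf stores a class index and that a single tree yields a one-hot probability vector, which is consistent with the `[0.0, 1.0]` expectation in the `TreeTests` comment on this PR, though the real computation may differ. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class TargetTypeHierarchySketch {

    /** Target-type-specific behaviour; a tree would hold one of these instead of checking targetType. */
    interface LeafInterpretation {
        List<Double> classificationProbability(double leafValue);
    }

    static final class Regression implements LeafInterpretation {
        @Override
        public List<Double> classificationProbability(double leafValue) {
            throw new UnsupportedOperationException("regression trees do not produce class probabilities");
        }
    }

    static final class Classification implements LeafInterpretation {
        private final int numClasses;

        Classification(int numClasses) {
            this.numClasses = numClasses;
        }

        @Override
        public List<Double> classificationProbability(double leafValue) {
            // Assumption: the leaf value is a class index, and one tree puts all
            // probability mass on that class (a one-hot vector).
            List<Double> probabilities = new ArrayList<>(numClasses);
            for (int i = 0; i < numClasses; i++) {
                probabilities.add(i == (int) leafValue ? 1.0 : 0.0);
            }
            return probabilities;
        }
    }
}
```

With this structure, adding a target type means adding a class rather than another `if`. The cost is the one the reply points to: the data (the serialized target type) and the behaviour now live in different places, so parsing code still has to map one to the other.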

...ck/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/inference/utils/Statistics.java

przemekwitek · 2019-10-01T11:40:09Z

...test/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/ensemble/EnsembleTests.java

+
+    public void testEnsembleWithInvalidModel() {
+        List<String> featureNames = Arrays.asList("foo", "bar");
+        expectThrows(ElasticsearchException.class, () -> {


Is there any meaningful error message to assert on?

Not particularly. An ensemble could have ANY type of sub-model; I think we just want to make sure it is not considered valid.

przemekwitek · 2019-10-01T11:47:21Z

...ore/src/test/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/tree/TreeTests.java

+        // This feature vector should hit the right child of the root node
+        List<Double> featureVector = Arrays.asList(0.6, 0.0);
+        Map<String, Object> featureMap = zipObjMap(featureNames, featureVector);
+        assertEquals(Arrays.asList(0.0, 1.0), tree.classificationProbability(featureMap));


This could be written as:

assertThat(tree.classificationProbability(featureMap), contains(0.0, 1.0));

przemekwitek · 2019-10-01T11:56:45Z

.../src/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/ensemble/Ensemble.java

+        if (outputAggregator != null) {
+            return outputAggregator.aggregate(processedInferences);
+        }
+        return processedInferences.stream().mapToDouble(Double::doubleValue).sum();


This also looks like an aggregation. Should it be wrapped in outputAggregator (possibly in the constructor so that outputAggregator is always non-null here)?

@przemekwitek, I could add a default aggregator that is just a sum. The reason for a default is that, even though output aggregation is optional, we should still return something for inference; the default is a simple summation.
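A minimal sketch of that suggestion, assuming an `OutputAggregator` with a single `aggregate(List<Double>)` method (matching the signature in the `WeightedSum` snippet on this PR) and treating a `WeightedSum` with no weights as a plain sum. `EnsembleSketch` and the null-weights rule are hypothetical. Defaulting in the constructor removes the null branch from the inference path.

```java
import java.util.List;
import java.util.Objects;

final class DefaultAggregatorSketch {

    /** Hypothetical stand-in for the PR's OutputAggregator. */
    interface OutputAggregator {
        double aggregate(List<Double> values);
    }

    /** Weighted sum; null weights are treated as all 1.0, i.e. a plain sum (assumption). */
    static final class WeightedSum implements OutputAggregator {
        private final List<Double> weights;

        WeightedSum(List<Double> weights) {
            this.weights = weights;
        }

        @Override
        public double aggregate(List<Double> values) {
            Objects.requireNonNull(values, "values must not be null");
            if (weights != null && weights.size() != values.size()) {
                throw new IllegalArgumentException("weights and values must have the same size");
            }
            double total = 0.0;
            for (int i = 0; i < values.size(); i++) {
                double weight = weights == null ? 1.0 : weights.get(i);
                total += weight * values.get(i);
            }
            return total;
        }
    }

    /** Minimal ensemble holding only the aggregator. */
    static final class EnsembleSketch {
        private final OutputAggregator outputAggregator;

        EnsembleSketch(OutputAggregator outputAggregator) {
            // Default to an unweighted sum so inference always has an aggregator to call.
            this.outputAggregator = outputAggregator == null ? new WeightedSum(null) : outputAggregator;
        }

        double aggregate(List<Double> processedInferences) {
            return outputAggregator.aggregate(processedInferences);
        }
    }
}
```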

przemekwitek · 2019-10-01T11:58:01Z

.../src/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/ensemble/Ensemble.java

+
+    @Override
+    public List<Double> classificationProbability(Map<String, Object> fields) {
+        if ((targetType == TargetType.CLASSIFICATION) == false) {


Is this check needed, given that this method delegates to another one (with a check) at line 134?

I think so; I don't think we want to even parse the field input if we are not doing classification.

przemekwitek · 2019-10-01T12:03:16Z

...c/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/ensemble/WeightedSum.java

+    @Override
+    public double aggregate(List<Double> values) {
+        Objects.requireNonNull(values, "values must not be null");
+        Optional<Double> summation = values.stream().reduce((memo, v) -> memo + v);


I think you could provide Double::sum instead of (memo, v) -> memo + v.

przemekwitek · 2019-10-01T12:05:04Z

...c/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/ensemble/WeightedSum.java

+        if (summation.isPresent()) {
+            return summation.get();
+        }
+        throw new IllegalArgumentException("values must not contain null values");


When values are empty, summation will be empty as well, right? Then this message can be misleading.
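Both review points on `WeightedSum.aggregate` can be folded into one version of the method. In the original, a null element never reaches the `IllegalArgumentException`: unboxing it inside the lambda throws a `NullPointerException` (and `Stream.reduce` also throws one if the reduction result is null), so the message can only fire for an empty list. A minimal, unweighted sketch that checks each case explicitly and uses `Double::sum`; the class name `SumSketch` is hypothetical.

```java
import java.util.List;
import java.util.Objects;

final class SumSketch {

    static double sum(List<Double> values) {
        Objects.requireNonNull(values, "values must not be null");
        if (values.isEmpty()) {
            throw new IllegalArgumentException("values must not be empty");
        }
        // stream().anyMatch also works for List.of(...) lists, whose contains(null) throws.
        if (values.stream().anyMatch(Objects::isNull)) {
            throw new IllegalArgumentException("values must not contain null values");
        }
        // Double::sum replaces the (memo, v) -> memo + v lambda; the Optional cannot be
        // empty here because the list was checked above.
        return values.stream().reduce(Double::sum).get();
    }
}
```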


przemekwitek

LGTM

droberts195

LGTM

benwtrent · 2019-10-01T15:11:37Z

@elasticmachine update branch

benwtrent · 2019-10-01T16:20:40Z

run elasticsearch-ci/2

* [ML][Inference] adding ensemble model objects
* addressing PR comments
* Update TreeTests.java
* addressing PR comments
* fixing test

[ML][Inference] adding ensemble model objects

e821c16

benwtrent added >non-issue :ml Machine learning v8.0.0 v7.5.0 labels Sep 27, 2019

benwtrent mentioned this pull request Sep 28, 2019

[ML][Inference] adding .ml-inference* index and storage #47267

Merged

przemekwitek reviewed Sep 30, 2019

addressing PR comments

e6dd87d

valeriy42 reviewed Sep 30, 2019

benwtrent and others added 2 commits September 30, 2019 12:50

Merge branch 'master' into feature/ml-inference-ensemble-model-parsing

5fb5517

Update TreeTests.java

ecb310f

droberts195 reviewed Oct 1, 2019

przemekwitek reviewed Oct 1, 2019

benwtrent added 2 commits October 1, 2019 09:43

addressing PR comments

8bbeaf3

Merge branch 'feature/ml-inference-ensemble-model-parsing' of github.…

b916553

…com:benwtrent/elasticsearch into feature/ml-inference-ensemble-model-parsing

przemekwitek approved these changes Oct 1, 2019

droberts195 approved these changes Oct 1, 2019

elasticmachine and others added 2 commits October 1, 2019 08:11

Merge branch 'master' into feature/ml-inference-ensemble-model-parsing

7ec027d

fixing test

2925524

benwtrent merged commit af4e6ed into elastic:master Oct 1, 2019

benwtrent deleted the feature/ml-inference-ensemble-model-parsing branch October 1, 2019 18:18

benwtrent mentioned this pull request Oct 2, 2019

[7.x] [ML][Inference] adding ensemble model objects (#47241) #47438

Merged

benwtrent added a commit that referenced this pull request Oct 2, 2019

[ML][Inference] adding ensemble model objects (#47241) (#47438)

2228a7d

* [ML][Inference] adding ensemble model objects
* addressing PR comments
* Update TreeTests.java
* addressing PR comments
* fixing test

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
