[ML] adds new feature_processors field for data frame analytics #60528

benwtrent · 2020-07-31T15:52:55Z

feature_processors allow users to create custom features from individual document fields.

These feature_processors are the same objects as the trained model's pre_processors.

They are passed to the native process, which then appends them to the pre_processor array in the inference model.

closes #59327
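To make the idea of "custom features from individual document fields" concrete, here is a minimal sketch of a one-hot style processor. `SketchProcessor` and `OneHotSketch` are hypothetical names invented for this illustration and are not the real `PreProcessor` interface from this PR; the sketch only assumes, as the snippets in this thread suggest, that a processor exposes its output field names and writes derived values into the document's field map.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a feature processor. This is NOT the real
// org.elasticsearch PreProcessor interface, only an illustration.
interface SketchProcessor {
    List<String> outputFields();

    void process(Map<String, Object> fields);
}

// One-hot encodes a single categorical field into one 0/1 column per known value.
final class OneHotSketch implements SketchProcessor {
    private final String inputField;
    // Maps a raw value to the name of the derived column, e.g. "cat" -> "animal_cat".
    private final Map<String, String> valueToColumn;

    OneHotSketch(String inputField, Map<String, String> valueToColumn) {
        this.inputField = inputField;
        this.valueToColumn = new LinkedHashMap<>(valueToColumn);
    }

    @Override
    public List<String> outputFields() {
        return List.copyOf(valueToColumn.values());
    }

    @Override
    public void process(Map<String, Object> fields) {
        Object value = fields.get(inputField);
        if (value == null) {
            return; // missing input: no derived features are written
        }
        String raw = value.toString();
        for (Map.Entry<String, String> entry : valueToColumn.entrySet()) {
            fields.put(entry.getValue(), entry.getKey().equals(raw) ? 1 : 0);
        }
    }
}
```

With a document containing `animal = "dog"` and a mapping of `cat -> animal_cat`, `dog -> animal_dog`, the sketch adds `animal_cat = 0` and `animal_dog = 1` to the field map.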

...ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/extractor/ExtractedFieldsDetector.java

benwtrent · 2020-07-31T15:55:55Z

Need to write tests still, but the overall design is hammered out.

...plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/dataframe/analyses/Regression.java

...n/ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/process/AnalyticsProcessManager.java


elasticmachine · 2020-08-04T14:17:39Z

Pinging @elastic/ml-core (:ml)

dimitris-athanasiou

Looks good! Just need to work through a few minor comments and some simplifications if possible.

...plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/dataframe/analyses/Regression.java

...core/src/main/java/org/elasticsearch/xpack/core/ml/inference/preprocessing/PreProcessor.java

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/extractor/ExtractedFields.java

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/extractor/ProcessedField.java

.../ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/extractor/DataFrameDataExtractor.java

...ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/extractor/ExtractedFieldsDetector.java


dimitris-athanasiou · 2020-08-12T04:02:39Z

Almost there! I think the last bit missing is covering the changes in ExtractedFieldsDetector with unit tests in ExtractedFieldsDetectorTests.

benwtrent · 2020-08-12T14:52:23Z

@elasticmachine update branch


dimitris-athanasiou · 2020-08-12T04:01:05Z

...ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/extractor/ExtractedFieldsDetector.java

+        Set<String> duplicatedFields = new HashSet<>();
+        for (ProcessedField processedField : processedFields) {
+            for (String output : processedField.getOutputFieldNames()) {
+                if(processedFeatures.add(output) == false) {


nit: space after if
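For readers without the full diff, the hunk above relies on `Set.add` returning `false` when the element is already present. A self-contained sketch of that uniqueness check follows; `OutputUniquenessSketch` and `findDuplicatedOutputs` are hypothetical names, and plain lists of strings stand in for the output field names of each `ProcessedField`.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class OutputUniquenessSketch {

    // Returns every output name that is produced more than once across all processors.
    static Set<String> findDuplicatedOutputs(List<List<String>> outputsPerProcessor) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicated = new TreeSet<>(); // sorted, for stable error messages
        for (List<String> outputs : outputsPerProcessor) {
            for (String output : outputs) {
                // Set.add returns false when the name was already recorded.
                if (seen.add(output) == false) {
                    duplicated.add(output);
                }
            }
        }
        return duplicated;
    }
}
```

A caller could reject the job configuration whenever the returned set is non-empty; whether the PR raises an error at this point is not visible in the excerpt.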

dimitris-athanasiou · 2020-08-13T11:15:23Z

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/extractor/ProcessedField.java

@@ -52,7 +52,7 @@ public ProcessedField(PreProcessor processor) {
            }
        }
        preProcessor.process(inputs);
-        return preProcessor.outputFields().stream().map(inputs::get).toArray();
+        return preProcessor.outputFields().stream().map(inputs::get).filter(Objects::nonNull).toArray();


Does this work correctly? If we filter out null objects, won't we mess up the correspondence of the values to the output fields?

benwtrent · 2020-08-13

Let me think on this more.

We don't want to return partial lists, for sure. But we also don't want to emit empty/missing values unless the caller supports missing values...
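The concern can be reproduced in isolation. The sketch below uses hypothetical names (`NullFilterSketch`, `filtered`, `positional`) and a plain map in place of the processor's `inputs`. It shows that dropping nulls shortens the array, so values no longer line up with their output field names.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class NullFilterSketch {

    // Mirrors the changed line in the diff: nulls are dropped, so positions shift left.
    static Object[] filtered(List<String> outputFields, Map<String, Object> values) {
        return outputFields.stream().map(values::get).filter(Objects::nonNull).toArray();
    }

    // Mirrors the original line: one slot per output field, null where a value is missing.
    static Object[] positional(List<String> outputFields, Map<String, Object> values) {
        return outputFields.stream().map(values::get).toArray();
    }
}
```

For output fields `f_a`, `f_b`, `f_c` with only `f_a` and `f_c` present, `positional` keeps three slots with a null in the middle, while `filtered` returns two values and the value of `f_c` ends up in the slot a caller would read as `f_b`.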

dimitris-athanasiou · 2020-08-13T11:18:07Z

...rc/test/java/org/elasticsearch/xpack/ml/dataframe/extractor/DataFrameDataExtractorTests.java

@@ -472,12 +479,100 @@ public void testGetCategoricalFields() {
            containsInAnyOrder("field_keyword", "field_text", "field_boolean"));
    }

+    public void testWithProcessedFeatures_FieldInfo() {


Rename to testGetFieldNames_GivenProcessedFeatures?

dimitris-athanasiou · 2020-08-13T11:18:51Z

...rc/test/java/org/elasticsearch/xpack/ml/dataframe/extractor/DataFrameDataExtractorTests.java

@@ -551,4 +646,70 @@ protected SearchResponse executeSearchScrollRequest(String scrollId) {
            return searchResponse;
        }
    }
+
+    static class CategoricalPreProcessor implements PreProcessor {


shall we make this private?

dimitris-athanasiou · 2020-08-13T11:21:42Z

...c/test/java/org/elasticsearch/xpack/ml/dataframe/extractor/ExtractedFieldsDetectorTests.java

@@ -943,6 +949,196 @@ public void testDetect_GivenAnalyzedFieldExcludesObjectField() {
        assertThat(e.getMessage(), equalTo("analyzed_fields must not include or exclude object fields: [object_field]"));
    }

+    public void testDetect_givenFeatureProcessorsFailures() {


I think there is a lot of value in keeping unit tests targeted at a very specific piece of functionality when possible. When a test fails, it is really helpful if it makes clear what the problem was. I would suggest splitting this test into individual tests with names that indicate the validation being tested. That also makes the tests serve as live documentation.

I realise this is a subjective preference. If you are not convinced by the argument, you can of course leave it as is :-)

benwtrent · 2020-08-13

😭

This PR is going to end up being nearly 2k lines.


dimitris-athanasiou

LGTM

benwtrent · 2020-08-14T11:32:53Z

run elasticsearch-ci/packaging-sample-windows


benwtrent added >enhancement :ml Machine learning v8.0.0 v7.10.0 labels Jul 31, 2020


benwtrent force-pushed the feature/ml-dfa-add-processors branch 3 times, most recently from bb3d527 to 53730cd Compare August 4, 2020 13:46


[ML] adds new feature_processors field for data frame analytics

497c7a8

feature_processors allow users to create custom features from individual document fields.

benwtrent force-pushed the feature/ml-dfa-add-processors branch from 53730cd to 497c7a8 Compare August 4, 2020 14:11

benwtrent marked this pull request as ready for review August 4, 2020 14:17

benwtrent requested a review from dimitris-athanasiou August 4, 2020 14:17

muting bwc test

517cd85

dimitris-athanasiou reviewed Aug 11, 2020


benwtrent added 6 commits August 11, 2020 07:34

Merge remote-tracking branch 'upstream/master' into feature/ml-dfa-add-processors

1e2ef32

addressing PR comments

52405e7

fixing test

4cae33a

adding field processor test

b8a6df9

fixing test

be5c290

ensuring output feature uniqueness

d419dfc

adding more tests

21c6999

benwtrent requested a review from dimitris-athanasiou August 12, 2020 14:52

elasticmachine and others added 2 commits August 12, 2020 08:52

Merge branch 'master' into feature/ml-dfa-add-processors

553d221

fixing precommit

a8d130f

Merge branch 'feature/ml-dfa-add-processors' of github.com:benwtrent/elasticsearch into feature/ml-dfa-add-processors

eac75c1

dimitris-athanasiou reviewed Aug 13, 2020


benwtrent added 3 commits August 13, 2020 13:56

Merge remote-tracking branch 'upstream/master' into feature/ml-dfa-add-processors

b923b15

adjusting output, addressing pr comments

7ad1520

fixing formatting

f8951d0

dimitris-athanasiou approved these changes Aug 14, 2020


benwtrent merged commit de3107a into elastic:master Aug 14, 2020

benwtrent deleted the feature/ml-dfa-add-processors branch August 14, 2020 12:01

benwtrent mentioned this pull request Aug 14, 2020

[7.x] [ML] adds new feature_processors field for data frame analytics (#60528) #61148

Merged

Mpdreamz mentioned this pull request Nov 16, 2020

7.10.1 Meta Ticket elastic/elasticsearch-net#5096

Closed

61 tasks

stevejgordon mentioned this pull request Dec 17, 2020

7.11.0 Meta Ticket elastic/elasticsearch-net#5198

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
