[ML] adds new feature_processors field for data frame analytics #60528
...ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/extractor/ExtractedFieldsDetector.java
Need to write tests still, but the overall design is hammered out.
bb3d527 to 53730cd
...plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/dataframe/analyses/Regression.java
...n/ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/process/AnalyticsProcessManager.java
feature_processors allow users to create custom features from individual document fields.
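As a minimal sketch of what a custom feature built from a single document field could look like, the Java below one-hot encodes a categorical field into one 0/1 feature per category. `FeatureProcessor`, `OneHotProcessor`, `OneHotDemo` and the `field_category` naming scheme are hypothetical illustrations, loosely modeled on the `process(...)` / `outputFields()` calls visible in the `ProcessedField` diff in this PR; they are not the Elasticsearch `PreProcessor` API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical interface, loosely modeled on the process(...) / outputFields() calls
// in the ProcessedField diff; not the real Elasticsearch PreProcessor API.
interface FeatureProcessor {
    // Reads input fields from the document map and writes derived features into it.
    void process(Map<String, Object> fields);

    // Names of the features this processor writes, in a fixed order.
    List<String> outputFields();
}

// Hypothetical one-hot encoder: turns one categorical field into a 0/1 feature per category.
final class OneHotProcessor implements FeatureProcessor {
    private final String field;
    private final List<String> categories;

    OneHotProcessor(String field, List<String> categories) {
        this.field = field;
        this.categories = List.copyOf(categories);
    }

    @Override
    public void process(Map<String, Object> fields) {
        Object value = fields.get(field);
        for (String category : categories) {
            // 1 for the category the document has, 0 for every other category
            fields.put(featureName(category), category.equals(value) ? 1 : 0);
        }
    }

    @Override
    public List<String> outputFields() {
        return categories.stream().map(this::featureName).collect(Collectors.toList());
    }

    private String featureName(String category) {
        return field + "_" + category;
    }
}

class OneHotDemo {
    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("color", "red");
        FeatureProcessor processor = new OneHotProcessor("color", List.of("red", "blue"));
        processor.process(doc);
        System.out.println(processor.outputFields() + " -> " + doc.get("color_red") + ", " + doc.get("color_blue"));
    }
}
```

Writing the derived features back into the same document map keeps each processor independent of how the extractor later reads the values.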
53730cd to 497c7a8
Pinging @elastic/ml-core (:ml)
Looks good! Just need to work through a few minor comments and some simplifications if possible.
...plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/dataframe/analyses/Regression.java
...core/src/main/java/org/elasticsearch/xpack/core/ml/inference/preprocessing/PreProcessor.java
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/extractor/ExtractedFields.java
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/extractor/ProcessedField.java
.../ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/extractor/DataFrameDataExtractor.java
.../ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/extractor/DataFrameDataExtractor.java
.../ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/extractor/DataFrameDataExtractor.java
...ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/extractor/ExtractedFieldsDetector.java
...ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/extractor/ExtractedFieldsDetector.java
...ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/extractor/ExtractedFieldsDetector.java
Almost there! I think the last bit missing is covering the changes in
@elasticmachine update branch
…elasticsearch into feature/ml-dfa-add-processors
Set<String> duplicatedFields = new HashSet<>();
for (ProcessedField processedField : processedFields) {
for (String output : processedField.getOutputFieldNames()) {
if(processedFeatures.add(output) == false) {
nit: space after if
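The duplicate check in this hunk relies on `Set.add` returning `false` when the element is already present. A minimal, self-contained sketch of the same idea, assuming each processor's output names are available as a `List<String>` (the `DuplicateOutputCheck` class and its input shape are hypothetical stand-ins for `ProcessedField.getOutputFieldNames()`), collects every name emitted more than once:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class DuplicateOutputCheck {
    // Hypothetical helper: each inner list holds the output field names of one processor.
    // Returns the names produced more than once, sorted so an error message is stable.
    static Set<String> findDuplicates(List<List<String>> outputsPerProcessor) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        for (List<String> outputs : outputsPerProcessor) {
            for (String output : outputs) {
                // Set.add returns false if the name was already present
                if (seen.add(output) == false) {
                    duplicates.add(output);
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        Set<String> dups = findDuplicates(List.of(List.of("a", "b"), List.of("b", "c")));
        if (dups.isEmpty() == false) {
            System.out.println("duplicate output fields: " + dups);
        }
    }
}
```

Collecting all duplicates before failing, rather than stopping at the first one, lets a single validation error list every conflicting name.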
@@ -52,7 +52,7 @@ public ProcessedField(PreProcessor processor) {
}
}
preProcessor.process(inputs);
- return preProcessor.outputFields().stream().map(inputs::get).toArray();
+ return preProcessor.outputFields().stream().map(inputs::get).filter(Objects::nonNull).toArray();
Does this work correctly? If we filter out null objects, won't we mess up the correspondence between the values and the output fields?
Let me think on this more.
We don't want to return partial lists, for sure. But we also don't want to put empty/missing unless the caller supports missing values...
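The concern above can be made concrete: once nulls are filtered out, the array can be shorter than `outputFields()`, so index i no longer identifies output field i. The sketch below contrasts the filtered form from the diff with one possible alternative that keeps one slot per output field and substitutes an explicit missing-value marker. `OutputAlignment`, its methods and the marker are hypothetical; whether the consumer can accept such a marker is exactly the open question in this thread.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class OutputAlignment {
    // Same shape as the changed line in the diff: nulls are dropped, so index i of the
    // result does not necessarily belong to outputFields.get(i).
    static Object[] filtered(List<String> outputFields, Map<String, Object> values) {
        return outputFields.stream().map(values::get).filter(Objects::nonNull).toArray();
    }

    // Hypothetical alternative: one slot per output field, absent values replaced by a
    // caller-chosen marker that the consumer must know how to handle.
    static Object[] aligned(List<String> outputFields, Map<String, Object> values, Object missing) {
        return outputFields.stream()
            .map(f -> {
                Object v = values.get(f);
                return v == null ? missing : v;
            })
            .toArray();
    }

    public static void main(String[] args) {
        List<String> outputs = List.of("a", "b", "c");
        Map<String, Object> values = new HashMap<>();
        values.put("a", 1);
        values.put("c", 3); // "b" produced no value
        System.out.println(Arrays.toString(filtered(outputs, values)));
        System.out.println(Arrays.toString(aligned(outputs, values, "MISSING")));
    }
}
```

With outputs `a, b, c` and no value for `b`, the filtered array holds `c`'s value at index 1, while the aligned array keeps `b`'s slot.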
@@ -472,12 +479,100 @@ public void testGetCategoricalFields() {
containsInAnyOrder("field_keyword", "field_text", "field_boolean"));
}

public void testWithProcessedFeatures_FieldInfo() {
rename to testGetFieldNames_GivenProcessesFeatures?
@@ -551,4 +646,70 @@ protected SearchResponse executeSearchScrollRequest(String scrollId) {
return searchResponse;
}
}

static class CategoricalPreProcessor implements PreProcessor {
shall we make this private?
@@ -943,6 +949,196 @@ public void testDetect_GivenAnalyzedFieldExcludesObjectField() {
assertThat(e.getMessage(), equalTo("analyzed_fields must not include or exclude object fields: [object_field]"));
}

public void testDetect_givenFeatureProcessorsFailures() {
I think there is a lot of value in keeping unit tests targeted at a very specific piece of functionality when possible. When a test fails, it is really helpful if it makes clear what the problem was. I would suggest splitting this test into individual tests whose names indicate the validation being tested. That also lets the tests serve as living documentation.
I realise this is a subjective preference. If you are not convinced by the argument, you can of course leave it as is :-)
😭
This PR is going to end up being near 2k lines.
LGTM
run elasticsearch-ci/packaging-sample-windows
…tic#60528) feature_processors allow users to create custom features from individual document fields. These `feature_processors` are the same object as the trained model's pre_processors. They are passed to the native process and the native process then appends them to the pre_processor array in the inference model. closes elastic#59327
…) (#61148) feature_processors allow users to create custom features from individual document fields. These `feature_processors` are the same object as the trained model's pre_processors. They are passed to the native process and the native process then appends them to the pre_processor array in the inference model. closes #59327
feature_processors allow users to create custom features from individual document fields.

These `feature_processors` are the same object as the trained model's pre_processors. They are passed to the native process, and the native process then appends them to the pre_processor array in the inference model.

closes #59327
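The ordering described above (the model's own pre_processors first, the user's `feature_processors` appended after them) can be sketched in Java. This is only an illustration under that reading: per the description, the real append happens in the native process, and `PreProcessorMerge` / `appendFeatureProcessors` are hypothetical names. The strings in `main` are placeholders for processor definitions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class PreProcessorMerge {
    // Hypothetical Java rendering of the append step that the native process performs.
    // The model's own pre_processors keep their order; feature_processors follow them in order.
    static <P> List<P> appendFeatureProcessors(List<P> modelPreProcessors, List<P> featureProcessors) {
        List<P> combined = new ArrayList<>(modelPreProcessors.size() + featureProcessors.size());
        combined.addAll(modelPreProcessors);
        combined.addAll(featureProcessors);
        // Return a read-only view so callers cannot reorder the combined array afterwards
        return Collections.unmodifiableList(combined);
    }

    public static void main(String[] args) {
        List<String> merged = appendFeatureProcessors(
            List.of("model_pre_processor"), List.of("user_feature_processor"));
        System.out.println(merged);
    }
}
```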