Use query in cardinality check #49939
Pinging @elastic/ml-core (:ml)
run elasticsearch-ci/2
LGTM Nice one!
protected static DataFrameAnalyticsConfig buildAnalytics(String id, String sourceIndex, String destIndex,
                                                         @Nullable String resultsField, DataFrameAnalysis analysis,
                                                         QueryBuilder queryBuilder) throws Exception {
    return new DataFrameAnalyticsConfig.Builder()
The other `buildAnalytics` method (starting at line 165) should delegate to this one, since the new method is more generic.
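A minimal sketch of the suggested delegation, using hypothetical simplified stand-ins (`Query`, `AnalyticsConfig`) rather than the real `QueryBuilder` and `DataFrameAnalyticsConfig` types. It assumes the older overload should behave as if the source query were `match_all`; that default is an assumption, not something stated in this review.

```java
import java.util.Objects;

// Hypothetical, simplified stand-ins for the real Elasticsearch types
// (QueryBuilder, DataFrameAnalyticsConfig); only the delegation shape matters here.
public class BuildAnalyticsDelegation {

    // Hypothetical query stand-in; "match_all" plays the role of an unfiltered source query.
    record Query(String description) {
        static Query matchAll() {
            return new Query("match_all");
        }
    }

    record AnalyticsConfig(String id, String sourceIndex, String destIndex,
                           String resultsField, String analysis, Query sourceQuery) {}

    // More generic overload: the caller supplies the source query.
    static AnalyticsConfig buildAnalytics(String id, String sourceIndex, String destIndex,
                                          String resultsField, String analysis, Query query) {
        Objects.requireNonNull(query, "query");
        return new AnalyticsConfig(id, sourceIndex, destIndex, resultsField, analysis, query);
    }

    // Older overload: delegates instead of duplicating the builder logic.
    // Assumption: "no query" means "match every document".
    static AnalyticsConfig buildAnalytics(String id, String sourceIndex, String destIndex,
                                          String resultsField, String analysis) {
        return buildAnalytics(id, sourceIndex, destIndex, resultsField, analysis, Query.matchAll());
    }

    public static void main(String[] args) {
        AnalyticsConfig config = buildAnalytics("job-1", "source-index", "dest-index", null, "classification");
        System.out.println(config.sourceQuery().description());
    }
}
```

Keeping a single code path means any later change to how the config is built (new fields, validation) only has to be made once.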
@@ -248,6 +250,27 @@ public void testDependentVariableCardinalityTooHighError() {
         assertThat(e.getMessage(), equalTo("Field [keyword-field] must have at most [2] distinct values but there were at least [3]"));
     }

+    public void testDependentVariableCardinalityTooHighErrorButWithQuery() throws Exception {
+        initialize("cardinality_too_high");
Please make the names assigned to jobs unique (currently this one and the one in line 236 are the same).
@elasticmachine update branch
LGTM
When checking the cardinality of a field, the query should be taken into account. The user might know about some bad data in their index and want to filter down to the target_field values they care about.
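To illustrate the intended behaviour, here is a plain-Java sketch, not the Elasticsearch implementation: documents are modelled as hypothetical `Map<String, String>` values and a `Predicate` stands in for the user's query. Only documents that match the query contribute distinct values, so excluding known bad data can bring the field back under the limit. The error message mirrors the one asserted in the test above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

// Plain-Java sketch of a cardinality check that honours the source query.
// Map<String, String> documents and the Predicate "query" are hypothetical stand-ins.
public class CardinalityCheck {

    /** Returns an error message if the field has too many distinct values among matching docs. */
    static Optional<String> checkCardinality(List<Map<String, String>> docs,
                                             Predicate<Map<String, String>> query,
                                             String field, int maxDistinct) {
        Set<String> distinct = new HashSet<>();
        for (Map<String, String> doc : docs) {
            if (!query.test(doc)) {
                continue; // filtered out by the query, so its value must not count
            }
            String value = doc.get(field);
            if (value != null) {
                distinct.add(value);
            }
            if (distinct.size() > maxDistinct) {
                // Stop early: we only need to know the limit was exceeded, hence "at least".
                return Optional.of("Field [" + field + "] must have at most [" + maxDistinct
                        + "] distinct values but there were at least [" + distinct.size() + "]");
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("keyword-field", "cat"),
                Map.of("keyword-field", "dog"),
                Map.of("keyword-field", "bad-data"));
        Predicate<Map<String, String>> matchAll = doc -> true;
        Predicate<Map<String, String>> excludeBadData =
                doc -> !"bad-data".equals(doc.get("keyword-field"));

        System.out.println(checkCardinality(docs, matchAll, "keyword-field", 2));       // limit exceeded
        System.out.println(checkCardinality(docs, excludeBadData, "keyword-field", 2)); // Optional.empty
    }
}
```

With every document included, the third distinct value trips the limit; once the query excludes `bad-data`, only two distinct values remain and the check passes.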