
Adding Model Type Validation to Validate API ("non-blocker") #384

Merged (13 commits) Mar 9, 2022

Conversation

@amitgalitz (Member) commented Feb 28, 2022

Description

This PR adds two new changes.

  1. Adds model-type ("non-blocker") validation to the current validation API. This validation is non-blocking: any warnings that arise are suggestions or added information about which configuration could be changed to increase the likelihood of model training completing successfully. These checks can also recommend a specific detector interval or window delay value for the user to try.
    • Every time model-type validation is called, it first runs the blocker validation, since non-blocker checks cannot run while a blocking issue exists.
  2. Adds a check to AbstractAnomalyDetectorActionHandler that the given time field exists in the index mapping and is of type date, which means this check is now executed for model validation as well.
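The time-field check in (2) reduces to a mapping lookup. A minimal sketch, assuming the mapping properties have already been parsed into a map; the class and method names here are hypothetical, not the handler's actual code:

```java
import java.util.Map;

public class TimeFieldCheck {
    /**
     * Returns true only if timeField exists in the index mapping
     * properties and its declared type is "date".
     */
    static boolean isDateTimeField(Map<String, Object> mappingProperties, String timeField) {
        Object fieldMapping = mappingProperties.get(timeField);
        if (!(fieldMapping instanceof Map)) {
            return false; // field missing from the mapping entirely
        }
        return "date".equals(((Map<?, ?>) fieldMapping).get("type"));
    }
}
```

A missing field and a non-date field both fail the check, which matches the two error cases the handler needs to report.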

Non-Blocker validation steps:

  1. Get the latest timestamp (if no latest timestamp is found, the user is told there is not enough historical data).
  2. Check if the detector is multi-entity; if so, only the top entity is used for any subsequent validation.
  3. Check whether the general density of the data is good enough for model training to complete with all configurations applied.
    • This check runs multiple times with different interval configurations until an interval is either recommended, found to be the same as the given one, or no workable interval can be found.
  4. If no interval with all configurations applied yields dense-enough data, run the general density check again, adding configurations sequentially one at a time.
    • For example, first only the raw data is examined for density → if the raw data on its own is dense enough, the data filter (filter_query) is added next and general density is checked again.
  5. If no other issues exist in the data, check whether a new window delay recommendation is possible (using the latest timestamp).
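Steps 3 and 4 above revolve around retrying the density check at increasing intervals. A minimal sketch of that loop, assuming a doubling schedule, a 0.75 success-rate threshold, and a 60-minute cap (all illustrative; the plugin's actual search logic and constants differ):

```java
import java.util.function.LongToDoubleFunction;

public class IntervalRecommender {
    static final double SUCCESS_RATE_THRESHOLD = 0.75; // fraction of non-empty buckets required
    static final long MAX_INTERVAL_MINUTES = 60;       // stop searching past this interval

    /**
     * Returns the smallest interval (in minutes), starting from the configured
     * one, whose bucket success rate meets the threshold, or -1 if none does.
     * successRateFor stands in for the async density search per interval.
     */
    static long recommendInterval(long configuredMinutes, LongToDoubleFunction successRateFor) {
        for (long interval = configuredMinutes; interval <= MAX_INTERVAL_MINUTES; interval *= 2) {
            if (successRateFor.applyAsDouble(interval) >= SUCCESS_RATE_THRESHOLD) {
                return interval;
            }
        }
        return -1; // no dense-enough interval found; fall through to step 4
    }
}
```

If this returns the configured interval unchanged, no suggestion is issued; if it returns a larger value, that becomes the suggested_value in the response; if it returns -1, the per-configuration checks of step 4 take over.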

Additional Info (Important):

  • All validation for general density uses a bucket aggregation that checks whether enough buckets in the last x intervals contain at least one document.
  • I will be adding more new tests in a separate commit or PR.
  • Response wording hasn't finished tech-writer review yet.
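The bucket-aggregation result described in the first bullet reduces to a simple ratio. A sketch, assuming the per-bucket document counts have already been fetched from the aggregation response:

```java
public class DensityCheck {
    /**
     * Fraction of histogram buckets (one per interval) that contain
     * at least one document; 0.0 for an empty bucket list.
     */
    static double bucketSuccessRate(long[] docCountsPerBucket) {
        if (docCountsPerBucket.length == 0) {
            return 0.0;
        }
        long nonEmpty = 0;
        for (long count : docCountsPerBucket) {
            if (count >= 1) {
                nonEmpty++;
            }
        }
        return (double) nonEmpty / docCountsPerBucket.length;
    }
}
```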

API Request:

  • Path: _plugins/_anomaly_detection/detectors/_validate/<aspect>
  • Parameter (aspect):
    • model
  • Payload: detector configuration

Examples

Example 1 (data is ingested only every 5 minutes; the user configured an interval of 1 minute, so an interval of 5 minutes is recommended):

{
   "name": "test-hc-detector",
   "description": "Test detector",
   "time_field": "@timestamp",
   "indices": [
       "host-cloudwatch-two"
   ],
   "feature_attributes": [
       {
           "feature_name": "test",
           "feature_enabled": true,
           "aggregation_query": {
               "test": {
                   "avg": {
                       "field": "cpu"
                   }
               }
           }
       }
   ],
   "filter_query": {
       "bool": {
           "filter": [
               {
                   "exists": {
                       "field": "host"
                   }
               }
           ],
           "adjust_pure_negative": true,
           "boost": 1
       }
   },
   "detection_interval": {
       "period": {
           "interval": 1,
           "unit": "Minutes"
       }
   },
   "window_delay": {
       "period": {
           "interval": 1,
           "unit": "Minutes"
       }
   },
   "category_field": [
       "host"
   ]
}

Response 1:

{
    "model": {
        "detection_interval": {
            "message": "We suggest using a detector interval of: 5",
            "suggested_value": {
                "period": {
                    "interval": 5,
                    "unit": "Minutes"
                }
            }
        }
    }
}

Example 2 request (filter query leads to not enough dense data)

  • CPU values in this data set range between 5 and 15, so the filter below matches no documents
{
    "name": "test-hc-detector",
  ...
   "filter_query": {
        "bool": {
            "filter": [
                {
                    "range": {
                        "cpu": {
                            "gt": 10000
                        }
                    }
                }
            ],
            "adjust_pure_negative": true,
            "boost": 1
        }
    },
    "detection_interval": {
        "period": {
            "interval": 5,
            "unit": "Minutes"
        }
    },
    ...
}

Response 2:

{
    "model": {
        "filter_query": {
            "message": "Data is too sparse after data filter is applied."
        }
    }
}

Issues Resolved

resolves #265

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ohltyler (Member) commented:
Have the responses been reviewed by the tech writer? The 2 examples you have seem a little confusing to me:

"message": "We suggest using a detector interval of: 5",

Who is "we"? Can this be written as more of a direct suggestion, like "Suggested interval: 5 minutes" or something?

"message": "Data is too sparse after data filter is applied."

This seems unclear on next steps for the user. Can there be some added text about removing or revising the data filter?

@amitgalitz (Member, Author) replied:

These responses have yet to be finalized by the tech writers; I will add that to the top description. I currently plan to add a little more text on the frontend, such as "We suggest going back and changing the data filter" (this wording will need to be improved). Do you suggest I add this sort of direction to the API response itself?

@ohltyler (Member) commented Feb 28, 2022:

I see, I'd discuss with UX on that. To me it seems that you could add all of the text as part of the API and just propagate it to the frontend for simplicity. Don't need to add UI-specific context like "go back to this page" but more like "change the data filter"

And again, I believe the "We" wording should be removed.

.subAggregation(
PipelineAggregatorBuilders
.bucketSort("bucketSort", Collections.singletonList(new FieldSortBuilder("_count").order(SortOrder.DESC)))
.size(1)
Member:
It's understandable to only get the top entity combination to use when doing the sparsity checks. But just thinking, what if the data has one or two dominant/frequent entity combos, and many more that are less common, but still useful? For example, maybe an interval of 5 mins is above the threshold for the top 1 entity combo, but an interval of 10 mins is above the threshold for the top 20 entity combos, in which case a suggestion of 10 mins interval would be much more useful from the customer perspective. Is it easy enough to repeat this for the top x entity combos and try to derive a more general working interval suggestion? (this applies to single-category HCAD as well, just using multi-category as the example).

Member (Author):

I could implement this within the first check when all configurations are applied. It would mean running multiple intervals for each one of the top 10 entities and keeping track of the highest interval. The best case is when I find an interval for all top entities but I guess if there is only an interval found for 5 of them, maybe it means the category fields should be relooked at or it could be good enough? On the check where I am adding the category field to the configurations I could repeat this as well but first goal could be only to initially do it for the first run with all configurations.

Member:

Yeah it's impossible to know for sure what would be the ideal number of running entities. Roughly, my idea would be

  1. get top x entities (if < y then just optimize for single entity)
  2. find a baseline interval that works for single top entity (what you're already doing)
  3. check if that works for top z entities - if not, increase it a few more times to see if it can go over the threshold for top z entities - if not, then stick with baseline found in step 2

where x = num top entities constant, y = num top entities threshold, z = num top entities working given the interval threshold
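Under the assumptions above (x, y, z as described), the proposed generalization could be sketched as follows; all names and the doubling retry schedule are illustrative, not an agreed implementation:

```java
import java.util.function.BiPredicate;

public class MultiEntityIntervalSketch {
    /**
     * Starting from the single-entity baseline interval (step 2), try a few
     * larger intervals to see if one also passes the density threshold for the
     * top z entities (step 3); otherwise fall back to the baseline.
     * denseEnough stands in for the density search: (interval, numEntities) -> pass/fail.
     */
    static long generalizeInterval(long baselineMinutes, int zEntities, int extraTries,
                                   BiPredicate<Long, Integer> denseEnough) {
        long interval = baselineMinutes;
        for (int attempt = 0; attempt <= extraTries; attempt++) {
            if (denseEnough.test(interval, zEntities)) {
                return interval; // works for the top z entities
            }
            interval *= 2; // try a coarser interval
        }
        return baselineMinutes; // stick with the single-entity baseline
    }
}
```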

Member:

^ I'd consider this not a hard requirement for this release, just an idea.

@amitgalitz amitgalitz marked this pull request as ready for review March 7, 2022 22:42
@amitgalitz amitgalitz requested a review from a team March 7, 2022 22:42
@amitgalitz amitgalitz added enhancement New feature or request v1.3.0 labels Mar 7, 2022
@codecov-commenter commented Mar 7, 2022

Codecov Report

Merging #384 (098bb93) into main (30d0636) will decrease coverage by 1.62%.
The diff coverage is 16.48%.

Impacted file tree graph

@@             Coverage Diff              @@
##               main     #384      +/-   ##
============================================
- Coverage     79.25%   77.64%   -1.62%     
- Complexity     4095     4105      +10     
============================================
  Files           295      296       +1     
  Lines         17207    17640     +433     
  Branches       1826     1876      +50     
============================================
+ Hits          13638    13697      +59     
- Misses         2672     3038     +366     
- Partials        897      905       +8     
Flag Coverage Δ
plugin 77.64% <16.48%> (-1.62%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...in/java/org/opensearch/ad/constant/CommonName.java 75.00% <ø> (ø)
.../java/org/opensearch/ad/model/AnomalyDetector.java 91.92% <ø> (ø)
...opensearch/ad/model/IntervalTimeConfiguration.java 88.46% <0.00%> (-11.54%) ⬇️
...est/handler/IndexAnomalyDetectorActionHandler.java 100.00% <ø> (ø)
.../ad/rest/handler/ModelValidationActionHandler.java 0.00% <0.00%> (ø)
.../handler/ValidateAnomalyDetectorActionHandler.java 100.00% <ø> (+8.33%) ⬆️
...pensearch/ad/settings/AnomalyDetectorSettings.java 100.00% <ø> (ø)
...c/main/java/org/opensearch/ad/util/ParseUtils.java 69.89% <0.00%> (-5.99%) ⬇️
...g/opensearch/ad/model/DetectorValidationIssue.java 68.42% <25.00%> (+3.50%) ⬆️
...rch/ad/common/exception/ADValidationException.java 72.00% <44.44%> (-22.12%) ⬇️
... and 13 more

Comment on lines 710 to 714
// This case is reached if no interval recommendation was found that leads to a bucket success rate of >= 0.75,
// but no single configuration during the subsequent checks reduced the bucket success rate below 0.25.
// This means the rate with all configs applied was below 0.75, yet the rate when checking each configuration one
// at a time was always above 0.25, so the best suggestion is simply to ingest more data, since we have no further
// insight into the root cause of the lower density.
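The fallback logic that comment describes can be sketched as follows (thresholds taken from the comment; class, method, and message strings are hypothetical):

```java
public class SparsityDiagnosis {
    static final double INTERVAL_REC_THRESHOLD = 0.75;  // rate needed to recommend an interval
    static final double CONFIG_FAILURE_THRESHOLD = 0.25; // rate below which one config is blamed

    /**
     * allConfigRate: bucket success rate with every configuration applied.
     * perConfigRates: rates observed while adding configurations one at a time.
     * When the combined rate is too low but no single configuration drops the
     * rate below the failure threshold, no specific root cause exists and the
     * only remaining advice is to ingest more data.
     */
    static String diagnose(double allConfigRate, double[] perConfigRates) {
        if (allConfigRate >= INTERVAL_REC_THRESHOLD) {
            return "ok";
        }
        for (double rate : perConfigRates) {
            if (rate < CONFIG_FAILURE_THRESHOLD) {
                return "specific configuration is too restrictive";
            }
        }
        return "data is too sparse overall; ingest more data";
    }
}
```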
Member:

Does this comment need to be tuned after recent changes to the thresholds?

Comment on lines 1457 to 1458
AnomalyDetector detector = TestHelpers.randomAnomalyDetector(TIME_FIELD, "index-test", ImmutableList.of(nonNumericFeature));
TestHelpers.createIndexWithTimeField(client(), "index-test", TIME_FIELD);
Member:

I see several here that could use the helper method too, or at least a similar helper method

@@ -435,4 +423,47 @@ public void testValidateAnomalyDetectorWithDetectorNameTooLong() throws IOExcept
assertEquals(ValidationAspect.DETECTOR, response.getIssue().getAspect());
assertTrue(response.getIssue().getMessage().contains("Name should be shortened. The maximum limit is"));
}

@Test
public void testValidateAnomalyDetectorWithNonExistentTimefield() throws IOException {
Member:

Are you working on any tests to run the validation API with sparse data to test out all of the different scenarios? I think it's ok to have as a TODO, but maybe open an issue to track and add for following release.

Member (Author):

Yes, I am continuing to add integration tests like the ones in src/test/java/org/opensearch/ad/e2e/DetectionResultEvalutationIT.java. I will cut an issue and work on adding more in a separate PR.

ohltyler
ohltyler previously approved these changes Mar 8, 2022
@ohltyler (Member) left a comment:

LGTM! Just a few small things in recent comments.

@@ -692,26 +697,33 @@ private void checkFeatureQuery(long latestTime) throws IOException {
client.search(searchRequest, ActionListener.wrap(response -> processFeatureQuery(response, latestTime), listener::onFailure));
}

private void sendWindowDelayRec(long latestTime) {
Member:

nit: latestTimeInMillis may be more informative

}

private void sendWindowDelayRec(long latestTime) {
long minutesSinceLastStamp = TimeUnit.MILLISECONDS.toMinutes(Instant.now().toEpochMilli() - latestTime);
@ohltyler (Member) commented Mar 8, 2022:

How does rounding work in this case? For example, if the difference is 84,000 ms → 1.4 minutes, will it return 1 or 2? If it truncates and returns 1, then the suggestion may return the same window delay the user already has.

Example: user sets a window delay of 1 minute (= 60,000 ms) → currentTime - lastDataTime = 84,000 ms > 60,000 ms; converting 84,000 ms to minutes truncates to 1, so a recommendation of 1 minute is sent even though that is already the configured value.
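The truncation concern is easy to demonstrate: TimeUnit.MILLISECONDS.toMinutes truncates toward zero, so a lag of 1.4 minutes converts to 1 minute. A sketch of the truncating conversion next to a ceiling-based alternative (the helper names are mine, not the PR's):

```java
import java.util.concurrent.TimeUnit;

public class WindowDelayRounding {
    /** Current behavior: TimeUnit truncates, so 84,000 ms (1.4 min) -> 1. */
    static long truncatedMinutes(long diffMillis) {
        return TimeUnit.MILLISECONDS.toMinutes(diffMillis);
    }

    /** Rounding up instead guarantees the suggestion covers the observed lag. */
    static long ceilMinutes(long diffMillis) {
        long minuteMillis = TimeUnit.MINUTES.toMillis(1);
        return (diffMillis + minuteMillis - 1) / minuteMillis;
    }
}
```

With a 1-minute configured delay and an 84,000 ms lag, truncation reproduces the existing value while the ceiling variant recommends 2 minutes.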

@ohltyler (Member) left a comment:

LGTM!

@amitgalitz amitgalitz merged commit 875b03c into opensearch-project:main Mar 9, 2022
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 9, 2022
ohltyler pushed a commit that referenced this pull request Mar 9, 2022
@ohltyler ohltyler added feature new feature and removed enhancement New feature or request labels Mar 10, 2022

Successfully merging this pull request may close these issues.

Anomaly Detection: Adding Validation API "Non-Blocker" Checks
4 participants