AD model performance benchmark (#729) #734

Merged 1 commit into opensearch-project:2.x on Dec 1, 2022


kaituo
Collaborator

@kaituo kaituo commented Nov 23, 2022

Description

This PR adds an AD model performance benchmark so that we can compare model performance across versions.

For benchmark data, we randomly generate synthetic data with known anomalies inserted throughout the signal. Specifically, the data sets are one-, two-, and four-dimensional, and each dimension is a noisy cosine wave. Anomalies are inserted into one dimension with probability 0.003. Anomalies across dimensions can be independent or dependent. Each data set has approximately 5000 observations. Every data set is generated with the same random seed, so results are comparable across versions.
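To make the data description concrete, here is a minimal Python sketch of this kind of generator. It is not the code in this PR (the plugin's tests are written in Java); the function name `generate_benchmark_data` and the `period`, `amplitude`, `noise`, `anomaly_shift`, and `seed` values are assumptions chosen for illustration. It covers only the case where an anomaly is injected into a single, randomly chosen dimension.

```python
import math
import random


def generate_benchmark_data(num_points=5000, dimensions=2, anomaly_rate=0.003,
                            seed=17, period=50, amplitude=100.0, noise=5.0,
                            anomaly_shift=75.0):
    """Return (data, anomaly_indices) for a noisy multi-dimensional cosine wave.

    All default values except num_points and anomaly_rate are illustrative
    assumptions, not values taken from the PR.
    """
    # A fixed seed makes every run produce the same data set, so results
    # stay comparable across versions.
    rng = random.Random(seed)

    # Give each dimension its own phase offset (an assumption of this sketch).
    phases = [rng.uniform(0, period) for _ in range(dimensions)]

    data = []
    anomaly_indices = []
    for t in range(num_points):
        # Each dimension is a cosine wave plus Gaussian noise.
        row = [
            amplitude * math.cos(2 * math.pi * (t + phases[d]) / period)
            + rng.gauss(0, noise)
            for d in range(dimensions)
        ]
        # With probability anomaly_rate, shift one randomly chosen dimension.
        if rng.random() < anomaly_rate:
            d = rng.randrange(dimensions)
            sign = 1 if rng.random() < 0.5 else -1
            row[d] += sign * anomaly_shift
            anomaly_indices.append(t)
        data.append(row)
    return data, anomaly_indices
```

Calling `generate_benchmark_data(dimensions=4)` twice returns identical data and identical anomaly positions, which is the property that lets a benchmark compare detector output against the same ground truth on every run.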

We also backported #600 so that we can capture the performance data in CI output.

Testing done:

  • Added unit tests to run the benchmark.

Signed-off-by: Kaituo Li [email protected]

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@kaituo kaituo requested review from a team, amitgalitz and ohltyler November 23, 2022 19:57
@kaituo
Collaborator Author

kaituo commented Nov 23, 2022

Build failed due to

> Can't get https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.5.0/latest/linux/x64/tar/builds/opensearch/plugins/opensearch-job-scheduler-2.5.0.0.zip to D:\a\anomaly-detection\anomaly-detection\src\test\resources\job-scheduler\opensearch-job-scheduler-2.5.0.0.zip
See https://docs.gradle.org/7.4.2/userguide/command_line_interface.html#sec:command_line_warnings

Tested locally using

(22-11-23 11:54:46) <0> [~/code/github/opensearch-ad]
dev-dsk-kaituo-2b-bf84c4db % ./gradlew build -Dopensearch.version=2.4.0-SNAPSHOT

...
r implicit dependency. This can lead to incorrect results being produced, depending on what order the tasks are executed. Please refer to https://docs.gradle.org/7.4.2/userguide/validation_problems.html#implicit_dependency for more details about this problem.

Deprecated Gradle features were used in this build, making it incompatible with Gradle 8.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

See https://docs.gradle.org/7.4.2/userguide/command_line_interface.html#sec:command_line_warnings

Execution optimizations have been disabled for 1 invalid unit(s) of work during this build to ensure correctness.
Please consult deprecation warnings for more details.

BUILD SUCCESSFUL in 15m 6s
27 actionable tasks: 27 executed

@kaituo kaituo merged commit 867c1b3 into opensearch-project:2.x Dec 1, 2022