[DOCS] Split anomaly detection into multiple pages
Showing 11 changed files with 285 additions and 278 deletions.
17 changes: 17 additions & 0 deletions
docs/en/stack/ml/anomaly-detection/ml-ad-algorithms.asciidoc
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
[[ml-ad-algorithms]] | ||
= {anomaly-detect-cap} algorithms | ||
:keywords: {ml-init}, {stack}, {anomaly-detect} | ||
|
||
The {anomaly-detect} {ml-features} use a bespoke amalgamation of different | ||
techniques such as clustering, various types of time series decomposition, | ||
Bayesian distribution modeling, and correlation analysis. These analytics | ||
provide sophisticated real-time automated {anomaly-detect} for time series data. | ||
|
||
The {ml} analytics statistically model the time-based characteristics of your | ||
data by observing historical behavior and adapting to new data. The model | ||
represents a baseline of normal behavior and can therefore be used to determine | ||
how anomalous new events are. | ||
|
||
{anomaly-detect-cap} results are written for each <<bucket-span,bucket span>>. | ||
These results include scores that are aggregated in order to reduce noise and | ||
normalized in order to rank the most mathematically significant anomalies. |
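As a rough illustration of the "baseline of normal behavior" idea described above, the following toy sketch fits a single Gaussian to historical values and reports how unlikely a new value is under that baseline. This is an assumption-laden stand-in, not the algorithm the {ml-features} use: the function name `tail_probability` and the sample data are hypothetical, and the real analytics combine several techniques (clustering, time series decomposition, Bayesian modeling) that this sketch does not attempt.

```python
import math
import statistics


def tail_probability(history, value):
    """Two-sided tail probability of `value` under a Gaussian fitted to `history`.

    Toy baseline only: a smaller result means the value is more unusual
    relative to the historical data.
    """
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    if std == 0:
        raise ValueError("history has no variance; the toy baseline is undefined")
    z = abs(value - mean) / std
    # P(|Z| > z) for a standard normal variable.
    return math.erfc(z / math.sqrt(2))


# Hypothetical metric values observed in earlier bucket spans.
history = [100, 102, 98, 101, 99, 100, 103, 97]

for value in (101, 130):
    print(value, tail_probability(history, value))
```

A value close to the historical mean yields a probability near 1, while a value far outside the observed range yields a probability near 0, which is the intuition behind ranking anomalies by statistical significance.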
281 changes: 15 additions & 266 deletions
docs/en/stack/ml/anomaly-detection/ml-ad-finding-anomalies.asciidoc
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,278 +1,27 @@ | ||
[role="xpack"] | ||
[[ml-ad-finding-anomalies]] | ||
= Finding anomalies in time series data | ||
++++ | ||
<titleabbrev>Finding anomalies</titleabbrev> | ||
++++ | ||
|
||
The {ml-features} automate the analysis of time series data by creating | ||
accurate baselines of normal behavior in your data and using them to identify | ||
anomalous events or patterns. | ||
:keywords: {ml-init}, {stack}, {anomaly-detect} | ||
:description: An introduction to {ml} {anomaly-detect}, which analyzes time \ | ||
series data to identify and predict anomalous patterns in your data. | ||
|
||
The results of {ml} analysis are stored in {es} and you can use {kib} to help | ||
you visualize and explore the results. For example, the *{ml-app}* app provides | ||
charts that illustrate the actual data values, the bounds for the expected | ||
values, and the anomalies that occur outside these bounds: | ||
The {ml} {anomaly-detect} features automate the analysis of time series data by | ||
creating accurate baselines of normal behavior in your data. These baselines | ||
then enable you to identify anomalous events or patterns. Data is pulled from | ||
{es} for analysis and anomaly results are displayed in {kib} dashboards. For | ||
example, the *{ml-app}* app provides charts that illustrate the actual data | ||
values, the bounds for the expected values, and the anomalies that occur outside | ||
these bounds: | ||
|
||
[role="screenshot"] | ||
image::images/overview-smv.jpg["Example screenshot from the Machine Learning Single Metric Viewer in Kibana"] | ||
|
||
Using <<ml-ad-algorithms,proprietary {ml} algorithms>>, the following | ||
circumstances are detected: | ||
The typical workflow for performing {anomaly-detect} is as follows: | ||
|
||
* Anomalies related to temporal deviations in values, counts, or frequencies | ||
* Statistical rarity | ||
* Unusual behaviors for a member of a population | ||
|
||
Automated periodicity detection and quick adaptation to changing data ensure | ||
that you don’t need to specify algorithms, models, or other data science-related | ||
configurations in order to get the benefits of {ml}. | ||
|
||
[discrete] | ||
[[ml-ad-algorithms]] | ||
== {anomaly-detect-cap} algorithms | ||
|
||
The {anomaly-detect} {ml-features} use a bespoke amalgamation of different | ||
techniques such as clustering, various types of time series decomposition, | ||
Bayesian distribution modeling, and correlation analysis. These analytics | ||
provide sophisticated real-time automated {anomaly-detect} for time series data. | ||
|
||
The {ml} analytics statistically model the time-based characteristics of your | ||
data by observing historical behavior and adapting to new data. The model | ||
represents a baseline of normal behavior and can therefore be used to determine | ||
how anomalous new events are. | ||
|
||
{anomaly-detect-cap} results are written for each <<bucket-span,bucket span>>. | ||
These results include scores that are aggregated in order to reduce noise and | ||
normalized in order to rank the most mathematically significant anomalies. | ||
|
||
[discrete] | ||
[[ml-ad-define-problem]] | ||
== 1. Define the problem | ||
|
||
The {ml-features} in {stack} enable you to seek anomalies in your data in many | ||
different ways. For example, there are functions that calculate metrics, | ||
analyze geographic data, or seek rare events in your data set. You can also | ||
optionally analyze your data relative to a specific population or group the data | ||
based on specific attributes. For the full list of functions, see | ||
<<ml-functions>>. | ||
|
||
The most important considerations are the data sets that you have available and | ||
the type of anomalous behavior you want to detect. | ||
|
||
If you are uncertain where to begin, {kib} can recognize certain types of data | ||
and suggest useful {ml} jobs. Likewise, some integrations in {fleet} include | ||
{anomaly-job} configuration information, dashboards, searches, and | ||
visualizations that are customized to help you analyze your data. | ||
|
||
[discrete] | ||
[[ml-ad-setup]] | ||
== 2. Set up the environment | ||
|
||
Before you can use the {stack-ml-features}, there are some configuration | ||
requirements (such as security privileges) that must be addressed. Refer to | ||
<<setup>>. | ||
|
||
[NOTE] | ||
=============================== | ||
If your data is located outside of {es}, you cannot use {kib} to create | ||
your jobs and you cannot use {dfeeds} to retrieve your data in real time. | ||
Posting data directly to {anomaly-jobs} is deprecated; in a future major version, | ||
a {dfeed} will be required. | ||
=============================== | ||
|
||
[discrete] | ||
[[ml-ad-create-job]] | ||
== 3. Create a job | ||
|
||
{anomaly-jobs-cap} contain the configuration information and metadata | ||
necessary to perform the {ml} analysis. | ||
|
||
You can create {anomaly-jobs} by using the | ||
{ref}/ml-put-job.html[create {anomaly-jobs} API]. {kib} also provides | ||
wizards to simplify the process: | ||
|
||
[role="screenshot"] | ||
image::images/ml-create-job.jpg[Create New Job] | ||
|
||
* The single metric wizard creates simple jobs that have a single detector. A | ||
_detector_ applies an analytical function to specific fields in your data. In | ||
addition to limiting the number of detectors, the single metric wizard omits | ||
many of the more advanced configuration options. | ||
|
||
* The multi-metric wizard creates jobs that can have more than one detector, | ||
which is more efficient than running multiple jobs against the same data. | ||
|
||
* The population wizard creates jobs that detect activity that is unusual compared | ||
to the behavior of the population. For more information, see | ||
<<ml-configuring-populations>>. | ||
|
||
* The categorization wizard creates jobs that group log messages into categories | ||
and use `count` or `rare` functions to detect anomalies within them. See | ||
<<ml-configuring-categories>>. | ||
|
||
* The advanced wizard creates jobs that can have multiple detectors and enables | ||
you to configure all job settings. | ||
|
||
//TBD The rare wizard creates jobs that use `rare` or `freq_rare` functions to detect | ||
|
||
{kib} can also recognize certain types of data and provide specialized wizards | ||
for that context. For example, there are {anomaly-jobs} for the sample | ||
eCommerce orders and sample web logs data sets, as well as for data generated by | ||
the {elastic-sec} and {observability} solutions, {beats}, and {fleet} | ||
{integrations}. For a list of all the customized jobs, see <<ootb-ml-jobs>>. | ||
|
||
[[ml-ad-job-tips]] | ||
include::job-tips.asciidoc[leveloffset=+1] | ||
|
||
//// | ||
You can optionally assign jobs to one or more _job groups_. You can use | ||
job groups to view the results from multiple jobs more easily and to expedite | ||
administrative tasks by opening or closing multiple jobs at once. | ||
//// | ||
|
||
//TBD Mention impact of model plot config? | ||
|
||
[discrete] | ||
[[ml-ad-datafeeds]] | ||
=== {dfeeds-cap} | ||
|
||
include::ml-datafeeds.asciidoc[] | ||
|
||
[discrete] | ||
[[ml-ad-open-job]] | ||
== 4. Open the job | ||
|
||
An {anomaly-job} must be opened in order for it to be ready to receive and | ||
analyze data. It can be opened and closed multiple times throughout its | ||
lifecycle. | ||
|
||
After you open the job, you can start the {dfeed}, which retrieves data from | ||
your cluster. A {dfeed} can be started and stopped multiple times throughout its | ||
lifecycle. When you start it, you can optionally specify start and end times. If | ||
you do not specify an end time, the {dfeed} runs continuously. {dfeeds-cap} with | ||
an end time close their corresponding jobs when they are stopped. For this | ||
reason, when historical data is analyzed, there is no need to stop the {dfeed} | ||
or close the job; both happen automatically when the end time is reached. | ||
However, when a {dfeed} without an end time is stopped, it does | ||
not close the corresponding job automatically. | ||
|
||
You can perform both these tasks in {kib} or use the | ||
{ref}/ml-open-job.html[open {anomaly-jobs}] and | ||
{ref}/ml-start-datafeed.html[start {dfeeds}] APIs. | ||
|
||
[discrete] | ||
[[ml-ad-view-results]] | ||
== 5. View the job results | ||
|
||
After the {anomaly-job} has processed some data, you can view the results in | ||
{kib}. | ||
|
||
TIP: Depending on the capacity of your machine, you might need to wait a few | ||
seconds for the {ml} analysis to generate initial results. | ||
|
||
There are two tools for examining the results from {anomaly-jobs} in {kib}: the | ||
**Anomaly Explorer** and the **Single Metric Viewer**. | ||
|
||
//TBD Provide overview of the different purposes of the two interfaces. | ||
|
||
//Intro to swimlanes in Anomaly Explorer | ||
|
||
[discrete] | ||
[[ml-ad-bucket-results]] | ||
=== Bucket results | ||
|
||
include::ml-buckets.asciidoc[tag=bucket-results] | ||
|
||
[discrete] | ||
[[ml-ad-influencer-results]] | ||
=== Influencer results | ||
|
||
include::ml-influencers.asciidoc[tag=influencer-results] | ||
|
||
[discrete] | ||
[[ml-ad-tune]] | ||
== 6. Tune the job | ||
|
||
While your {anomaly-job} is open, you might find that you need to alter its | ||
configuration or settings. | ||
|
||
[discrete] | ||
[[ml-ad-calendars]] | ||
=== Calendars and scheduled events | ||
|
||
include::ml-calendars.asciidoc[] | ||
|
||
[discrete] | ||
[[ml-ad-rules]] | ||
=== Custom rules | ||
|
||
include::ml-rules.asciidoc[] | ||
|
||
[discrete] | ||
[[ml-ad-model-snapshots]] | ||
=== Model snapshots | ||
|
||
include::ml-model-snapshots.asciidoc[] | ||
|
||
[discrete] | ||
[[ml-ad-forecast]] | ||
== 7. Forecast future behavior | ||
|
||
After the {ml-features} create baselines of normal behavior for your data, | ||
you can use that information to extrapolate future behavior. | ||
|
||
You can use a forecast to estimate a time series value at a specific future date. | ||
For example, you might want to determine how many users you can expect to visit | ||
your website next Sunday at 0900. | ||
|
||
You can also use it to estimate the probability of a time series value occurring | ||
at a future date. For example, you might want to determine how likely it is that | ||
your disk utilization will reach 100% before the end of next week. | ||
|
||
Each forecast has a unique ID, which you can use to distinguish between forecasts | ||
that you created at different times. You can create a forecast by using the | ||
{ref}/ml-forecast.html[forecast {anomaly-jobs} API] or by using {kib}. For | ||
example: | ||
|
||
[role="screenshot"] | ||
image::images/overview-forecast.jpg["Example screenshot from the Machine Learning Single Metric Viewer in Kibana"] | ||
|
||
The yellow line in the chart represents the predicted data values. The | ||
shaded yellow area represents the bounds for the predicted values, which also | ||
gives an indication of the confidence of the predictions. | ||
|
||
When you create a forecast, you specify its _duration_, which indicates how far | ||
the forecast extends beyond the last record that was processed. By default, the | ||
duration is 1 day. Typically, the farther into the future that you forecast, the | ||
lower the confidence levels become (that is to say, the bounds increase). | ||
Eventually, if the confidence levels are too low, the forecast stops. | ||
For more information about limitations that affect your ability to create a | ||
forecast, see <<ml-forecast-config-limitations>>. | ||
|
||
You can also optionally specify when the forecast expires. By default, it | ||
expires in 14 days and is deleted automatically thereafter. You can specify a | ||
different expiration period by using the `expires_in` parameter in the | ||
{ref}/ml-forecast.html[forecast {anomaly-jobs} API]. | ||
|
||
[discrete] | ||
[[ml-ad-close-job]] | ||
== 8. Close the job | ||
|
||
include::stopping-ml.asciidoc[leveloffset=+1] | ||
|
||
[discrete] | ||
== Next steps | ||
|
||
For a more detailed walk-through of {ml-features}, see <<ml-getting-started>>. | ||
|
||
For more advanced settings and scenarios, see <<anomaly-examples>>. | ||
|
||
Refer to <<anomaly-detection-scale>> to learn more about the particularities of | ||
large {anomaly-jobs}. | ||
|
||
[discrete] | ||
[[further-reading]] | ||
== Further reading | ||
|
||
https://www.elastic.co/blog/interpretability-in-ml-identifying-anomalies-influencers-root-causes[Interpretability in ML: Identifying anomalies, influencers, and root causes] | ||
* <<ml-ad-plan>> | ||
* <<ml-ad-run-jobs>> | ||
* <<ml-ad-view-results>> | ||
* <<ml-ad-forecast>> |
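The "Open the job" step in the text removed from ml-ad-finding-anomalies.asciidoc describes opening a job and then starting its {dfeed}, optionally with start and end times. A minimal sketch of the two requests involved is shown here. It only builds the method, path, and body for the open {anomaly-jobs} and start {dfeeds} APIs and sends nothing; the helper name, the job ID `web-traffic`, the {dfeed} ID `datafeed-web-traffic`, and the timestamps are hypothetical.

```python
def build_open_and_start_requests(job_id, datafeed_id, start=None, end=None):
    """Return the two requests, in order, as (method, path, body) tuples.

    No network call is made. Omitting `end` corresponds to a datafeed that
    runs continuously; supplying `end` corresponds to a bounded, historical run.
    """
    open_request = ("POST", f"/_ml/anomaly_detectors/{job_id}/_open", {})

    body = {}
    if start is not None:
        body["start"] = start
    if end is not None:
        body["end"] = end
    start_request = ("POST", f"/_ml/datafeeds/{datafeed_id}/_start", body)

    # The job is opened first so that it is ready to receive data.
    return [open_request, start_request]


# Hypothetical identifiers and time range for a historical analysis.
for request in build_open_and_start_requests(
    "web-traffic",
    "datafeed-web-traffic",
    start="2023-01-01T00:00:00Z",
    end="2023-02-01T00:00:00Z",
):
    print(request)
```

Per the text above, a {dfeed} started with an end time stops and closes its job automatically when that time is reached, whereas a {dfeed} started without one must be stopped, and its job closed, explicitly.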
39 changes: 39 additions & 0 deletions
docs/en/stack/ml/anomaly-detection/ml-ad-forecast.asciidoc
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
[[ml-ad-forecast]] | ||
= Forecast future behavior | ||
:keywords: {ml-init}, {stack}, {anomaly-detect} | ||
|
||
After your {anomaly-job} creates baselines of normal behavior for your data, | ||
you can use that information to extrapolate future behavior. | ||
|
||
You can use a forecast to estimate a time series value at a specific future date. | ||
For example, you might want to determine how many users you can expect to visit | ||
your website next Sunday at 0900. | ||
|
||
You can also use it to estimate the probability of a time series value occurring | ||
at a future date. For example, you might want to determine how likely it is that | ||
your disk utilization will reach 100% before the end of next week. | ||
|
||
Each forecast has a unique ID, which you can use to distinguish between forecasts | ||
that you created at different times. You can create a forecast by using the | ||
{ref}/ml-forecast.html[forecast {anomaly-jobs} API] or by using {kib}. For | ||
example: | ||
|
||
[role="screenshot"] | ||
image::images/overview-forecast.jpg["Example screenshot from the Machine Learning Single Metric Viewer in Kibana"] | ||
|
||
The yellow line in the chart represents the predicted data values. The | ||
shaded yellow area represents the bounds for the predicted values, which also | ||
gives an indication of the confidence of the predictions. | ||
|
||
When you create a forecast, you specify its _duration_, which indicates how far | ||
the forecast extends beyond the last record that was processed. By default, the | ||
duration is 1 day. Typically, the farther into the future that you forecast, the | ||
lower the confidence levels become (that is to say, the bounds increase). | ||
Eventually, if the confidence levels are too low, the forecast stops. | ||
For more information about limitations that affect your ability to create a | ||
forecast, see <<ml-forecast-config-limitations>>. | ||
|
||
You can also optionally specify when the forecast expires. By default, it | ||
expires in 14 days and is deleted automatically thereafter. You can specify a | ||
different expiration period by using the `expires_in` parameter in the | ||
{ref}/ml-forecast.html[forecast {anomaly-jobs} API]. |
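The `duration` and `expires_in` settings described above can be sketched as a request to the forecast {anomaly-jobs} API. The snippet below only assembles the method, path, and JSON body and sends nothing; the helper name `build_forecast_request` and the job ID `website-visitors` are hypothetical, and the default values mirror the defaults stated in the text (1 day and 14 days).

```python
import json


def build_forecast_request(job_id, duration="1d", expires_in="14d"):
    """Return (method, path, body) for a forecast request; no network call is made."""
    if not job_id:
        raise ValueError("job_id must be a non-empty string")
    path = f"/_ml/anomaly_detectors/{job_id}/_forecast"
    body = {"duration": duration, "expires_in": expires_in}
    return "POST", path, body


# Hypothetical job: forecast one week ahead and keep the results for 30 days.
method, path, body = build_forecast_request(
    "website-visitors", duration="7d", expires_in="30d"
)
print(method, path)
print(json.dumps(body))
```

Longer durations generally come with wider bounds, so it is worth choosing the shortest duration that answers the question at hand.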
12 changes: 2 additions & 10 deletions
docs/en/stack/ml/anomaly-detection/ml-ad-overview.asciidoc