[DOCS] Split anomaly detection into multiple pages #1943

Merged 2 commits on Jan 20, 2022
10 changes: 10 additions & 0 deletions docs/en/stack/ml/anomaly-detection/index.asciidoc
@@ -2,8 +2,18 @@ include::ml-ad-overview.asciidoc[]

include::ml-ad-finding-anomalies.asciidoc[leveloffset=+1]

include::ml-ad-plan.asciidoc[leveloffset=+2]

include::ml-ad-run-jobs.asciidoc[leveloffset=+2]

include::ml-ad-view-results.asciidoc[leveloffset=+2]

include::ml-ad-forecast.asciidoc[leveloffset=+2]

include::ml-ad-concepts.asciidoc[leveloffset=+1]

include::ml-ad-algorithms.asciidoc[leveloffset=+2]

include::anomaly-detection-scale.asciidoc[leveloffset=+2]

include::ml-api-quickref.asciidoc[leveloffset=+1]
18 changes: 18 additions & 0 deletions docs/en/stack/ml/anomaly-detection/ml-ad-algorithms.asciidoc
@@ -0,0 +1,18 @@
[[ml-ad-algorithms]]
= {anomaly-detect-cap} algorithms
:keywords: {ml-init}, {stack}, {anomaly-detect}

The {anomaly-detect} {ml-features} use a bespoke amalgamation of different
techniques such as clustering, various types of time series decomposition,
Bayesian distribution modeling, and correlation analysis. These analytics
provide sophisticated real-time automated {anomaly-detect} for time series data.

The {ml} analytics statistically model the time-based characteristics of your
data by observing historical behavior and adapting to new data. The model
represents a baseline of normal behavior and can therefore be used to determine
how anomalous new events are.

{anomaly-detect-cap} results are written for each
<<ml-ad-bucket-span,bucket span>>. These results include scores that are
aggregated in order to reduce noise and normalized in order to rank the most
mathematically significant anomalies.
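
For example, a minimal sketch of retrieving those per-bucket scores with the get buckets API might look like the following; the job ID `my-job` and the score threshold are illustrative assumptions, not part of the original page:

[source,console]
----
// Fetch buckets whose aggregated anomaly score is at least 75,
// sorted so the most anomalous buckets come first (values are illustrative).
GET _ml/anomaly_detectors/my-job/results/buckets
{
  "anomaly_score": 75,
  "sort": "anomaly_score",
  "desc": true
}
----
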
@@ -5,4 +5,5 @@
This section explains the more complex concepts of the Elastic {ml}
{anomaly-detect} feature.

* <<ml-ad-algorithms>>
* <<anomaly-detection-scale>>
281 changes: 15 additions & 266 deletions docs/en/stack/ml/anomaly-detection/ml-ad-finding-anomalies.asciidoc
@@ -1,278 +1,27 @@
[role="xpack"]
[[ml-ad-finding-anomalies]]
= Finding anomalies in time series data
++++
<titleabbrev>Finding anomalies</titleabbrev>
++++

The {ml-features} automate the analysis of time series data by creating
accurate baselines of normal behavior in your data and using them to identify
anomalous events or patterns.
:keywords: {ml-init}, {stack}, {anomaly-detect}
:description: An introduction to {ml} {anomaly-detect}, which analyzes time \
series data to identify and predict anomalous patterns in your data.

The results of {ml} analysis are stored in {es} and you can use {kib} to help
you visualize and explore the results. For example, the *{ml-app}* app provides
charts that illustrate the actual data values, the bounds for the expected
values, and the anomalies that occur outside these bounds:
The {ml} {anomaly-detect} features automate the analysis of time series data by
creating accurate baselines of normal behavior in your data. These baselines
then enable you to identify anomalous events or patterns. Data is pulled from
{es} for analysis and anomaly results are displayed in {kib} dashboards. For
example, the *{ml-app}* app provides charts that illustrate the actual data
values, the bounds for the expected values, and the anomalies that occur outside
these bounds:

[role="screenshot"]
image::images/overview-smv.jpg["Example screenshot from the Machine Learning Single Metric Viewer in Kibana"]

Using <<ml-ad-algorithms,proprietary {ml} algorithms>>, the following
circumstances are detected:
The typical workflow for performing {anomaly-detect} is as follows:

* Anomalies related to temporal deviations in values, counts, or frequencies
* Statistical rarity
* Unusual behaviors for a member of a population

Automated periodicity detection and quick adaptation to changing data ensure
that you don’t need to specify algorithms, models, or other data science-related
configurations in order to get the benefits of {ml}.

[discrete]
[[ml-ad-algorithms]]
== {anomaly-detect-cap} algorithms

The {anomaly-detect} {ml-features} use a bespoke amalgamation of different
techniques such as clustering, various types of time series decomposition,
Bayesian distribution modeling, and correlation analysis. These analytics
provide sophisticated real-time automated {anomaly-detect} for time series data.

The {ml} analytics statistically model the time-based characteristics of your
data by observing historical behavior and adapting to new data. The model
represents a baseline of normal behavior and can therefore be used to determine
how anomalous new events are.

{anomaly-detect-cap} results are written for each <<bucket-span,bucket span>>.
These results include scores that are aggregated in order to reduce noise and
normalized in order to rank the most mathematically significant anomalies.

[discrete]
[[ml-ad-define-problem]]
== 1. Define the problem

The {ml-features} in {stack} enable you to seek anomalies in your data in many
different ways. For example, there are functions that calculate metrics,
analyze geographic data, or seek rare events in your data set. You can also
optionally analyze your data relative to a specific population or group the data
based on specific attributes. For the full list of functions, see
<<ml-functions>>.

The most important considerations are the data sets that you have available and
the type of anomalous behavior you want to detect.

If you are uncertain where to begin, {kib} can recognize certain types of data
and suggest useful {ml} jobs. Likewise, some integrations in {fleet} include
{anomaly-job} configuration information, dashboards, searches, and
visualizations that are customized to help you analyze your data.

[discrete]
[[ml-ad-setup]]
== 2. Set up the environment

Before you can use the {stack-ml-features}, there are some configuration
requirements (such as security privileges) that must be addressed. Refer to
<<setup>>.

[NOTE]
===============================
If your data is located outside of {es}, you cannot use {kib} to create
your jobs and you cannot use {dfeeds} to retrieve your data in real time.
Posting data directly to {anomaly-jobs} is deprecated; in a future major
version, a {dfeed} will be required.
===============================

[discrete]
[[ml-ad-create-job]]
== 3. Create a job

{anomaly-jobs-cap} contain the configuration information and metadata
necessary to perform the {ml} analysis.

You can create {anomaly-jobs} by using the
{ref}/ml-put-job.html[create {anomaly-jobs} API]. {kib} also provides
wizards to simplify the process:

[role="screenshot"]
image::images/ml-create-job.jpg[Create New Job]

* The single metric wizard creates simple jobs that have a single detector. A
_detector_ applies an analytical function to specific fields in your data. In
addition to limiting the number of detectors, the single metric wizard omits
many of the more advanced configuration options.

* The multi-metric wizard creates jobs that can have more than one detector,
which is more efficient than running multiple jobs against the same data.

* The population wizard creates jobs that detect activity that is unusual compared
to the behavior of the population. For more information, see
<<ml-configuring-populations>>.

* The categorization wizard creates jobs that group log messages into categories
and use `count` or `rare` functions to detect anomalies within them. See
<<ml-configuring-categories>>.

* The advanced wizard creates jobs that can have multiple detectors and enables
you to configure all job settings.

//TBD The rare wizard creates jobs that use `rare` or `freq_rare` functions to detect

{kib} can also recognize certain types of data and provide specialized wizards
for that context. For example, there are {anomaly-jobs} for the sample
eCommerce orders and sample web logs data sets, as well as for data generated by
the {elastic-sec} and {observability} solutions, {beats}, and {fleet}
{integrations}. For a list of all the customized jobs, see <<ootb-ml-jobs>>.
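
As noted above, you can also create jobs directly with the create
{anomaly-jobs} API. The following is a hedged sketch only; the job ID
`my-job`, the field names, and the bucket span are hypothetical and would
need to match your own data:

[source,console]
----
// Create a single-detector job that models the mean response time per host
// (job ID, field names, and bucket span are illustrative assumptions).
PUT _ml/anomaly_detectors/my-job
{
  "description": "Mean response time by host",
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "mean",
        "field_name": "responsetime",
        "by_field_name": "host"
      }
    ],
    "influencers": [ "host" ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
----

A job created this way still needs a {dfeed} (or the deprecated post data
API) to supply documents for analysis.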

[[ml-ad-job-tips]]
include::job-tips.asciidoc[leveloffset=+1]

////
You can optionally assign jobs to one or more _job groups_. You can use
job groups to view the results from multiple jobs more easily and to expedite
administrative tasks by opening or closing multiple jobs at once.
////

//TBD Mention impact of model plot config?

[discrete]
[[ml-ad-datafeeds]]
=== {dfeeds-cap}

include::ml-datafeeds.asciidoc[]

[discrete]
[[ml-ad-open-job]]
== 4. Open the job

An {anomaly-job} must be opened in order for it to be ready to receive and
analyze data. It can be opened and closed multiple times throughout its
lifecycle.

After you open the job, you can start the {dfeed}, which retrieves data from
your cluster. A {dfeed} can be started and stopped multiple times throughout its
lifecycle. When you start it, you can optionally specify start and end times. If
you do not specify an end time, the {dfeed} runs continuously. {dfeeds-cap} with
an end time close their corresponding jobs when they are stopped. For this
reason, when you analyze historical data, there is no need to stop the {dfeed}
or close the job; they are stopped and closed automatically when the end time is
reached. However, when a {dfeed} without an end time is stopped, it does not
close the corresponding job automatically.

You can perform both these tasks in {kib} or use the
{ref}/ml-open-job.html[open {anomaly-jobs}] and
{ref}/ml-start-datafeed.html[start {dfeeds}] APIs.
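
For example, a hedged sketch of these two API calls, reusing the hypothetical
job and {dfeed} IDs from the creation example above:

[source,console]
----
// Open the job so it is ready to receive and analyze data.
POST _ml/anomaly_detectors/my-job/_open

// Start the corresponding datafeed. Omitting "end" makes it run continuously;
// the IDs and the start time are illustrative assumptions.
POST _ml/datafeeds/datafeed-my-job/_start
{
  "start": "2022-01-01T00:00:00Z"
}
----
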

[discrete]
[[ml-ad-view-results]]
== 5. View the job results

After the {anomaly-job} has processed some data, you can view the results in
{kib}.

TIP: Depending on the capacity of your machine, you might need to wait a few
seconds for the {ml} analysis to generate initial results.

There are two tools for examining the results from {anomaly-jobs} in {kib}: the
**Anomaly Explorer** and the **Single Metric Viewer**.
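
If you prefer to work with the results programmatically, the get records API
returns the individual anomaly records that these views are built on. A
minimal, illustrative sketch (the job ID and score threshold are assumptions):

[source,console]
----
// Return records with a normalized score of at least 80,
// most anomalous first (values are illustrative).
GET _ml/anomaly_detectors/my-job/results/records
{
  "record_score": 80,
  "sort": "record_score",
  "desc": true
}
----
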

//TBD Provide overview of the different purposes of the two interfaces.

//Intro to swimlanes in Anomaly Explorer

[discrete]
[[ml-ad-bucket-results]]
=== Bucket results

include::ml-buckets.asciidoc[tag=bucket-results]

[discrete]
[[ml-ad-influencer-results]]
=== Influencer results

include::ml-influencers.asciidoc[tag=influencer-results]

[discrete]
[[ml-ad-tune]]
== 6. Tune the job

While your {anomaly-job} is open, you might find that you need to alter its
configuration or settings.

[discrete]
[[ml-ad-calendars]]
=== Calendars and scheduled events

include::ml-calendars.asciidoc[]

[discrete]
[[ml-ad-rules]]
=== Custom rules

include::ml-rules.asciidoc[]

[discrete]
[[ml-ad-model-snapshots]]
=== Model snapshots

include::ml-model-snapshots.asciidoc[]

[discrete]
[[ml-ad-forecast]]
== 7. Forecast future behavior

After the {ml-features} create baselines of normal behavior for your data,
you can use that information to extrapolate future behavior.

You can use a forecast to estimate a time series value at a specific future date.
For example, you might want to determine how many users you can expect to visit
your website next Sunday at 0900.

You can also use it to estimate the probability of a time series value occurring
at a future date. For example, you might want to determine how likely it is that
your disk utilization will reach 100% before the end of next week.

Each forecast has a unique ID, which you can use to distinguish between forecasts
that you created at different times. You can create a forecast by using the
{ref}/ml-forecast.html[forecast {anomaly-jobs} API] or by using {kib}. For
example:

[role="screenshot"]
image::images/overview-forecast.jpg["Example screenshot from the Machine Learning Single Metric Viewer in Kibana"]

The yellow line in the chart represents the predicted data values. The
shaded yellow area represents the bounds for the predicted values, which also
gives an indication of the confidence of the predictions.

When you create a forecast, you specify its _duration_, which indicates how far
the forecast extends beyond the last record that was processed. By default, the
duration is 1 day. Typically, the farther into the future that you forecast, the
lower the confidence levels become (that is to say, the bounds increase).
Eventually, if the confidence levels are too low, the forecast stops.
For more information about limitations that affect your ability to create a
forecast, see <<ml-forecast-config-limitations>>.

You can also optionally specify when the forecast expires. By default, it
expires in 14 days and is deleted automatically thereafter. You can specify a
different expiration period by using the `expires_in` parameter in the
{ref}/ml-forecast.html[forecast {anomaly-jobs} API].

[discrete]
[[ml-ad-close-job]]
== 8. Close the job

include::stopping-ml.asciidoc[leveloffset=+1]

[discrete]
== Next steps

For a more detailed walk-through of {ml-features}, see <<ml-getting-started>>.

For more advanced settings and scenarios, see <<anomaly-examples>>.

Refer to <<anomaly-detection-scale>> to learn more about the particularities of
large {anomaly-jobs}.

[discrete]
[[further-reading]]
== Further reading

https://www.elastic.co/blog/interpretability-in-ml-identifying-anomalies-influencers-root-causes[Interpretability in ML: Identifying anomalies, influencers, and root causes]
* <<ml-ad-plan>>
* <<ml-ad-run-jobs>>
* <<ml-ad-view-results>>
* <<ml-ad-forecast>>
39 changes: 39 additions & 0 deletions docs/en/stack/ml/anomaly-detection/ml-ad-forecast.asciidoc
@@ -0,0 +1,39 @@
[[ml-ad-forecast]]
= Forecast future behavior
:keywords: {ml-init}, {stack}, {anomaly-detect}

After your {anomaly-job} creates baselines of normal behavior for your data,
you can use that information to extrapolate future behavior.

You can use a forecast to estimate a time series value at a specific future date.
For example, you might want to determine how many users you can expect to visit
your website next Sunday at 0900.

You can also use it to estimate the probability of a time series value occurring
at a future date. For example, you might want to determine how likely it is that
your disk utilization will reach 100% before the end of next week.

Each forecast has a unique ID, which you can use to distinguish between forecasts
that you created at different times. You can create a forecast by using the
{ref}/ml-forecast.html[forecast {anomaly-jobs} API] or by using {kib}. For
example:

[role="screenshot"]
image::images/overview-forecast.jpg["Example screenshot from the Machine Learning Single Metric Viewer in Kibana"]

The yellow line in the chart represents the predicted data values. The
shaded yellow area represents the bounds for the predicted values, which also
gives an indication of the confidence of the predictions.

When you create a forecast, you specify its _duration_, which indicates how far
the forecast extends beyond the last record that was processed. By default, the
duration is 1 day. Typically, the farther into the future that you forecast, the
lower the confidence levels become (that is to say, the bounds increase).
Eventually, if the confidence levels are too low, the forecast stops.
For more information about limitations that affect your ability to create a
forecast, see <<ml-forecast-config-limitations>>.

You can also optionally specify when the forecast expires. By default, it
expires in 14 days and is deleted automatically thereafter. You can specify a
different expiration period by using the `expires_in` parameter in the
{ref}/ml-forecast.html[forecast {anomaly-jobs} API].
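
For example, a hedged sketch of creating a forecast with the API, using the
`duration` and `expires_in` parameters described above (the job ID and values
are illustrative):

[source,console]
----
// Forecast 3 days beyond the last processed record and keep the
// forecast results for 7 days (job ID and values are illustrative).
POST _ml/anomaly_detectors/my-job/_forecast
{
  "duration": "3d",
  "expires_in": "7d"
}
----
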
12 changes: 2 additions & 10 deletions docs/en/stack/ml/anomaly-detection/ml-ad-overview.asciidoc
@@ -1,16 +1,8 @@
[role="xpack"]
[[ml-ad-overview]]
= {anomaly-detect-cap}
:keywords: {ml-init}, {stack}, {anomaly-detect}

:keywords: {ml-init}, {stack}, {anomaly-detect}, overview
:description: An introduction to {ml} {anomaly-detect}, which analyzes time \
series data to identify and predict anomalous patterns in your data.

Use {anomaly-detect} to analyze time series data by creating accurate baselines
of normal behavior and identifying anomalous patterns in your data set. Data is
pulled from {es} for analysis and anomaly results are displayed in {kib}
dashboards. Consult <<setup>> to learn more about the license and the security
privileges that are required to use {anomaly-detect}.
You can use {stack} {ml-features} to analyze time series data and identify anomalous patterns in your data set.

* <<ml-ad-finding-anomalies>>
* <<ml-ad-concepts>>