[DOCS] Summarizes anomaly detection steps #1747

Merged
merged 6 commits into from
Jul 9, 2021
22 changes: 0 additions & 22 deletions docs/en/stack/ml/anomaly-detection/create-jobs.asciidoc
@@ -1,7 +1,3 @@
[role="xpack"]
[[create-jobs]]
= Create {anomaly-jobs}

{anomaly-jobs-cap} contain the configuration information and metadata
necessary to perform an analytics task.

@@ -72,21 +68,3 @@ image::images/ml-data-recognizer-metricbeat.jpg[A screenshot of the {metricbeat}
These wizards create {anomaly-jobs}, dashboards, searches, and visualizations
that are customized to help you analyze your {auditbeat}, {filebeat}, and
{metricbeat} data.

[NOTE]
===============================
If your data is located outside of {es}, you cannot use {kib} to create
your jobs and you cannot use {dfeeds} to retrieve your data in real time.
===============================

////
Ready to get some hands-on experience? See
{ml-docs}/ml-getting-started.html[Getting Started with Machine Learning].

The following video tutorials also demonstrate single metric, multi-metric, and
advanced jobs:

* https://www.elastic.co/videos/machine-learning-tutorial-creating-a-single-metric-job[Machine Learning for the Elastic Stack: Creating a single metric job]
* https://www.elastic.co/videos/machine-learning-tutorial-creating-a-multi-metric-job[Machine Learning for the Elastic Stack: Creating a multi-metric job]
* https://www.elastic.co/videos/machine-learning-lab-3-detect-outliers-in-a-population[Machine Learning for the Elastic Stack: Detect Outliers in a Population]
////
14 changes: 3 additions & 11 deletions docs/en/stack/ml/anomaly-detection/index.asciidoc
@@ -16,15 +16,7 @@ include::ml-rules.asciidoc[leveloffset=+2]

include::ml-model-snapshots.asciidoc[leveloffset=+2]

include::ml-configuration.asciidoc[leveloffset=+1]

include::create-jobs.asciidoc[leveloffset=+2]

include::job-tips.asciidoc[leveloffset=+3]

include::stopping-ml.asciidoc[leveloffset=+2]

include::ml-restart-failed-jobs.asciidoc[leveloffset=+2]
include::ml-ad-finding-anomalies.asciidoc[leveloffset=+1]

include::ml-ad-concepts.asciidoc[leveloffset=+1]

@@ -62,6 +54,8 @@ include::{es-repo-dir}/ml/anomaly-detection/functions/ml-functions.asciidoc[leve

include::ootb-ml-jobs.asciidoc[leveloffset=+2]

include::ml-ad-troubleshooting.asciidoc[leveloffset=+2]

include::ootb-ml-jobs-apache.asciidoc[]

include::ootb-ml-jobs-apm.asciidoc[]
@@ -93,5 +87,3 @@ include::{es-repo-dir}/ml/anomaly-detection/functions/ml-rare-functions.asciidoc
include::{es-repo-dir}/ml/anomaly-detection/functions/ml-sum-functions.asciidoc[]

include::{es-repo-dir}/ml/anomaly-detection/functions/ml-time-functions.asciidoc[]

//include::ml-troubleshooting.asciidoc[leveloffset=+2]
7 changes: 0 additions & 7 deletions docs/en/stack/ml/anomaly-detection/job-tips.asciidoc
@@ -1,10 +1,3 @@
[role="xpack"]
[[job-tips]]
= Machine learning job tips
++++
<titleabbrev>Job tips</titleabbrev>
++++

When you create an {anomaly-job} in {kib}, the job creation wizards can
provide advice based on the characteristics of your data. By heeding these
suggestions, you can create jobs that are more likely to produce insightful {ml}
181 changes: 181 additions & 0 deletions docs/en/stack/ml/anomaly-detection/ml-ad-finding-anomalies.asciidoc
@@ -0,0 +1,181 @@
[chapter, role="xpack"]
[[ml-ad-finding-anomalies]]
= Finding anomalies in time series data
++++
<titleabbrev>Finding anomalies</titleabbrev>
++++

The {ml-features} automate the analysis of time series data by creating
accurate baselines of normal behavior in the data and identifying anomalous
patterns in that data.

Using <<ml-ad-algorithms,proprietary {ml} algorithms>>, the following
circumstances are detected, scored, and linked with statistically significant
influencers in the data:

* Anomalies related to temporal deviations in values, counts, or frequencies
* Statistical rarity
* Unusual behaviors for a member of a population

Automated periodicity detection and quick adaptation to changing data ensure
that you don’t need to specify algorithms, models, or other data science-related
configurations in order to get the benefits of {ml}.

To use the {ml-features} to analyze your data, you can create an {anomaly-job}
and send your data to that job. The results of {ml} analysis are stored in {es}
and you can use {kib} to help you visualize and explore the results. For example,
charts illustrate the actual data values, the bounds for the expected values,
and the anomalies that occur outside these bounds:

[role="screenshot"]
image::images/overview-smv.jpg["Example screenshot from the Machine Learning Single Metric Viewer in Kibana"]

[discrete]
[[ml-ad-algorithms]]
== {anomaly-detect-cap} algorithms

The {anomaly-detect} {ml-features} use a bespoke amalgamation of different
techniques such as clustering, various types of time series decomposition,
Bayesian distribution modeling, and correlation analysis. These analytics
provide sophisticated real-time automated {anomaly-detect} for time series data.

The {ml} analytics statistically model the time-based characteristics of your
data by observing historical behavior and adapting to new data. The model
represents a baseline of normal behavior and can therefore be used to determine
how anomalous new events are.

{anomaly-detect-cap} results are written for each <<ml-buckets,bucket span>>.
These results include scores that are aggregated in order to reduce noise and
normalized in order to rank the most mathematically significant anomalies. For
more information, see <<ml-bucket-results>> and <<ml-influencer-results>>.

[discrete]
[[ml-ad-define-problem]]
== 1. Define the problem

The {ml-features} in the {stack} enable you to seek anomalies in your data in many
different ways. For example, there are functions that calculate metrics,
analyze geographic data, or seek rare events in your data set. You can also
analyze your data relative to a specific population or group the data
based on specific attributes. For the full list of functions, see
<<ml-functions>>.

The most important considerations are the data sets that you have available and
the type of anomalous behavior you want to detect.
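
As a rough illustration of how these choices map to a job configuration, the
following Python sketch assembles a request body for the create {anomaly-jobs}
API. The helper name `build_job_config`, the default bucket span and time field,
and the field names `responsetime` and `airline` are hypothetical. The
`function`, `field_name`, `over_field_name`, and `partition_field_name` detector
properties are the ones that the API defines.

```python
import json


def build_job_config(function, field_name=None, bucket_span="15m",
                     time_field="@timestamp", over_field_name=None,
                     partition_field_name=None):
    """Build a request body for the create anomaly detection jobs API.

    The helper and its default values are illustrative assumptions.
    """
    detector = {"function": function}
    if field_name is not None:
        # The field that the function analyzes (not needed for count functions).
        detector["field_name"] = field_name
    if over_field_name is not None:
        # Population analysis: compare each entity against its peers.
        detector["over_field_name"] = over_field_name
    if partition_field_name is not None:
        # Split the analysis into independent models per field value.
        detector["partition_field_name"] = partition_field_name
    return {
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [detector],
        },
        "data_description": {"time_field": time_field},
    }


# Hypothetical example: mean response time, modeled separately per airline.
config = build_job_config("mean", "responsetime",
                          partition_field_name="airline")
print(json.dumps(config, indent=2))
```

The resulting JSON is the kind of body that you would send in a
`PUT _ml/anomaly_detectors/<job_id>` request.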

[discrete]
[[ml-ad-setup]]
== 2. Set up the environment

If you want to use {ml-features}, there must be at least one {ml} node in
your cluster and all master-eligible nodes must have {ml} enabled. By default,
all nodes are {ml} nodes. For more information about these settings, see
{ref}/modules-node.html#ml-node[{ml} nodes].

If {stack-security-features} are enabled, you must also ensure your users have
the necessary privileges. See <<setup>>.

[NOTE]
===============================
If your data is located outside of {es}, you cannot use {kib} to create
your jobs and you cannot use {dfeeds} to retrieve your data in real time.
Posting data directly to {anomaly-jobs} is deprecated; in a future major
version, a {dfeed} will be required.
===============================

[discrete]
[[ml-ad-create-job]]
== 3. Create a job

//TBD: Abbreviate this information and mention Fleet integration packages

include::create-jobs.asciidoc[]

For a list of all the customized jobs, see <<ootb-ml-jobs>>.

include::job-tips.asciidoc[leveloffset=+1]

[discrete]
[[ml-ad-open-job]]
== 4. Open the job

An {anomaly-job} must be opened in order for it to be ready to receive and
analyze data. It can be opened and closed multiple times throughout its
lifecycle.

After you open the job, you can start the {dfeed}, which retrieves data from
your cluster. A {dfeed} can be started and stopped multiple times throughout its
lifecycle.

You can perform both these tasks in {kib} or use the
{ref}/ml-open-job.html[open {anomaly-jobs}] and
{ref}/ml-start-datafeed.html[start {dfeeds}] APIs.
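
If you use the APIs, the two calls might be sketched as follows. This minimal
Python example only builds the HTTP requests and does not send them. The base
URL, the helper names, and the `my-job` and `datafeed-my-job` identifiers are
assumptions for illustration.

```python
from urllib.request import Request

# Assumption: a local, unsecured cluster. Adjust the URL for your deployment.
ES_URL = "http://localhost:9200"


def open_job_request(job_id, base_url=ES_URL):
    """Build the request for the open anomaly detection jobs API."""
    return Request(f"{base_url}/_ml/anomaly_detectors/{job_id}/_open",
                   method="POST")


def start_datafeed_request(datafeed_id, base_url=ES_URL):
    """Build the request for the start datafeeds API."""
    return Request(f"{base_url}/_ml/datafeeds/{datafeed_id}/_start",
                   method="POST")


# Open the job first, then start its datafeed (hypothetical identifiers).
for req in (open_job_request("my-job"),
            start_datafeed_request("datafeed-my-job")):
    print(req.get_method(), req.full_url)
```

To send a request, you could pass it to `urllib.request.urlopen`. If
{stack-security-features} are enabled, you must also supply credentials.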

[discrete]
[[ml-ad-view-results]]
== 5. View the job results

After the {anomaly-job} has processed some data, you can view the results in
{kib}.

TIP: Depending on the capacity of your machine, you might need to wait a few
seconds for the {ml} analysis to generate initial results.

There are two tools for examining the results from {anomaly-jobs} in {kib}: the
**Anomaly Explorer** and the **Single Metric Viewer**.

[discrete]
[[ml-ad-forecast]]
== 6. Forecast future behavior

After the {ml-features} create baselines of normal behavior for your data,
you can use that information to extrapolate future behavior.

You can use a forecast to estimate a time series value at a specific future date.
For example, you might want to determine how many users you can expect to visit
your website next Sunday at 09:00.

You can also use it to estimate the probability of a time series value occurring
at a future date. For example, you might want to determine how likely it is that
your disk utilization will reach 100% before the end of next week.

Each forecast has a unique ID, which you can use to distinguish between forecasts
that you created at different times. You can create a forecast by using the
{ref}/ml-forecast.html[forecast {anomaly-jobs} API] or by using {kib}. For
example:

[role="screenshot"]
image::images/overview-forecast.jpg["Example screenshot from the Machine Learning Single Metric Viewer in Kibana"]

The yellow line in the chart represents the predicted data values. The
shaded yellow area represents the bounds for the predicted values, which also
gives an indication of the confidence of the predictions.

When you create a forecast, you specify its _duration_, which indicates how far
the forecast extends beyond the last record that was processed. By default, the
duration is 1 day. Typically, the farther into the future that you forecast, the
lower the confidence levels become (that is to say, the bounds increase).
Eventually, if the confidence levels are too low, the forecast stops.
For more information about limitations that affect your ability to create a
forecast, see <<ml-forecast-config-limitations>>.

You can also optionally specify when the forecast expires. By default, it
expires in 14 days and is deleted automatically thereafter. You can specify a
different expiration period by using the `expires_in` parameter in the
{ref}/ml-forecast.html[forecast {anomaly-jobs} API].
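
A forecast request that sets both the duration and the expiration period might
look like the following Python sketch. It builds the request without sending it.
The base URL, the helper name `forecast_request`, and the `my-job` identifier
are assumptions; `duration` and `expires_in` are the parameters described above,
shown here with their default values.

```python
import json
from urllib.request import Request


def forecast_request(job_id, duration="1d", expires_in="14d",
                     base_url="http://localhost:9200"):
    """Build the request for the forecast anomaly detection jobs API.

    The base URL is an assumption. The duration and expires_in defaults
    mirror the documented defaults of 1 day and 14 days.
    """
    body = json.dumps({"duration": duration, "expires_in": expires_in})
    return Request(
        f"{base_url}/_ml/anomaly_detectors/{job_id}/_forecast",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical example: forecast three days ahead, keep results for a week.
req = forecast_request("my-job", duration="3d", expires_in="7d")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

The job must be open when you create the forecast.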

[discrete]
[[ml-ad-close-job]]
== 7. Close the job

include::stopping-ml.asciidoc[leveloffset=+1]

[discrete]
== Next steps

For a more detailed walk-through of {ml-features}, see <<ml-getting-started>>.

For more advanced settings and scenarios, see <<anomaly-examples>>.

Refer to <<anomaly-detection-scale>> to learn more about the particularities of
large {anomaly-jobs}.
15 changes: 15 additions & 0 deletions docs/en/stack/ml/anomaly-detection/ml-ad-troubleshooting.asciidoc
@@ -0,0 +1,15 @@
[role="xpack"]
[[ml-ad-troubleshooting]]
= Troubleshooting {ml} {anomaly-detect}
++++
<titleabbrev>Troubleshooting</titleabbrev>
++++

Use the information in this section to troubleshoot common problems and find
answers for frequently asked questions.

[discrete]
[[ml-ad-restart-failed-jobs]]
== Restart failed {anomaly-jobs}

include::ml-restart-failed-jobs.asciidoc[]
@@ -1,7 +1,3 @@
[role="xpack"]
[[ml-restart-failed-jobs]]
= Restart failed {anomaly-jobs}

If an {anomaly-job} fails, try to restart the job by following the procedure
described below. If the restarted job runs as expected, then the problem that
caused the job to fail was transient and no further investigation is needed. If