[DOCS] Split anomaly detection into multiple pages

elastic · Jan 19, 2022 · 712dc6e
1 parent b751e96
commit 712dc6e
Showing 11 changed files with 285 additions and 278 deletions.
diff --git a/docs/en/stack/ml/anomaly-detection/index.asciidoc b/docs/en/stack/ml/anomaly-detection/index.asciidoc
@@ -2,8 +2,18 @@ include::ml-ad-overview.asciidoc[]
 
 include::ml-ad-finding-anomalies.asciidoc[leveloffset=+1]
 
+include::ml-ad-plan.asciidoc[leveloffset=+2]
+
+include::ml-ad-run-jobs.asciidoc[leveloffset=+2]
+
+include::ml-ad-view-results.asciidoc[leveloffset=+2]
+
+include::ml-ad-forecast.asciidoc[leveloffset=+2]
+
 include::ml-ad-concepts.asciidoc[leveloffset=+1]
 
+include::ml-ad-algorithms.asciidoc[leveloffset=+2]
+
 include::anomaly-detection-scale.asciidoc[leveloffset=+2]
 
 include::ml-api-quickref.asciidoc[leveloffset=+1]

diff --git a/docs/en/stack/ml/anomaly-detection/ml-ad-algorithms.asciidoc b/docs/en/stack/ml/anomaly-detection/ml-ad-algorithms.asciidoc
@@ -0,0 +1,17 @@
+[[ml-ad-algorithms]]
+= {anomaly-detect-cap} algorithms
+:keywords: {ml-init}, {stack}, {anomaly-detect}
+
+The {anomaly-detect} {ml-features} use a bespoke amalgamation of different
+techniques such as clustering, various types of time series decomposition,
+Bayesian distribution modeling, and correlation analysis. These analytics
+provide sophisticated real-time automated {anomaly-detect} for time series data.
+
+The {ml} analytics statistically model the time-based characteristics of your
+data by observing historical behavior and adapting to new data. The model
+represents a baseline of normal behavior and can therefore be used to determine
+how anomalous new events are.
+
+{anomaly-detect-cap} results are written for each <<bucket-span,bucket span>>.
+These results include scores that are aggregated in order to reduce noise and
+normalized in order to rank the most mathematically significant anomalies.
diff --git a/docs/en/stack/ml/anomaly-detection/ml-ad-concepts.asciidoc b/docs/en/stack/ml/anomaly-detection/ml-ad-concepts.asciidoc
@@ -5,4 +5,5 @@
 This section explains the more complex concepts of the Elastic {ml} 
 {anomaly-detect} feature.
 
+* <<ml-ad-algorithms>>
 * <<anomaly-detection-scale>>
diff --git a/docs/en/stack/ml/anomaly-detection/ml-ad-finding-anomalies.asciidoc b/docs/en/stack/ml/anomaly-detection/ml-ad-finding-anomalies.asciidoc
@@ -1,278 +1,27 @@
-[role="xpack"]
 [[ml-ad-finding-anomalies]]
 = Finding anomalies in time series data
 ++++
 <titleabbrev>Finding anomalies</titleabbrev>
 ++++
 
-The {ml-features} automate the analysis of time series data by creating
-accurate baselines of normal behavior in your data and using them to identify
-anomalous events or patterns.
+:keywords: {ml-init}, {stack}, {anomaly-detect}
+:description: An introduction to {ml} {anomaly-detect}, which analyzes time \
+series data to identify and predict anomalous patterns in your data.
 
-The results of {ml} analysis are stored in {es} and you can use {kib} to help
-you visualize and explore the results. For example, the *{ml-app}* app provides
-charts that illustrate the actual data values, the bounds for the expected
-values, and the anomalies that occur outside these bounds:
+The {ml} {anomaly-detect} features automate the analysis of time series data by
+creating accurate baselines of normal behavior in your data. These baselines
+then enable you to identify anomalous events or patterns. Data is pulled from
+{es} for analysis and anomaly results are displayed in {kib} dashboards. For
+example, the *{ml-app}* app provides charts that illustrate the actual data
+values, the bounds for the expected values, and the anomalies that occur outside
+these bounds:
 
 [role="screenshot"]
 image::images/overview-smv.jpg["Example screenshot from the Machine Learning Single Metric Viewer in Kibana"]
 
-Using <<ml-ad-algorithms,proprietary {ml} algorithms>>, the following
-circumstances are detected:
+The typical workflow for performing {anomaly-detect} is as follows:
 
-* Anomalies related to temporal deviations in values, counts, or frequencies
-* Statistical rarity
-* Unusual behaviors for a member of a population
-
-Automated periodicity detection and quick adaptation to changing data ensure
-that you don’t need to specify algorithms, models, or other data science-related
-configurations in order to get the benefits of {ml}.
-
-[discrete]
-[[ml-ad-algorithms]]
-== {anomaly-detect-cap} algorithms
-
-The {anomaly-detect} {ml-features} use a bespoke amalgamation of different
-techniques such as clustering, various types of time series decomposition,
-Bayesian distribution modeling, and correlation analysis. These analytics
-provide sophisticated real-time automated {anomaly-detect} for time series data.
-
-The {ml} analytics statistically model the time-based characteristics of your
-data by observing historical behavior and adapting to new data. The model
-represents a baseline of normal behavior and can therefore be used to determine
-how anomalous new events are.
-
-{anomaly-detect-cap} results are written for each <<bucket-span,bucket span>>.
-These results include scores that are aggregated in order to reduce noise and
-normalized in order to rank the most mathematically significant anomalies.
-
-[discrete]
-[[ml-ad-define-problem]]
-== 1. Define the problem
-
-The {ml-features} in {stack} enable you to seek anomalies in your data in many
-different ways. For example, there are functions that calculate metrics,
-analyze geographic data, or seek rare events in your data set. You can also
-optionally analyze your data relative to a specific population or group the data
-based on specific attributes. For the full list of functions, see
-<<ml-functions>>.
-
-The most important considerations are the data sets that you have available and
-the type of anomalous behavior you want to detect.
-
-If you are uncertain where to begin, {kib} can recognize certain types of data
-and suggest useful {ml} jobs. Likewise, some integrations in {fleet} include
-{anomaly-job} configuration information, dashboards, searches, and
-visualizations that are customized to help you analyze your data. 
-
-[discrete]
-[[ml-ad-setup]]
-== 2. Set up the environment
-
-Before you can use the {stack-ml-features}, there are some configuration
-requirements (such as security privileges) that must be addressed. Refer to
-<<setup>>.
-
-[NOTE]
-===============================
-If your data is located outside of {es}, you cannot use {kib} to create
-your jobs and you cannot use {dfeeds} to retrieve your data in real time.
-Posting data directly to {anomaly-jobs} is deprecated, in a future major version
-a {dfeed} will be required.
-===============================
-
-[discrete]
-[[ml-ad-create-job]]
-== 3. Create a job
-
-{anomaly-jobs-cap} contain the configuration information and metadata
-necessary to perform the {ml} analysis.
-
-You can create {anomaly-jobs} by using the
-{ref}/ml-put-job.html[create {anomaly-jobs} API]. {kib} also provides
-wizards to simplify the process:
-
-[role="screenshot"]
-image::images/ml-create-job.jpg[Create New Job]
-
-* The single metric wizard creates simple jobs that have a single detector. A
-_detector_ applies an analytical function to specific fields in your data. In
-addition to limiting the number of detectors, the single metric wizard omits
-many of the more advanced configuration options.
-
-* The multi-metric wizard creates jobs that can have more than one detector,
-which is more efficient than running multiple jobs against the same data.
-
-* The population wizard creates jobs that detect activity that is unusual compared
-to the behavior of the population. For more information, see
-<<ml-configuring-populations>>.
-
-* The categorization wizard creates jobs that group log messages into categories
-and use `count` or `rare` functions to detect anomalies within them. See
-<<ml-configuring-categories>>.
-
-* The advanced wizard creates jobs that can have multiple detectors and enables
-you to configure all job settings.
-
-//TBD The rare wizard creates jobs that use `rare` or `freq_rare` functions to detect
-
-{kib} can also recognize certain types of data and provide specialized wizards
-for that context. For example, there are {anomaly-jobs} for the sample
-eCommerce orders and sample web logs data sets, as well as for data generated by
-the {elastic-sec} and {observability} solutions, {beats}, and {fleet}
-{integrations}. For a list of all the customized jobs, see <<ootb-ml-jobs>>.
-
-[[ml-ad-job-tips]]
-include::job-tips.asciidoc[leveloffset=+1]
-
-////
-You can optionally assign jobs to one or more _job groups_. You can use
-job groups to view the results from multiple jobs more easily and to expedite
-administrative tasks by opening or closing multiple jobs at once.
-////
-
-//TBD Mention impact of model plot config?
-
-[discrete]
-[[ml-ad-datafeeds]]
-=== {dfeeds-cap}
-
-include::ml-datafeeds.asciidoc[]
-
-[discrete]
-[[ml-ad-open-job]]
-== 4. Open the job
-
-An {anomaly-job} must be opened in order for it to be ready to receive and
-analyze data. It can be opened and closed multiple times throughout its
-lifecycle.
-
-After you start the job, you can start the {dfeed}, which retrieves data from
-your cluster. A {dfeed} can be started and stopped multiple times throughout its
-lifecycle. When you start it, you can optionally specify start and end times. If
-you do not specify an end time, the {dfeed} runs continuously. {dfeeds-cap} with
-an end time close their corresponding jobs when they are stopped. For this
-reason, when historical data is analysed, there is no need to stop the {dfeed}
-and/or close the job as they are stopped and closed automatically when the end
-time is reached. However, when a {dfeed} without an end time is stopped, it does
-not close the corresponding job automatically.
-
-You can perform both these tasks in {kib} or use the
-{ref}/ml-open-job.html[open {anomaly-jobs}] and
-{ref}/ml-start-datafeed.html[start {dfeeds}] APIs.
-
-[discrete]
-[[ml-ad-view-results]]
-== 5. View the job results
-
-After the {anomaly-job} has processed some data, you can view the results in
-{kib}.
-
-TIP: Depending on the capacity of your machine, you might need to wait a few
-seconds for the {ml} analysis to generate initial results.
-
-There are two tools for examining the results from {anomaly-jobs} in {kib}: the
-**Anomaly Explorer** and the **Single Metric Viewer**.
-
-//TBD Provide overview of the different purposes of the two interfaces.
-
-//Intro to swimlanes in Anomaly Explorer
-
-[discrete]
-[[ml-ad-bucket-results]]
-=== Bucket results
-
-include::ml-buckets.asciidoc[tag=bucket-results]
-
-[discrete]
-[[ml-ad-influencer-results]]
-=== Influencer results
-
-include::ml-influencers.asciidoc[tag=influencer-results]
-
-[discrete]
-[[ml-ad-tune]]
-== 6. Tune the job
-
-While your {anomaly-job} is open, you might find that you need to alter its
-configuration or settings.
-
-[discrete]
-[[ml-ad-calendars]]
-=== Calendars and scheduled events
-
-include::ml-calendars.asciidoc[]
-
-[discrete]
-[[ml-ad-rules]]
-=== Custom rules
-
-include::ml-rules.asciidoc[]
-
-[discrete]
-[[ml-ad-model-snapshots]]
-=== Model snapshots
-
-include::ml-model-snapshots.asciidoc[]
-
-[discrete]
-[[ml-ad-forecast]]
-== 7. Forecast future behavior
-
-After the {ml-features} create baselines of normal behavior for your data,
-you can use that information to extrapolate future behavior.
-
-You can use a forecast to estimate a time series value at a specific future date.
-For example, you might want to determine how many users you can expect to visit
-your website next Sunday at 0900.
-
-You can also use it to estimate the probability of a time series value occurring
-at a future date. For example, you might want to determine how likely it is that
-your disk utilization will reach 100% before the end of next week.
-
-Each forecast has a unique ID, which you can use to distinguish between forecasts
-that you created at different times. You can create a forecast by using the
-{ref}/ml-forecast.html[forecast {anomaly-jobs} API] or by using {kib}. For
-example:
-
-[role="screenshot"]
-image::images/overview-forecast.jpg["Example screenshot from the Machine Learning Single Metric Viewer in Kibana"]
-
-The yellow line in the chart represents the predicted data values. The
-shaded yellow area represents the bounds for the predicted values, which also
-gives an indication of the confidence of the predictions.
-
-When you create a forecast, you specify its _duration_, which indicates how far
-the forecast extends beyond the last record that was processed. By default, the
-duration is 1 day. Typically the farther into the future that you forecast, the
-lower the confidence levels become (that is to say, the bounds increase).
-Eventually if the confidence levels are too low, the forecast stops.
-For more information about limitations that affect your ability to create a
-forecast, see <<ml-forecast-config-limitations>>.
-
-You can also optionally specify when the forecast expires. By default, it
-expires in 14 days and is deleted automatically thereafter. You can specify a
-different expiration period by using the `expires_in` parameter in the
-{ref}/ml-forecast.html[forecast {anomaly-jobs} API].
-
-[discrete]
-[[ml-ad-close-job]]
-== 8. Close the job
-
-include::stopping-ml.asciidoc[leveloffset=+1]
-
-[discrete]
-== Next steps
-
-For a more detailed walk-through of {ml-features}, see <<ml-getting-started>>.
-
-For more advanced settings and scenarios, see <<anomaly-examples>>.
-
-Refer to <<anomaly-detection-scale>> to learn more about the particularities of 
-large {anomaly-jobs}.
-
-[discrete]
-[[further-reading]]
-== Further reading
-
-https://www.elastic.co/blog/interpretability-in-ml-identifying-anomalies-influencers-root-causes[Interpretability in ML: Identifying anomalies, influencers, and root causes]
+* <<ml-ad-plan>>
+* <<ml-ad-run-jobs>>
+* <<ml-ad-view-results>>
+* <<ml-ad-forecast>>
diff --git a/docs/en/stack/ml/anomaly-detection/ml-ad-forecast.asciidoc b/docs/en/stack/ml/anomaly-detection/ml-ad-forecast.asciidoc
@@ -0,0 +1,39 @@
+[[ml-ad-forecast]]
+= Forecast future behavior
+:keywords: {ml-init}, {stack}, {anomaly-detect}
+
+After your {anomaly-job} creates baselines of normal behavior for your data,
+you can use that information to extrapolate future behavior.
+
+You can use a forecast to estimate a time series value at a specific future date.
+For example, you might want to determine how many users you can expect to visit
+your website next Sunday at 0900.
+
+You can also use it to estimate the probability of a time series value occurring
+at a future date. For example, you might want to determine how likely it is that
+your disk utilization will reach 100% before the end of next week.
+
+Each forecast has a unique ID, which you can use to distinguish between forecasts
+that you created at different times. You can create a forecast by using the
+{ref}/ml-forecast.html[forecast {anomaly-jobs} API] or by using {kib}. For
+example:
+
+[role="screenshot"]
+image::images/overview-forecast.jpg["Example screenshot from the Machine Learning Single Metric Viewer in Kibana"]
+
+The yellow line in the chart represents the predicted data values. The
+shaded yellow area represents the bounds for the predicted values, which also
+gives an indication of the confidence of the predictions.
+
+When you create a forecast, you specify its _duration_, which indicates how far
+the forecast extends beyond the last record that was processed. By default, the
+duration is 1 day. Typically the farther into the future that you forecast, the
+lower the confidence levels become (that is to say, the bounds increase).
+Eventually if the confidence levels are too low, the forecast stops.
+For more information about limitations that affect your ability to create a
+forecast, see <<ml-forecast-config-limitations>>.
+
+You can also optionally specify when the forecast expires. By default, it
+expires in 14 days and is deleted automatically thereafter. You can specify a
+different expiration period by using the `expires_in` parameter in the
+{ref}/ml-forecast.html[forecast {anomaly-jobs} API].
diff --git a/docs/en/stack/ml/anomaly-detection/ml-ad-overview.asciidoc b/docs/en/stack/ml/anomaly-detection/ml-ad-overview.asciidoc
@@ -1,16 +1,8 @@
-[role="xpack"]
 [[ml-ad-overview]]
 = {anomaly-detect-cap}
+:keywords: {ml-init}, {stack}, {anomaly-detect}
 
-:keywords: {ml-init}, {stack}, {anomaly-detect}, overview
-:description: An introduction to {ml} {anomaly-detect}, which analyzes time \
-series data to identify and predict anomalous patterns in your data.
-
-Use {anomaly-detect} to analyze time series data by creating accurate baselines 
-of normal behavior and identifying anomalous patterns in your data set. Data is 
-pulled from {es} for analysis and anomaly results are displayed in {kib} 
-dashboards. Consult <<setup>> to learn more about the licence and the security 
-privileges that are required to use {anomaly-detect}.
+You can use {stack} {ml-features} to analyze time series data and identify anomalous patterns in your data set.
 
 * <<ml-ad-finding-anomalies>>
 * <<ml-ad-concepts>>
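
Note on the new ml-ad-forecast.asciidoc page: it describes a forecast _duration_ (1 day by default) and an `expires_in` period (14 days by default), but shows no request. A minimal sketch of what such a request might look like follows. The helper name `build_forecast_request`, the job ID `web-traffic`, the endpoint path, and the `duration` body field are assumptions for illustration and are not taken from this commit; only `expires_in` and the two defaults come from the page. Check them against the forecast {anomaly-jobs} API reference before relying on them. The sketch only builds the request and does not send it.

```python
import json
from urllib.parse import quote


def build_forecast_request(job_id, duration="1d", expires_in="14d"):
    """Build (method, path, body) for a forecast request.

    Hypothetical helper: the path and the "duration" field are assumed,
    not confirmed by this commit. The defaults mirror the values stated
    on the page (1 day duration, 14 day expiry).
    """
    # Percent-encode the job ID so it is safe to place in a URL path.
    path = f"/_ml/anomaly_detectors/{quote(job_id, safe='')}/_forecast"
    body = json.dumps({"duration": duration, "expires_in": expires_in})
    return "POST", path, body


# Example: forecast three days ahead and keep the default expiry.
method, path, body = build_forecast_request("web-traffic", duration="3d")
print(method, path)
print(body)
```

Keeping request construction separate from sending makes the defaults easy to inspect and test without a running cluster.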