Commit

[DOCS] Update Data Visualizer details in machine learning tutorial (#…
lcawl authored Jan 12, 2021
1 parent d0fd365 commit ab499b3
Showing 5 changed files with 21 additions and 38 deletions.
Binary file not shown.
Binary file modified docs/en/stack/ml/get-started/images/ml-gs-data-keyword.jpg
Binary file modified docs/en/stack/ml/get-started/images/ml-gs-data-metric.jpg
Binary file not shown.
59 changes: 21 additions & 38 deletions docs/en/stack/ml/get-started/ml-gs-visualizer.asciidoc
@@ -27,55 +27,38 @@ exploring. Alternatively, click
*Use full kibana_sample_data_logs data* to view the full time range of data.

. Optional: Change the sample size, which is the number of documents per shard
that are used in the visualizations. There is a relatively small number of
documents in the sample data, so you can choose a value of `all`. For larger
data sets, keep in mind that using a large sample size increases query run times
and increases the load on the cluster.
that are used in the {data-viz}. There is a relatively small number of
documents in the {kib} sample data, so you can choose a value of `all`. For
larger data sets, keep in mind that using a large sample size increases query
run times and increases the load on the cluster.

. Explore the fields and metrics in the {data-viz}.
. Explore the fields in the {data-viz}.
+
--
It lists the fields in two sections. The first section contains
the numeric ("metric") data types. The second section contains non-numeric data
types (such as `keyword`, `text`, `date`, `boolean`, `ip`, and `geo_point`). For
more information, see {ref}/mapping-types.html[Field data types].

For each metric, the {data-viz} indicates how many documents contain the field
in the selected time period. It also provides information about the minimum,
median, and maximum values, the number of distinct values, and their
distribution. You can use the distribution chart to get a better idea of how
the values in the data are clustered. Alternatively, you can view the top values
for metric fields. For example:

[role="screenshot"]
image::images/ml-gs-data-metric.jpg["{data-viz} output for top values in {kib}", width="50%",role="screenshot left"]
You can filter the list by field names or {ref}/mapping-types.html[field types].
The {data-viz} indicates how many of the documents in the sample for the
selected time period contain each field.

In particular, look at the `clientip`, `response.keyword`, and `url.keyword`
fields, since we'll use them in our {anomaly-jobs}. For
{ref}/ip.html[`ip`] and {ref}/keyword.html[`keyword`] fields, the {data-viz}
provides the number of distinct values, a list of the top values, and the number
and percentage of documents that contain the field during the selected time
period. For example:
fields, since we'll use them in our {anomaly-jobs}. For these fields, the
{data-viz} provides the number of distinct values, a list of the top values, and
the number and percentage of documents that contain the field. For example:

[role="screenshot"]
image:images/ml-gs-data-keyword.jpg["{data-viz} output for keyword fields in {kib}", width="50%",role="screenshot left"]
image::images/ml-gs-data-keyword.jpg["{data-viz} output for ip and keyword fields"]

[role="screenshot"]
image:images/ml-gs-data-ip.jpg["{data-viz} output for ip fields in {kib}", width="50%",role="screenshot left"]
For numeric fields, the {data-viz} provides information about the minimum,
median, maximum, and top values, the number of distinct values, and their
distribution. You can use the distribution chart to get a better idea of how the
values in the data are clustered. For example:

--
[role="screenshot"]
image::images/ml-gs-data-metric.jpg["{data-viz} for sample web logs"]

. Make note of the range of dates in the `@timestamp` field. They are relative
to when you added the sample data and you'll need that information later in the
tutorial.
+
--
For {ref}/date.html[`date`] fields, the {data-viz} provides the earliest and
latest field values and the number and percentage of documents that contain the
field during the selected time period:
TIP: Make note of the range of dates in the `@timestamp` field. They are
relative to when you added the sample data and you'll need that information
later in the tutorial.

[role="screenshot"]
image:images/ml-gs-data-timestamp.jpg["{data-viz} output for date fields in {kib}",width="50%",role="screenshot left"]
--

Now that you're familiar with the data in the `kibana_sample_data_logs` index,
