Add Kmeans and AD command documentation (#493)
Signed-off-by: jackieyanghan <[email protected]>
1 parent fff24aa, commit ee4bce0
Showing 9 changed files with 2,082 additions and 2 deletions.
@@ -0,0 +1,61 @@
=============
ad
=============

.. rubric:: Table of contents

.. contents::
   :local:
   :depth: 2


Description
============
| The ``ad`` command applies the Random Cut Forest (RCF) algorithm from the ml-commons plugin to the search result returned by a PPL command. Depending on the input, it uses one of two RCF variants: fixed-in-time RCF for time-series data, or batch RCF for non-time-series data.

Fixed In Time RCF For Time-series Data Command Syntax
=====================================================
ad <shingle_size> <time_decay> <time_field>

* shingle_size: optional. The size of a shingle, which is a consecutive sequence of the most recent records. The default value is 8.
* time_decay: optional. It specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001.
* time_field: mandatory. It specifies the time field for RCF to use as time-series data.


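To make the notion of a shingle concrete, the following is a minimal Python sketch of how consecutive records can be grouped into overlapping windows. It only illustrates the concept described for ``shingle_size``; the function name ``make_shingles`` and the sample readings are hypothetical, and this is not the ml-commons implementation.

```python
from typing import List, Sequence


def make_shingles(values: Sequence[float], shingle_size: int = 8) -> List[List[float]]:
    """Return every window of `shingle_size` consecutive values, oldest first."""
    if shingle_size < 1:
        raise ValueError("shingle_size must be at least 1")
    # A stream shorter than one shingle yields no windows at all.
    return [
        list(values[i : i + shingle_size])
        for i in range(len(values) - shingle_size + 1)
    ]


# Hypothetical readings, with a small shingle size for readability.
readings = [10.0, 11.0, 12.0, 13.0, 14.0]
shingles = make_shingles(readings, shingle_size=3)
print(shingles)
# [[10.0, 11.0, 12.0], [11.0, 12.0, 13.0], [12.0, 13.0, 14.0]]
```

With the default size of 8, each point scored by the model would be a window of the eight most recent records rather than a single value.
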
Batch RCF for Non-time-series Data Command Syntax
=================================================
ad <shingle_size> <time_decay>

* shingle_size: optional. The size of a shingle, which is a consecutive sequence of the most recent records. The default value is 8.
* time_decay: optional. It specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001.


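One way to build intuition for ``time_decay`` is to read it as an exponential decay rate, so that older records contribute less. The Python sketch below works under that assumption only: the exponential form and the helper ``decay_weight`` are illustrative and are not taken from this documentation or confirmed as the plugin's exact formula.

```python
import math


def decay_weight(age: int, time_decay: float = 0.001) -> float:
    """Relative weight of a record that is `age` records old.

    Assumption: weights fall off exponentially with age.
    """
    return math.exp(-time_decay * age)


# Compare how much weight records of different ages would keep.
for age in (0, 100, 1000, 5000):
    print(age, round(decay_weight(age), 4))
```

Under this reading, a larger ``time_decay`` makes the model forget older records faster, and a smaller value keeps more of the past in play.
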
Example 1: Detecting events in New York City from taxi ridership data with time-series data
===========================================================================================

The example trains an RCF model and uses it to detect anomalies in the time-series ridership data.

PPL query::

    os> source=nyc_taxi | fields value, timestamp | AD time_field='timestamp' | where value=10844.0
    +----------+---------------+-------+---------------+
    | value    | timestamp     | score | anomaly_grade |
    |----------+---------------+-------+---------------|
    | 10844.0  | 1404172800000 | 0.0   | 0.0           |
    +----------+---------------+-------+---------------+


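The ``timestamp`` value in the output above looks like a Unix epoch in milliseconds. Assuming that interpretation and the UTC time zone, a short Python sketch for turning it into a readable date is shown below; the helper name ``epoch_millis_to_utc`` is hypothetical.

```python
from datetime import datetime, timezone


def epoch_millis_to_utc(millis: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# Timestamp taken from the example output.
print(epoch_millis_to_utc(1404172800000).isoformat())
# 2014-07-01T00:00:00+00:00
```
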
Example 2: Detecting events in New York City from taxi ridership data with non-time-series data
===============================================================================================

The example trains an RCF model and uses it to detect anomalies in the non-time-series ridership data.

PPL query::

    os> source=nyc_taxi | fields value | AD | where value=10844.0
    +----------+--------+-----------+
    | value    | score  | anomalous |
    |----------+--------+-----------|
    | 10844.0  | 0.0    | false     |
    +----------+--------+-----------+
@@ -0,0 +1,38 @@
=============
kmeans
=============

.. rubric:: Table of contents

.. contents::
   :local:
   :depth: 2


Description
============
| The ``kmeans`` command applies the k-means algorithm from the ml-commons plugin to the search result returned by a PPL command.

Syntax
======
kmeans <cluster-number>

* cluster-number: mandatory. The number of clusters you want to group your data points into.


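For readers unfamiliar with k-means, the following Python sketch shows the standard assign-then-update loop (Lloyd's algorithm) that the ``cluster-number`` argument controls. It is a generic illustration rather than the ml-commons implementation. The function ``kmeans``, the choice to seed centroids with the first ``k`` points, the fixed iteration count, and the sample points are all assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int, iterations: int = 20) -> Tuple[List[List[float]], List[int]]:
    """Cluster `points` into `k` groups; return (centroids, cluster id per point)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: seed centroids with the first k points so the run is deterministic.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Hypothetical 2-D points forming three well-separated groups.
points = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0), (1.2, 0.8), (5.2, 4.8), (9.1, 1.2)]
centroids, labels = kmeans(points, k=3)
print(labels)
# [0, 1, 2, 0, 1, 2]
```

The cluster ids are arbitrary labels. Only the grouping is meaningful, which is also how the ``ClusterID`` column in the command output should be read.
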
Example: Clustering of Iris Dataset
===================================

The example shows how to group samples of three Iris species (Iris setosa, Iris virginica, and Iris versicolor) into clusters based on four features measured from each sample: the length and width of the sepals and petals.

PPL query::

    os> source=iris_data | fields sepal_length_in_cm, sepal_width_in_cm, petal_length_in_cm, petal_width_in_cm | kmeans 3
    +--------------------+-------------------+--------------------+-------------------+-----------+
    | sepal_length_in_cm | sepal_width_in_cm | petal_length_in_cm | petal_width_in_cm | ClusterID |
    |--------------------+-------------------+--------------------+-------------------+-----------|
    | 5.1                | 3.5               | 1.4                | 0.2               | 1         |
    | 5.6                | 3.0               | 4.1                | 1.3               | 0         |
    | 6.7                | 2.5               | 5.8                | 1.8               | 2         |
    +--------------------+-------------------+--------------------+-------------------+-----------+
Binary file not shown.