[FEATURE] Stats API #362

eirsep · 2023-02-23T18:42:28Z

We need a stats API that gives insight into the health and usage of the plugin. For example, stats can tell us how many detector and rule creations have succeeded or failed at the node level.

petardz · 2023-03-01T13:28:27Z

One piece of information that could interest users is the "progress" of detectors, for example whether monitors are keeping up with the document ingestion rate of the log indices. There is an issue on the alerting repo for implementing a Monitor Explain API, which could be used for this: opensearch-project/alerting#751

sandeshkr419 · 2023-07-24T21:09:48Z

Thinking on the possible API structure.

Scope of stats API:

Metrics related to successes and failures of detectors, at the cluster and node level.
Information about the correlation engine is not included in the stats API, since the correlation engine is itself still experimental.

Path and HTTP methods

GET _plugins/_security_analytics/stats
GET _plugins/_security_analytics/stats/<metric>
GET _plugins/_security_analytics/<node-id>/stats
GET _plugins/_security_analytics/<node-id>/stats/<metric>

URL Parameters

node-id: node-id of the node for which the stats are required
metric: detectors, detectors_per_log_type, custom_rules, custom_rules_per_log_type
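
As a rough illustration of how the four paths above combine the two URL parameters, here is a minimal Python sketch. The helper name `stats_path` and the validation against the metric list are assumptions made for this example, not part of the proposal.

```python
from typing import Optional

# Base path and metric names are taken from the proposal above.
BASE = "_plugins/_security_analytics"
METRICS = {
    "detectors",
    "detectors_per_log_type",
    "custom_rules",
    "custom_rules_per_log_type",
}


def stats_path(node_id: Optional[str] = None, metric: Optional[str] = None) -> str:
    """Build one of the four proposed stats paths (hypothetical helper)."""
    if metric is not None and metric not in METRICS:
        raise ValueError(f"unknown metric: {metric}")
    parts = [BASE]
    if node_id:
        # Node-scoped variants put the node id before "stats".
        parts.append(node_id)
    parts.append("stats")
    if metric:
        parts.append(metric)
    return "/".join(parts)
```

Calling `stats_path()` gives the cluster-wide path, while `stats_path("node-1", "detectors")` gives the node-scoped, single-metric variant.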

Response

TBD after Response Body Fields review

Response Body Fields

Cluster Level Statistics

Field Name	Description
nodes	number of total, successful, failed nodes returned in the response.
cluster_name	cluster’s name.
cluster_uuid	cluster’s uuid.
timestamp	unix epoch time of when the cluster was last refreshed.
status	The cluster’s health status.
plugin_enabled	whether security analytics plugin is enabled or not
detectors	details (enabled, defined, in_error) of detectors
detectors_per_log_type	details (enabled, defined, in_error) of detectors in each log type
enabled, defined, error	stats of detectors as part of detectors and detectors_per_log_type metric
custom_rules	number of custom rules defined
custom_rules_per_log_type	number of custom rules defined per log type

Node Level Statistics

Node-level statistics will be calculated on each node and also aggregated across all nodes to give a cluster-wide overview.

Field Name	Description
roles	node roles: cluster_manager, data, etc
shards_analyzed	shards spanned by enabled detectors
total_documents	total documents in scope of detectors
documents_processed	documents scanned by detectors
documents_behind	number of documents in a node that are yet to be processed
rules_matched	rules matched by detectors
jobs_started_on_time	detectors started on time on that node
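
The aggregation over all nodes could look roughly like the sketch below. It assumes that every numeric field in the table is a plain counter that can be summed across nodes, and that the per-node stats arrive as a dictionary keyed by node id; both are assumptions for illustration, since the proposal does not fix the aggregation rule or the wire format.

```python
from typing import Dict

# Numeric node-level fields from the table above; "roles" is not a counter.
COUNTERS = (
    "shards_analyzed",
    "total_documents",
    "documents_processed",
    "documents_behind",
    "rules_matched",
    "jobs_started_on_time",
)


def aggregate_nodes(nodes: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum each counter over all nodes (assumed aggregation rule)."""
    totals = {name: 0 for name in COUNTERS}
    for stats in nodes.values():
        for name in COUNTERS:
            # A node that does not report a counter contributes zero.
            totals[name] += stats.get(name, 0)
    return totals
```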

Task Breakdown

The plan is to get a working API with minimal information ready and then add on statistics as required.

Implement Cluster Level Statistics
Implement Node Level Statistics
[Will create a separate issue] UI / Dashboard Changes
[Will create a separate issue] Documentation changes

References

Used the below APIs to decide on structure of stats API here.

eirsep · 2023-07-25T18:42:33Z

can you post an example response?

sandeshkr419 · 2023-09-11T20:57:01Z

@eirsep Sure. After iterating on the request and responses again, here is the updated proposal. I have limited the response objects to keep the response cleaner and to avoid unnecessary information in the first implementation of the stats API.

Request:

GET _plugins/_security_analytics/stats

Proposing 2 sample responses:

Sample Response 1:

GET _plugins/_security_analytics/stats

{
    "detectors": {
        "total": 5,
        "enabled": 3,
        "error": 1
    },
    "detectors_per_log_type": {
        "windows": {
            "total": 2,
            "enabled": 2,
            "error": 0
        },
        "linux": {
            "total": 2,
            "enabled": 1,
            "error": 1
        },
        "custom_log_1": {
            "total": 1,
            "enabled": 1,
            "error": 0
        },
        .
        .
        .
    },
    "custom_rules": 10,
    "custom_rules_per_log_type": {
        "windows": 5,
        "linux": 3,
        "custom_log_1": 1,
        .
        .
        .
    },
    "custom_log_types": 4
}

When there are no detectors or no custom logs defined, the above response would look like:

GET _plugins/_security_analytics/stats

{
    "detectors": {
        "total": 0,
        "enabled": 0,
        "error": 0
    },
    "custom_rules": 0,
    "custom_log_types": 0,
    "detectors_per_log_type": {},
    "custom_rules_per_log_type": {}
}
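
To make the shape of Sample Response 1 concrete, here is a minimal Python sketch that assembles it from a list of detector records. The record fields (`log_type`, `enabled`, `error`), the function name `build_stats`, and passing custom rules as a list of log type names are all hypothetical choices for this example; the plugin's real data model may differ.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def build_stats(
    detectors: Iterable[Dict[str, Any]],
    custom_rule_log_types: List[str],
    custom_log_types: int,
) -> Dict[str, Any]:
    """Assemble a Sample Response 1 style body (illustrative only)."""
    totals = {"total": 0, "enabled": 0, "error": 0}
    per_log_type: Dict[str, Dict[str, int]] = {}
    for det in detectors:
        bucket = per_log_type.setdefault(
            det["log_type"], {"total": 0, "enabled": 0, "error": 0}
        )
        # Update the overall counts and the per-log-type bucket together.
        for counts in (totals, bucket):
            counts["total"] += 1
            counts["enabled"] += int(det["enabled"])
            counts["error"] += int(det["error"])
    # One list entry per custom rule, holding that rule's log type.
    rules = Counter(custom_rule_log_types)
    return {
        "detectors": totals,
        "detectors_per_log_type": per_log_type,
        "custom_rules": sum(rules.values()),
        "custom_rules_per_log_type": dict(rules),
        "custom_log_types": custom_log_types,
    }
```

With no detectors and no custom rules, `build_stats([], [], 0)` yields zeroed totals and empty per-log-type objects, matching the empty response shown above.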

Sample Response 2:

This variant keeps only detectors_per_log_type and custom_rules_per_log_type, and adds an all sub-field to each that holds the metrics aggregated across all log types.

GET _plugins/_security_analytics/stats

{
    "detectors_per_log_type": {
        "all": {
            "total": 5,
            "enabled": 3,
            "error": 1
        },
        "windows": {
            "total": 2,
            "enabled": 2,
            "error": 0
        },
        "linux": {
            "total": 2,
            "enabled": 1,
            "error": 1
        },
        "custom_log_1": {
            "total": 1,
            "enabled": 1,
            "error": 0
        },
        .
        .
        .
    },
    "custom_rules_per_log_type": {
        "all": 10,
        "windows": 5,
        "linux": 3,
        "custom_log_1": 1,
        .
        .
        .
    },
    "custom_log_types": 4
}

When there are no detectors or no custom logs defined, the above response would look like:

GET _plugins/_security_analytics/stats

{
    "detectors_per_log_type": {
        "all": {
            "total": 0,
            "enabled": 0,
            "error": 0
        }
    },
    "custom_rules_per_log_type": {
        "all": 0
    },
    "custom_log_types": 0
}

Proposed Response

I propose Sample Response 1 over the other because it leads to a much cleaner implementation. The drawback of Sample Response 2 is that, when iterating over the log types in the response object, one has to explicitly check for and skip the all entry, which can be confusing. Also, users who parse this information for metric collection and do not need log-type granularity can simply ignore detectors_per_log_type and custom_rules_per_log_type.
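
The difference for a client can be sketched in a few lines of Python. Both function names are made up for this example, and the inputs are assumed to be the already-parsed JSON bodies of the two sample responses.

```python
from typing import Any, Dict


def totals_from_response_1(resp: Dict[str, Any]) -> Dict[str, int]:
    """Sample Response 1: every key under detectors_per_log_type is a log type."""
    return {
        log_type: counts["total"]
        for log_type, counts in resp["detectors_per_log_type"].items()
    }


def totals_from_response_2(resp: Dict[str, Any]) -> Dict[str, int]:
    """Sample Response 2: the reserved "all" key has to be skipped explicitly."""
    return {
        log_type: counts["total"]
        for log_type, counts in resp["detectors_per_log_type"].items()
        if log_type != "all"  # easy to forget, and it would double count
    }
```

Forgetting the `"all"` filter in the second function would silently add the aggregate to any sum over log types, which is the confusion described above.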

Future Improvements

If we need node-level metrics, they can be added in the future behind an additional request parameter:

 GET _plugins/_security_analytics/stats?include_advanced_metrics

The scope of these advanced metrics can be decided after the API proposed above is implemented. The idea is to keep the default API behavior lightweight: collecting information at node-level granularity is expensive, its cost scales linearly with the node count on large clusters, and most users may not need those metrics.

eirsep added the enhancement and untriaged labels and removed the untriaged label on Feb 23, 2023

eirsep self-assigned this Feb 23, 2023
