[FEATURE] Stats API #362

Open
eirsep opened this issue Feb 23, 2023 · 4 comments
Assignees
Labels
enhancement New feature or request

Comments

@eirsep
Member

eirsep commented Feb 23, 2023

We need a stats API that gives insight into the health and usage of the plugin. For example, the stats could report how many detector and rule creations have succeeded or failed on each node.

@eirsep eirsep added enhancement New feature or request untriaged and removed untriaged labels Feb 23, 2023
@eirsep eirsep self-assigned this Feb 23, 2023
@petardz
Contributor

petardz commented Mar 1, 2023

One interesting piece of information for users could be the "progress" of detectors, for example, whether monitors are keeping up with the document ingestion rate of the log indices. There is an issue on the alerting repo for implementing a Monitor Explain API, which could be used for this: opensearch-project/alerting#751

@sandeshkr419
Contributor

sandeshkr419 commented Jul 24, 2023

Some thoughts on a possible API structure.

Scope of stats API:

  • Metrics related to successes and failures of detectors, at cluster and node level.
  • Information about the correlation engine is not included in the stats API, since the engine is still at an experimental stage.

Path and HTTP methods

GET _plugins/_security_analytics/stats
GET _plugins/_security_analytics/stats/<metric>
GET _plugins/_security_analytics/<node-id>/stats
GET _plugins/_security_analytics/<node-id>/stats/<metric>

URL Parameters

node-id: node-id of the node for which the stats are required
metric: detectors, detectors_per_log_type, custom_rules, custom_rules_per_log_type
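
As a rough illustration of how the four paths and the two URL parameters fit together, here is a minimal Python sketch. The helper name `stats_path` is hypothetical and not part of the plugin; it only assembles the path strings listed above.

```python
from typing import Optional

BASE = "_plugins/_security_analytics"

# Metric names listed under "URL Parameters" in the proposal.
METRICS = {
    "detectors",
    "detectors_per_log_type",
    "custom_rules",
    "custom_rules_per_log_type",
}


def stats_path(node_id: Optional[str] = None, metric: Optional[str] = None) -> str:
    """Build one of the four proposed stats paths (hypothetical helper)."""
    if metric is not None and metric not in METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    parts = [BASE]
    if node_id is not None:
        parts.append(node_id)   # node-scoped variants put the id before "stats"
    parts.append("stats")
    if metric is not None:
        parts.append(metric)    # an optional metric narrows the response
    return "/".join(parts)


print(stats_path())
print(stats_path(metric="detectors"))
print(stats_path(node_id="node-1"))
print(stats_path(node_id="node-1", metric="custom_rules"))
```

The node id `node-1` is a made-up value used only for the example.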

Response

TBD after Response Body Fields review

Response Body Fields

Cluster Level Statistics

| Field Name | Description |
| --- | --- |
| nodes | number of total, successful, failed nodes returned in the response. |
| cluster_name | cluster’s name. |
| cluster_uuid | cluster’s uuid. |
| timestamp | unix epoch time of when the cluster was last refreshed. |
| status | The cluster’s health status. |
| plugin_enabled | whether security analytics plugin is enabled or not |
| detectors | details (enabled, defined, in_error) of detectors |
| detectors_per_log_type | details (enabled, defined, in_error) of detectors in each log type |
| enabled, defined, error | stats of detectors as part of detectors and detectors_per_log_type metric |
| custom_rules | number of custom rules defined |
| custom_rules_per_log_type | number of custom rules defined per log type |

Node Level Statistics

Node-level statistics will be calculated for each individual node and also aggregated over all nodes for a holistic overview.

| Field Name | Description |
| --- | --- |
| roles | node roles: cluster_manager, data, etc |
| shards_analyzed | shards spanned by enabled detectors |
| total_documents | total documents in scope of detectors |
| documents_processed | documents scanned by detectors |
| documents_behind | number of documents in a node that are yet to be processed |
| rules_matched | rules matched by detectors |
| jobs_started_on_time | detectors started on time on that node |
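
To make the "aggregated over all nodes" idea concrete, here is a minimal Python sketch. It assumes that a plain sum is the right aggregation for every numeric field, which the proposal does not state, and the function name `aggregate_nodes` and the sample values are hypothetical.

```python
from typing import Dict

# Numeric node-level fields from the table above; "roles" is not summed.
NUMERIC_FIELDS = (
    "shards_analyzed",
    "total_documents",
    "documents_processed",
    "documents_behind",
    "rules_matched",
    "jobs_started_on_time",
)


def aggregate_nodes(nodes: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum each numeric field across nodes; a missing field counts as 0.

    Assumption: summation is a sensible aggregate for all of these fields.
    """
    return {
        field: sum(stats.get(field, 0) for stats in nodes.values())
        for field in NUMERIC_FIELDS
    }


# Made-up values for two hypothetical nodes.
nodes = {
    "node-1": {"shards_analyzed": 3, "total_documents": 1000,
               "documents_processed": 900, "documents_behind": 100,
               "rules_matched": 12, "jobs_started_on_time": 4},
    "node-2": {"shards_analyzed": 2, "total_documents": 500,
               "documents_processed": 500, "documents_behind": 0,
               "rules_matched": 5, "jobs_started_on_time": 2},
}
totals = aggregate_nodes(nodes)
print(totals)
```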

Task Breakdown

The plan is to get a working API with minimal information ready and then add on statistics as required.

  1. Implement Cluster Level Statistics
  2. Implement Node Level Statistics
  3. [Will create a separate issue] UI / Dashboard Changes
  4. [Will create a separate issue] Documentation changes

References

Used the below APIs to decide on structure of stats API here.

@eirsep
Member Author

eirsep commented Jul 25, 2023

can you post an example response?

@sandeshkr419
Contributor

@eirsep Sure. After iterating on the request and responses again, here is the updated proposal. I have limited the response objects to keep them cleaner and to avoid unnecessary information in the first implementation of the stats API.

Request:

GET _plugins/_security_analytics/stats

Proposing 2 sample responses:

Sample Response 1:

GET _plugins/_security_analytics/stats

{
    "detectors": {
        "total": 5,
        "enabled": 3,
        "error": 1
    },
    "detectors_per_log_type": {
        "windows": {
            "total": 2,
            "enabled": 2,
            "error": 0
        },
        "linux": {
            "total": 2,
            "enabled": 1,
            "error": 1
        },
        "custom_log_1": {
            "total": 1,
            "enabled": 1,
            "error": 0
        },
        .
        .
        .
    },
    "custom_rules": 10,
    "custom_rules_per_log_type": {
        "windows": 5,
        "linux": 3,
        "custom_log_1": 1,
        .
        .
        .
    },
    "custom_log_types": 4
}

When there are no detectors or no custom logs defined, the above response would look like:

GET _plugins/_security_analytics/stats

{
    "detectors": {
        "total": 0,
        "enabled": 0,
        "error": 0
    },
    "detectors_per_log_type": {},
    "custom_rules": 0,
    "custom_rules_per_log_type": {},
    "custom_log_types": 0
}
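
For illustration, here is a minimal Python sketch of how a body shaped like Sample Response 1 could be assembled, including the empty case. It is not the plugin's implementation: the function `build_stats` and the record keys `log_type`, `enabled`, and `in_error` are assumptions made up for this example.

```python
from typing import Any, Dict, List


def empty_counts() -> Dict[str, int]:
    return {"total": 0, "enabled": 0, "error": 0}


def build_stats(detectors: List[Dict[str, Any]],
                custom_rules: List[Dict[str, Any]],
                custom_log_types: int) -> Dict[str, Any]:
    """Assemble a Sample Response 1 style body from in-memory records.

    The record keys ("log_type", "enabled", "in_error") are hypothetical.
    """
    overall = empty_counts()
    per_type: Dict[str, Dict[str, int]] = {}
    for d in detectors:
        by_type = per_type.setdefault(d["log_type"], empty_counts())
        # Update the overall counters and the per-log-type counters together.
        for counts in (overall, by_type):
            counts["total"] += 1
            counts["enabled"] += int(d["enabled"])
            counts["error"] += int(d["in_error"])

    rules_per_type: Dict[str, int] = {}
    for r in custom_rules:
        rules_per_type[r["log_type"]] = rules_per_type.get(r["log_type"], 0) + 1

    return {
        "detectors": overall,
        "detectors_per_log_type": per_type,
        "custom_rules": len(custom_rules),
        "custom_rules_per_log_type": rules_per_type,
        "custom_log_types": custom_log_types,
    }


# With no detectors and no custom rules, the per-log-type maps stay empty.
print(build_stats([], [], 0))
```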

Sample Response 2:

This variant drops the top-level detectors and custom_rules fields. Instead, detectors_per_log_type and custom_rules_per_log_type each carry an all sub-field that holds the metrics aggregated across all log types.

GET _plugins/_security_analytics/stats

{
    "detectors_per_log_type": {
        "all": {
            "total": 5,
            "enabled": 3,
            "error": 1
        },
        "windows": {
            "total": 2,
            "enabled": 2,
            "error": 0
        },
        "linux": {
            "total": 2,
            "enabled": 1,
            "error": 1
        },
        "custom_log_1": {
            "total": 1,
            "enabled": 1,
            "error": 0
        },
        .
        .
        .
    },
    "custom_rules_per_log_type": {
        "all": 10,
        "windows": 5,
        "linux": 3,
        "custom_log_1": 1,
        .
        .
        .
    },
    "custom_log_types": 4
}

When there are no detectors or no custom logs defined, the above response would look like:

GET _plugins/_security_analytics/stats

{
    "detectors_per_log_type": {
        "all": {
            "total": 0,
            "enabled": 0,
            "error": 0
        }
    },
    "custom_rules_per_log_type": {
        "all": 0
    },
    "custom_log_types": 0
}

Proposed Response

I propose Sample Response 1 over Sample Response 2 because it is the cleaner design. The drawback of Sample Response 2 is that when iterating over the log types in the response object, a client has to explicitly check for and skip the all entry, which can be confusing. Also, with Sample Response 1, users who parse this information for metric collection and do not need log-type granularity can ignore detectors_per_log_type and custom_rules_per_log_type entirely.
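
The iteration drawback can be shown with a small client-side sketch in Python. The response values and function names below are made up for illustration; only the two response shapes come from the samples above.

```python
# Trimmed-down, hypothetical responses in the two proposed shapes.
response_1 = {
    "detectors": {"total": 3, "enabled": 2, "error": 1},
    "detectors_per_log_type": {
        "windows": {"total": 2, "enabled": 2, "error": 0},
        "linux": {"total": 1, "enabled": 0, "error": 1},
    },
}

response_2 = {
    "detectors_per_log_type": {
        "all": {"total": 3, "enabled": 2, "error": 1},
        "windows": {"total": 2, "enabled": 2, "error": 0},
        "linux": {"total": 1, "enabled": 0, "error": 1},
    },
}


def log_types_with_errors_v1(resp):
    """Shape 1: every key is a real log type, so no special-casing is needed."""
    per_type = resp["detectors_per_log_type"]
    return sorted(name for name, c in per_type.items() if c["error"] > 0)


def log_types_with_errors_v2(resp):
    """Shape 2: the aggregate "all" entry must be skipped explicitly."""
    per_type = resp["detectors_per_log_type"]
    return sorted(
        name for name, c in per_type.items()
        if name != "all" and c["error"] > 0
    )


print(log_types_with_errors_v1(response_1))
# Forgetting the special case on shape 2 reports "all" as if it were a log type.
print(log_types_with_errors_v1(response_2))
print(log_types_with_errors_v2(response_2))
```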

Future Improvements

If we require node-level metrics, they can be added in the future through an additional query parameter on the request:

 GET _plugins/_security_analytics/stats?include_advanced_metrics  

The scope of these advanced metrics can be decided after the API proposed above is implemented. The idea is to keep the default API behavior lightweight: collecting information at node-level granularity is expensive, the cost scales linearly with the node count of the cluster, and most users may not need those metrics.

riysaxen-amzn pushed a commit to riysaxen-amzn/security-analytics that referenced this issue Mar 25, 2024