Progress remaining O11y rule types to FAAD #169867

Closed
16 tasks done
mikecote opened this issue Oct 25, 2023 · 11 comments · Fixed by #191127
Assignees
Labels
Feature:Alerting/RuleTypes Issues related to specific Alerting Rules Types Meta Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@mikecote
Contributor

mikecote commented Oct 25, 2023

Now that we've successfully onboarded our first O11y rule type to use FAAD (#164220) we should start onboarding the remaining rule types as well.

The list of rule types includes:

  • Custom threshold (Ersin)

APM

  • Latency threshold (Alexi)
  • Anomaly (Alexi)
  • Error count threshold (Alexi)
  • Failed transaction rate threshold (Alexi)

Infra

  • Inventory (Ersin)

Logs

  • Log threshold (Alexi)

SLO

  • SLO burn rate (Ersin)

Uptime

  • Uptime monitor status (Ersin)
  • Uptime TLS (Ersin)
  • Uptime Duration Anomaly (Ersin)
  • Synthetics monitor status (Alexi)
  • Synthetics TLS certificate (Alexi)

Definition of Done

  • Rule types use the new alerting APIs to report alerts
  • Rule types no longer wrap their executor code with rule registry wrapper
  • Rule types no longer interact with the rule registry
@mikecote mikecote added Meta Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Feature:Alerting/RuleTypes Issues related to specific Alerting Rules Types labels Oct 25, 2023
@elasticmachine
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@mikecote
Contributor Author

This issue isn't prioritized for 8.12 so I added an 8.13 label to have it as a candidate. We can backlog the issue for now.

@ymao1 ymao1 moved this from Awaiting Triage to Todo in AppEx: ResponseOps - Execution & Connectors Oct 26, 2023
@mikecote
Contributor Author

mikecote commented Nov 16, 2023

cc @maryam-saeidi we're using this issue to track the remaining O11y rules that need to onboard to the framework alerts-as-data APIs. It's unlikely that we'll have this prioritized in 8.13, but we're more than happy to let someone else drive this (or part of it) with our help.

@shanisagiv1

cc: @vinaychandrasekhar , per our recent discussion about AAD

@maryam-saeidi
Member

@mikecote Thanks for pinging me here; my point in the meeting was a suggestion about the possible meaning of that item. I'll ping @paulb-elastic regarding prioritization.

@mikecote
Contributor Author

Sounds good, no specific prioritization ask from us at this time, so it's ok if you don't have capacity 👍 but if you want to pick up the issue, we're more than happy to help!

@heespi heespi changed the title from "Onboard remaining O11y rule types to FAAD" to "Progress remaining O11y rule types to FAAD" Feb 13, 2024
@mikecote
Contributor Author

@maryam-saeidi fyi, we plan to make some progress in this area in 8.14.

@ersin-erdal
Contributor

ersin-erdal commented Mar 18, 2024

Should we add Custom threshold to the list as well?
@mikecote

@mikecote
Contributor Author

@ersin-erdal yes good catch, please add it to the description 🙏

@jasonrhodes
Member

@mikecote can you point us to docs / etc where folks can read about what FAAD is? That stands for "Framework Alerts-as-Data", is that right? Thanks!

@mikecote
Contributor Author

> @mikecote can you point us to docs / etc where folks can read about what FAAD is? That stands for "Framework Alerts-as-Data", is that right? Thanks!

@jasonrhodes You can read more about Framework Alerts-as-Data here: https://github.com/elastic/response-ops-team/issues/95. It describes the various phases in which we are unifying the architecture so that the framework provides everything.

ersin-erdal added a commit that referenced this issue Mar 22, 2024
Towards: #169867

This PR onboards Inventory Metric Threshold rule type with FAAD.

## To verify.

I used [data-generator](https://github.com/ersin-erdal/data-generator)
to generate metric data.

Then I created an Inventory Threshold rule with actions (alert and
recovered) and the
condition: `For Hosts, When CPU usage is above 10`.
Inventory Threshold uses the following formula to calculate the result:

(`system.cpu.user.pct` + `system.cpu.system.pct`) / `system.cpu.cores`

Set 
`system.cpu.user.pct` = 1
`system.cpu.system.pct` = 1
`system.cpu.cores` = 4
in the
[cpu-001](https://github.com/ersin-erdal/data-generator/blob/main/src/indexers/metrics/docs/cpu-001.json).
This makes the CPU usage 0.5 (50%) for `host-1`.

Then run the generator with `./generate metrics`.

Your rule should create an alert and save it in
`.internal.alerts-observability.metrics.alerts-default-000001`

Then set  `system.cpu.user.pct`=0 and `system.cpu.system.pct`=0.

The alert should recover, and the AAD in the above index should be
updated to `kibana.alert.status: recovered`.
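
The arithmetic behind these verification values can be sketched as follows. This is only an illustration of the formula quoted above, not code from the rule type: the helper names are hypothetical, and treating the threshold `10` as a percentage is an assumption.

```typescript
// Hypothetical helper mirroring the quoted formula; not taken from the Kibana source.
interface CpuSample {
  userPct: number; // system.cpu.user.pct
  systemPct: number; // system.cpu.system.pct
  cores: number; // system.cpu.cores
}

// (system.cpu.user.pct + system.cpu.system.pct) / system.cpu.cores
function cpuUsage(sample: CpuSample): number {
  return (sample.userPct + sample.systemPct) / sample.cores;
}

// Assumption: the rule compares the usage, expressed as a percentage, with the threshold.
function isAboveThreshold(sample: CpuSample, thresholdPercent: number): boolean {
  return cpuUsage(sample) * 100 > thresholdPercent;
}

const firing: CpuSample = { userPct: 1, systemPct: 1, cores: 4 };
const recovered: CpuSample = { userPct: 0, systemPct: 0, cores: 4 };

console.log(cpuUsage(firing)); // 0.5
console.log(isAboveThreshold(firing, 10)); // true
console.log(isAboveThreshold(recovered, 10)); // false
```

With the first set of values the usage is well above the condition, and with both percentages set to 0 it drops below it, which is what should drive the recovery.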
doakalexi added a commit that referenced this issue Mar 25, 2024
Towards: #169867

This PR onboards Log Threshold rule type with FAAD.

### To verify

Create a log threshold rule.
Example:
```
POST kbn:/api/alerting/rule
{
  "params": {
    "logView": {
      "logViewId": "Default",
      "type": "log-view-reference"
    },
    "timeSize": 5,
    "timeUnit": "m",
    "count": {
      "value": -1,
      "comparator": "more than"
    },
    "criteria": [
      {
        "field": "log.level",
        "comparator": "equals",
        "value": "error"
      }
    ]
  },
  "consumer": "alerts",
  "schedule": {
    "interval": "1m"
  },
  "tags": [],
  "name": "test",
  "rule_type_id": "logs.alert.document.count",
  "notify_when": "onActionGroupChange",
  "actions": []
}
```
Your rule should create an alert and save it in
`.internal.alerts-observability.metrics.alerts-default-000001`
Example:
```
GET .internal.alerts-*/_search
```
Then set `count.value: 75`

The alert should recover, and the AAD in the above index should be
updated to `kibana.alert.status: recovered`.
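
Why `-1` fires and `75` recovers can be sketched like this. It is a simplified model, not the rule executor: the function name is hypothetical, only the `more than` comparator comes from the example above, and the sample document counts are made up.

```typescript
// Simplified model of the count condition; `less than` is included only for illustration.
type Comparator = 'more than' | 'less than';

function isCountBreached(count: number, comparator: Comparator, value: number): boolean {
  if (comparator === 'more than') {
    return count > value;
  }
  return count < value;
}

// Any document count, including zero, is more than -1, so the rule alerts on its first run.
console.log(isCountBreached(0, 'more than', -1)); // true

// With the threshold raised to 75, a hypothetical count of 10 matching documents no longer breaches.
console.log(isCountBreached(10, 'more than', 75)); // false
```

Note that the second step only recovers the alert if the log view holds 75 or fewer matching `log.level: error` documents in the 5 minute window.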
ersin-erdal added a commit that referenced this issue Mar 26, 2024
Towards: #169867

This PR onboards "SLO burn rate" rule type with FAAD.

## To verify

Create an SLO by using a test index (create a data view for it), and use a very
low `budget consumed %`.
The rule bound to the SLO should create an alert and save it under
`.internal.alerts-observability.slo.alerts-default-000001`
ersin-erdal added a commit that referenced this issue Mar 26, 2024
Towards: #169867

This PR onboards "Custom Threshold" rule type with FAAD.

## To verify
Create a Custom Threshold rule by using a test index and DW. Set
`Role visibility` to `metrics`.
When the rule runs, it generates an alert and saves it under
`.internal.alerts-observability.threshold.alerts-default`.
The alert should be visible on `Observability > alerts` page as well.

---------

Co-authored-by: kibanamachine <[email protected]>
doakalexi added a commit that referenced this issue Mar 26, 2024
Towards: #169867

This PR onboards Latency Threshold rule type with FAAD.

### To verify

1. Run the following script to generate APM data:
```
node scripts/synthtrace simple_trace.ts --local --live
```

2. Create a latency threshold rule.
Example:
```
POST kbn:/api/alerting/rule
{
  "params": {
    "aggregationType": "avg",
    "environment": "ENVIRONMENT_ALL",
    "threshold": 400,
    "windowSize": 5,
    "windowUnit": "m"
  },
  "consumer": "alerts",
  "schedule": {
    "interval": "1m"
  },
  "tags": [],
  "name": "testinggg",
  "rule_type_id": "apm.transaction_duration",
  "notify_when": "onActionGroupChange",
  "actions": []
}
```
3. Your rule should create an alert and save it in
`.internal.alerts-observability.apm.alerts-default-000001`
Example:
```
GET .internal.alerts-*/_search
```
4. Set `threshold: 10000`

5. The alert should recover, and the AAD in the above index should
be updated to `kibana.alert.status: recovered`.
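
Step 4 can also be done through the API instead of the UI. The sketch below only builds the request and does not send it. It assumes the update endpoint is `PUT /api/alerting/rule/{id}` and that it does not accept `rule_type_id` or `consumer` in the body; the helper name and the rule id are hypothetical.

```typescript
// Shape of the create body used in step 2.
interface CreateRuleBody {
  params: Record<string, unknown>;
  consumer: string;
  schedule: { interval: string };
  tags: string[];
  name: string;
  rule_type_id: string;
  notify_when: string;
  actions: unknown[];
}

interface UpdateRuleRequest {
  method: 'PUT';
  path: string;
  body: Omit<CreateRuleBody, 'consumer' | 'rule_type_id'>;
}

// Hypothetical helper: copies the mutable fields and overrides only `params.threshold`.
function buildThresholdUpdate(
  ruleId: string,
  created: CreateRuleBody,
  threshold: number
): UpdateRuleRequest {
  return {
    method: 'PUT',
    path: `/api/alerting/rule/${encodeURIComponent(ruleId)}`,
    body: {
      name: created.name,
      tags: created.tags,
      schedule: created.schedule,
      notify_when: created.notify_when,
      actions: created.actions,
      params: { ...created.params, threshold },
    },
  };
}

const created: CreateRuleBody = {
  params: {
    aggregationType: 'avg',
    environment: 'ENVIRONMENT_ALL',
    threshold: 400,
    windowSize: 5,
    windowUnit: 'm',
  },
  consumer: 'alerts',
  schedule: { interval: '1m' },
  tags: [],
  name: 'testinggg',
  rule_type_id: 'apm.transaction_duration',
  notify_when: 'onActionGroupChange',
  actions: [],
};

// 'my-rule-id' stands in for the id returned when the rule was created.
const request = buildThresholdUpdate('my-rule-id', created, 10000);
console.log(request.path, JSON.stringify(request.body.params));
```

The resulting body can be pasted into Dev Tools as `PUT kbn:/api/alerting/rule/<id>`, in the same way as the `POST` in step 2.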
doakalexi added a commit that referenced this issue Mar 27, 2024
Towards: #169867

This PR onboards the Error Count Threshold rule type with FAAD.

### To verify

1. Run the following script to generate APM data:
```
node scripts/synthtrace many_errors.ts --local --live
```

2. Create an error count threshold rule.
Example:
```
POST kbn:/api/alerting/rule
{
  "params": {
    "threshold": 25,
    "windowSize": 5,
    "windowUnit": "m",
    "environment": "ENVIRONMENT_ALL"
  },
  "consumer": "alerts",
  "schedule": {
    "interval": "1m"
  },
  "tags": [],
  "name": "testinggg",
  "rule_type_id": "apm.error_rate",
  "notify_when": "onActionGroupChange",
  "actions": []
}
```
3. Your rule should create an alert and save it in
`.internal.alerts-observability.apm.alerts-default-000001`
Example:
```
GET .internal.alerts-*/_search
```
4. Recover the alert by setting `threshold: 10000`

5. The alert should recover, and the AAD in the above index should
be updated to `kibana.alert.status: recovered`.
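
To check steps 3 and 5 without scanning every alert document, the search can be narrowed to one rule and one status. This is a sketch of a request body only. It assumes the alert documents carry `kibana.alert.rule.uuid` and `kibana.alert.status` as exact-match fields; the helper name and the rule id are hypothetical.

```typescript
type AlertStatus = 'active' | 'recovered';

// Hypothetical helper that builds an Elasticsearch search body for one rule's alerts.
function buildAlertStatusQuery(ruleId: string, status: AlertStatus) {
  return {
    query: {
      bool: {
        filter: [
          { term: { 'kibana.alert.rule.uuid': ruleId } },
          { term: { 'kibana.alert.status': status } },
        ],
      },
    },
  };
}

// 'my-rule-id' stands in for the id returned when the rule was created.
const activeQuery = buildAlertStatusQuery('my-rule-id', 'active');
const recoveredQuery = buildAlertStatusQuery('my-rule-id', 'recovered');

console.log(JSON.stringify(recoveredQuery, null, 2));
```

The printed JSON can be used as the body of `GET .internal.alerts-*/_search`. A hit for the `active` query after step 3 and a hit for the `recovered` query after step 5 would confirm the expected transitions.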
doakalexi added a commit that referenced this issue Apr 2, 2024
towards: #169867

This PR onboards APM Anomaly rule type with FAAD.

I am having trouble getting this rule to create an alert. If there is
any easy way to verify pls let me know!
doakalexi added a commit that referenced this issue Apr 18, 2024
Towards: #169867

This PR onboards the Transaction Error Rate rule type with FAAD.

### To verify

1. Run the following script to generate APM data:
```
node scripts/synthtrace many_errors.ts --local --live
```

2. Create a transaction error rate rule.
Example:
```
POST kbn:/api/alerting/rule
{
  "params": {
    "threshold": 0,
    "windowSize": 5,
    "windowUnit": "m",
    "environment": "ENVIRONMENT_ALL"
  },
  "consumer": "alerts",
  "schedule": {
    "interval": "1m"
  },
  "tags": [],
  "name": "test",
  "rule_type_id": "apm.transaction_error_rate",
  "notify_when": "onActionGroupChange",
  "actions": []
}
```
3. Your rule should create an alert and save it in
`.internal.alerts-observability.apm.alerts-default-000001`
Example:
```
GET .internal.alerts-*/_search
```
4. Recover the alert by setting `threshold: 200`

5. The alert should recover, and the AAD in the above index should
be updated to `kibana.alert.status: recovered`.
ersin-erdal added a commit that referenced this issue May 2, 2024
Towards: #169867

This PR onboards Uptime rule types (TLS, Duration Anomaly and Monitor
status) with FAAD.

We are deprecating the rule-registry plugin and onboarding the rule types
with the new alertsClient to manage alerts-as-data.
There is no new feature; all the rule types should work as they did before and
save alerts with all the existing fields.

## To verify:

- Switch to Kibana 8.9.0 in your local repo. (In this version Uptime
rules are not deprecated)
- Run your ES with: `yarn es snapshot -E path.data=../local-es-data`
- Run your Kibana
- Create Uptime rules with an active and a recovered action (You can run
Heartbeat locally if needed, [follow the
instructions](https://www.elastic.co/guide/en/beats/heartbeat/current/heartbeat-installation-configuration.html))
- Stop your ES and Kibana
- Switch to this branch and run your ES with `yarn es snapshot -E
path.data=../local-es-data` again.
- Run your Kibana
- Modify the Uptime rule type code to force it to create an alert.
Example:
Mock [availabilityResults in
status_check](https://github.com/elastic/kibana/blob/main/x-pack/plugins/observability_solution/uptime/server/legacy_uptime/lib/alerts/status_check.ts#L491)
with below data
```
availabilityResults = [
  {
    monitorId: '1',
    up: 1,
    down: 0,
    location: 'location',
    availabilityRatio: 0.5,
    monitorInfo: {
      timestamp: '',
      monitor: {
        id: '1',
        status: 'down',
        type: 'type',
        check_group: 'default',
      },
      docId: 'docid',
    },
  },
];
```

It should create an alert. The alert should be saved in the
`.alerts-observability.uptime.alerts` index and be visible on the
Observability alerts page.

Then remove the mock; the alert should recover.
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue May 2, 2024
Towards: elastic#169867

This PR onboards Uptime rule types (TLS, Duration Anomaly and Monitor
status) with FAAD.

(cherry picked from commit d228f48)
@mikecote mikecote moved this from In Progress to Todo in AppEx: ResponseOps - Execution & Connectors May 2, 2024
kibanamachine referenced this issue May 2, 2024
# Backport

This will backport the following commits from `main` to `8.14`:
- [Onboard Uptime rule types with FAAD
(#179493)](#179493)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Ersin Erdal <[email protected]>
doakalexi added a commit that referenced this issue Jun 20, 2024
Towards: #169867

This PR onboards the Synthetics Monitor Status rule type with FAAD.

### To verify
I can't get the rule to alert, so I modified the status check to report
the monitor as down. If you know of an easier way pls let me know 🙂

1. Create a [monitor](http://localhost:5601/app/synthetics/monitors), by
default creating a monitor creates a rule.
2. Click on the monitor and grab the id and locationId from the url
3. Go to [the status check
code](https://github.com/elastic/kibana/blob/main/x-pack/plugins/observability_solution/synthetics/server/queries/query_monitor_status.ts#L208)
and replace the object that is returned with the following using the id
and locationId you got from the monitor.
```
{
    up: 0,
    down: 1,
    pending: 0,
    upConfigs: {},
    pendingConfigs: {},
    downConfigs: {
      '${id}-${locationId}': {
        configId: '${id}',
        monitorQueryId: '${id}',
        status: 'down',
        locationId: '${locationId}',
        ping: {
          '@timestamp': new Date().toISOString(),
          state: {
            id: 'test-state',
          },
          monitor: {
            name: 'test-monitor',
          },
          observer: {
            name: 'test-monitor',
          },
        } as any,
        timestamp: new Date().toISOString(),
      },
    },
    enabledMonitorQueryIds: ['${id}'],
  };
```
4. Your rule should create an alert and save it in
`.internal.alerts-observability.uptime.alerts-default-000001`
Example:
```
GET .internal.alerts-*/_search
```
5. Recover the alert by repeating step 3 with the following object:
```
{
    up: 1,
    down: 0,
    pending: 0,
    downConfigs: {},
    pendingConfigs: {},
    upConfigs: {
      '${id}-${locationId}': {
        configId: '${id}',
        monitorQueryId: '${id}',
        status: 'up',
        locationId: '${locationId}',
        ping: {
          '@timestamp': new Date().toISOString(),
          state: {
            id: 'test-state',
          },
          monitor: {
            name: 'test-monitor',
          },
          observer: {
            name: 'test-monitor',
          },
        } as any,
        timestamp: new Date().toISOString(),
      },
    },
    enabledMonitorQueryIds: ['${id}'],
  };
```
6. The alert should recover, and the AAD in the above index should
be updated to `kibana.alert.status: recovered`.
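
The two mock objects differ only in which bucket holds the monitor, so they can be generated from the id and locationId rather than edited by hand. The helper below is a hypothetical sketch for local testing, not part of the Synthetics code. It assumes each entry's `status` matches the bucket it is placed in, and the sample id and locationId are placeholders.

```typescript
type MonitorStatus = 'up' | 'down';

interface MockConfig {
  configId: string;
  monitorQueryId: string;
  status: MonitorStatus;
  locationId: string;
  ping: Record<string, unknown>;
  timestamp: string;
}

// Hypothetical helper: builds the object to return from the status check for one monitor.
function buildStatusMock(id: string, locationId: string, status: MonitorStatus) {
  const now = new Date().toISOString();
  const config: MockConfig = {
    configId: id,
    monitorQueryId: id,
    status,
    locationId,
    ping: {
      '@timestamp': now,
      state: { id: 'test-state' },
      monitor: { name: 'test-monitor' },
      observer: { name: 'test-monitor' },
    },
    timestamp: now,
  };
  const filled: Record<string, MockConfig> = { [`${id}-${locationId}`]: config };
  const empty: Record<string, MockConfig> = {};
  return {
    up: status === 'up' ? 1 : 0,
    down: status === 'down' ? 1 : 0,
    pending: 0,
    upConfigs: status === 'up' ? filled : empty,
    downConfigs: status === 'down' ? filled : empty,
    pendingConfigs: empty,
    enabledMonitorQueryIds: [id],
  };
}

// Placeholder values; use the id and locationId taken from the monitor URL.
const downMock = buildStatusMock('monitor-1', 'us_central', 'down');
const upMock = buildStatusMock('monitor-1', 'us_central', 'up');
console.log(Object.keys(downMock.downConfigs), Object.keys(upMock.upConfigs));
```

Returning `downMock` should trigger the alert, and switching to `upMock` should recover it.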
bhapas pushed a commit to bhapas/kibana that referenced this issue Jun 24, 2024
Towards: elastic#169867

This PR onboards the Synthetics Monitor Status rule type with FAAD.

7 participants