Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Uptime] TLS alerting #63913

Merged
merged 57 commits into from
May 5, 2020
Merged

Conversation

justinkambic
Copy link
Contributor

@justinkambic justinkambic commented Apr 17, 2020

Summary

The goal here is to create simple alerts that allow users to monitor their TLS certificate status, and alert them when any of the certs they maintain are either close to expiring or have lived beyond a certain age threshold.

Resolves elastic/uptime#162.

WIP, creating TLS alerts for Uptime

Testing this PR

  1. You'll need to configure your local Kibana/ES/Beats to use SSL connection, or else disable security.
  2. Define an alert.
  3. Adjust the app's dynamic settings cert thresholds to observe that the alert triggers appropriately.
  4. Test the various state fields to ensure that they correspond to criteria you define when creating your alert.

Checklist

Delete any items that are not applicable to this PR.

For maintainers

@justinkambic justinkambic added release_note:enhancement WIP Work in progress v8.0.0 Team:Uptime - DEPRECATED Synthetics & RUM sub-team of Application Observability v7.8.0 labels Apr 17, 2020
@justinkambic justinkambic self-assigned this Apr 17, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/uptime (Team:uptime)

@andrewvc
Copy link
Contributor

andrewvc commented May 4, 2020

One functional thing I immediately noticed. The design spec calls for a new alerts drop down menu on the certificates page. In this PR the alert creation action is added to the overview page.

@andrewvc
Copy link
Contributor

andrewvc commented May 4, 2020

Running with the demo monitor list and a server log action I get the following error:

server    log   [10:52:55.039] [error][alerting][alerting][plugins][plugins] Executing Alert "939e80fa-5a85-46f4-87c7-2cc56ad3c9d0" has resulted in Error: [query_shard_exception] No mapping found for [undefined] in order to sort on, with { index_uuid="22ddL3ohQ8id-tr7N1r_YA" & index="heartbeat-8.0.0-2020.05.04-000001" }

@justinkambic
Copy link
Contributor Author

Running with the demo monitor list and a server log action I get the following error:

This was a defect I injected while implementing other feedback, will notify when I push the fix.

`);
});

it('handles negative diff values appropriately for aging certs', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice test for completeness

Copy link
Contributor

@andrewvc andrewvc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@justinkambic
Copy link
Contributor Author

@andrewvc might want to have another look to verify you're alright with the changes in d1b414a and 86add5b.

Essentially these allow the alert flyout toggle dropdown to handle a list of alert types. If there is > 1 item, it will nest them like on the Overview page, otherwise it'll display a single context panel. And I added the toggle to the certs page.

>
<FormattedMessage
id="xpack.uptime.alerts.tls.settingsPageNav.text"
defaultMessage="You can edit these thresholds on the {settingsPageLink}."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be less verbose?
instead of saying 'You can edit these thresholds on the settings page.'
You can edit this threshold in Settings. though i am not a native english speaker
so you probably know better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The copy comes from the spec, so I think we can just leave it for now.

Copy link
Contributor

@shahzad31 shahzad31 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left few suggestions.
As far user experience goes, i know there is some technical limitation as to check if user is creating duplicate alerts.
Even if we can't enforce it, we should somehow let user know to make sure, to check if they have already created an alert or something.
It probably requires design. Otherwise Looks good.

@andrewvc
Copy link
Contributor

andrewvc commented May 4, 2020

@shahzad31 yes, that's probably in the second iteration WRT ensuring there aren't dupe alerts. Currently this is a limitation of alerting, so it's better to give people the option.

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@justinkambic justinkambic merged commit 6d78489 into elastic:master May 5, 2020
justinkambic added a commit to justinkambic/kibana that referenced this pull request May 5, 2020
* Refactor settings form event handling and modify certs fields.

* Fix/improve broken types/unit/integration/api tests.

* Modify default expiration threshold.

* Rename test vars.

* Implement PR feedback.

* Refresh snapshots, fix broken tests/types.

* Remove unnecessary state spreading.

* Add type for settings field errors.

* Refresh test snapshots.

* Improve punctuation.

* Add TLS alert type.

* Add cert API request and runtime type checking.

* Add api test for cert api.

* Add unload command to certs test.

* Extract API params interface to io-ts type.

* Add TLS alert type on server.

* WIP - add state for changing selected alert type.

* Finish adding alert type for client, add server alert summary.

* Add some state variables.

* Update certs summary function to create required values.

* Refresh test snapshots.

* Clean up message generator function.

* Add a comment.

* Update formatting for alert messages, add flags denoting presence of age/expiration data.

* Add relative date information to tls alert messages.

* Clean up more logic in certs request function.

* Fix broken unit tests.

* Move tests for common function to new file.

* Fix logic error in test and add common state fields to tls alerts.

* Extract common state field translations from status check alert.

* Add a comment.

* Add nested context navigation for uptime alert selection.

* Clean up types.

* Fix translation key typo.

* Extract translations from tls alert factory.

* Extract summary messages to translation file.

* Change default tls alert time window from 1w to 1d.

* Remove unnecessary import.

* Simplify page linking.

* Extract a non-trivial component to a dedicated file.

* Simplify create alert copy.

* Fix broken functional test.

* Fix busted types.

* Fix tls query error.

* Allow for alerts toggle button to receive a set of types to display.

* Add alerts toggle button to certs page.

* Fix copy.

* Fixup punctuation in default message to avoid double-period symbols.

* Refresh snapshots.
justinkambic added a commit that referenced this pull request May 5, 2020
* Refactor settings form event handling and modify certs fields.

* Fix/improve broken types/unit/integration/api tests.

* Modify default expiration threshold.

* Rename test vars.

* Implement PR feedback.

* Refresh snapshots, fix broken tests/types.

* Remove unnecessary state spreading.

* Add type for settings field errors.

* Refresh test snapshots.

* Improve punctuation.

* Add TLS alert type.

* Add cert API request and runtime type checking.

* Add api test for cert api.

* Add unload command to certs test.

* Extract API params interface to io-ts type.

* Add TLS alert type on server.

* WIP - add state for changing selected alert type.

* Finish adding alert type for client, add server alert summary.

* Add some state variables.

* Update certs summary function to create required values.

* Refresh test snapshots.

* Clean up message generator function.

* Add a comment.

* Update formatting for alert messages, add flags denoting presence of age/expiration data.

* Add relative date information to tls alert messages.

* Clean up more logic in certs request function.

* Fix broken unit tests.

* Move tests for common function to new file.

* Fix logic error in test and add common state fields to tls alerts.

* Extract common state field translations from status check alert.

* Add a comment.

* Add nested context navigation for uptime alert selection.

* Clean up types.

* Fix translation key typo.

* Extract translations from tls alert factory.

* Extract summary messages to translation file.

* Change default tls alert time window from 1w to 1d.

* Remove unnecessary import.

* Simplify page linking.

* Extract a non-trivial component to a dedicated file.

* Simplify create alert copy.

* Fix broken functional test.

* Fix busted types.

* Fix tls query error.

* Allow for alerts toggle button to receive a set of types to display.

* Add alerts toggle button to certs page.

* Fix copy.

* Fixup punctuation in default message to avoid double-period symbols.

* Refresh snapshots.
@justinkambic
Copy link
Contributor Author

Backported to:
7.x/7.8.0 0bc8e3e
#65211

@justinkambic justinkambic deleted the uptime_tls-alert branch May 5, 2020 01:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_note:enhancement Team:Uptime - DEPRECATED Synthetics & RUM sub-team of Application Observability v7.8.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TLS Cert validity alerting
5 participants