[Uptime] Redirect to error page when Heartbeat mappings are missing #110857

Merged
merged 19 commits into from
Sep 23, 2021

Contributor

@justinkambic justinkambic commented Sep 1, 2021

Summary

Resolves elastic/uptime#359.

This is WIP code resulting from a spike. It requires design input and further refinement from a code standpoint, as well as some copy updates.

This change would add a new mapping error page to the Uptime app's routes. The page should tell the user that Heartbeat's mappings were not installed correctly and that they need to fix their indices.

At the moment, I've only modified one component on the Overview page to perform the redirect. Essentially, we catch this specific error from Elasticsearch and return it to the client; all other errors are still thrown.
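A minimal sketch of the client-side half of this flow, assuming (as described above) that the server responds with HTTP 400 only for the caught Elasticsearch error and that any rethrown error reaches the client as a 5xx. The `/mapping-error` path, `FailedResponse`, `redirectTargetFor`, and `handleOverviewError` are hypothetical names for illustration, not code from this PR.

```typescript
// Hypothetical path for the mapping error page; the real route name may differ.
const MAPPING_ERROR_ROUTE = '/mapping-error';

// Minimal view of a failed API response on the client (assumed shape).
interface FailedResponse {
  status: number;
}

/**
 * Assumption: the server answers 400 only for the Elasticsearch error it catches;
 * every other error is rethrown and reaches the client as a 5xx.
 * So a 400 is treated as "mappings are missing", anything else is not.
 */
function redirectTargetFor(error: FailedResponse): string | null {
  return error.status === 400 ? MAPPING_ERROR_ROUTE : null;
}

/**
 * Hypothetical error handler for the Overview component: redirects on a mapping
 * error and reports whether it did, so other errors can be displayed normally.
 */
function handleOverviewError(error: FailedResponse, navigate: (path: string) => void): boolean {
  const target = redirectTargetFor(error);
  if (target === null) {
    return false;
  }
  navigate(target);
  return true;
}

// Example usage with a stand-in navigate function.
handleOverviewError({ status: 400 }, (path) => console.log(`redirecting to ${path}`)); // prints "redirecting to /mapping-error"
handleOverviewError({ status: 500 }, (path) => console.log(`redirecting to ${path}`)); // prints nothing
```

Keeping the decision in a pure function makes it easy to unit test separately from routing.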


Testing

  1. Start Elasticsearch and Heartbeat.
  2. While Heartbeat is still running, delete its index: curl -u elastic:{PASSWORD} -X DELETE "http://localhost:9200/{HEARTBEAT_INDEX}"
  3. Navigate to the Uptime UI. You should be redirected to the mapping error page.
  4. Kill Heartbeat.
  5. With Heartbeat stopped, delete the index again: curl -u elastic:{PASSWORD} -X DELETE "http://localhost:9200/{HEARTBEAT_INDEX}"
  6. Start Heartbeat.
  7. Navigate to the Uptime UI. You should see the Overview page as usual.
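To confirm that step 2 actually left the index without Heartbeat's mappings, one option is to request `GET /{HEARTBEAT_INDEX}/_mapping` with the same credentials and inspect the result. The sketch below is illustrative and assumes that Heartbeat's template maps `monitor.id` as `keyword`, while an index that Elasticsearch creates through dynamic mapping maps that string field as `text` (with a `keyword` sub-field), which would break aggregations on it. `findField`, `indicesWithBadMonitorId`, and the sample index names are hypothetical.

```typescript
// Minimal structural types for the JSON returned by GET /<index>/_mapping.
interface FieldMapping {
  type?: string;
  properties?: Record<string, FieldMapping>;
  fields?: Record<string, FieldMapping>;
}
type MappingResponse = Record<string, { mappings: { properties?: Record<string, FieldMapping> } }>;

/** Follows a dotted path such as "monitor.id" through nested `properties` objects. */
function findField(
  properties: Record<string, FieldMapping> | undefined,
  path: string
): FieldMapping | undefined {
  let current: FieldMapping | undefined = { properties };
  for (const segment of path.split('.')) {
    current = current?.properties?.[segment];
    if (current === undefined) {
      return undefined;
    }
  }
  return current;
}

/** Hypothetical helper: names of indices where monitor.id is missing or not a keyword field. */
function indicesWithBadMonitorId(response: MappingResponse): string[] {
  return Object.keys(response).filter(
    (name) => findField(response[name].mappings.properties, 'monitor.id')?.type !== 'keyword'
  );
}

// Sample input (assumed): one index mapped by the template, one created by dynamic mapping.
const sampleMappings: MappingResponse = {
  'heartbeat-from-template': {
    mappings: { properties: { monitor: { properties: { id: { type: 'keyword' } } } } },
  },
  'heartbeat-dynamic': {
    mappings: {
      properties: {
        monitor: { properties: { id: { type: 'text', fields: { keyword: { type: 'keyword' } } } } },
      },
    },
  },
};

console.log(indicesWithBadMonitorId(sampleMappings));
```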


@justinkambic justinkambic added enhancement, v8.0.0, Team:Uptime - DEPRECATED, and v7.16.0 labels Sep 1, 2021
@justinkambic justinkambic self-assigned this Sep 1, 2021
iconType="cross"
// TODO: placeholder copy
title={<div>Heartbeat mappings are not installed</div>}
body={<div>You need to stop Heartbeat, delete your indices, and restart Heartbeat.</div>}
Contributor

For copy WDYT of:

Incorrect mappings detected! Perhaps you forgot to run the heartbeat setup command? See the heartbeat quickstart for more information.

I'm also thinking maybe we should add a dedicated docs page for this and link to that. Thoughts?

Contributor Author

Dedicated docs might be nice, since it's such a common issue. We could include explicit remediation instructions there, since the quick-start guide doesn't address this problem directly.

Contributor

I've opened elastic/observability-docs#1018 to track the docs

Contributor

I think having inline docs is always amazing. Can we also keep the instructions inline, preferably with the commands to run, and link to a detailed step-by-step docs page?
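The request for inline instructions could be met by driving the page from a small list of steps, each with an optional command, plus a link to the detailed docs. A hedged sketch: the step wording paraphrases this thread (stop Heartbeat, delete the indices, run the `heartbeat setup` command, restart), the delete command reuses the curl call from the testing steps, and `DOCS_URL`, `RemediationStep`, `remediationSteps`, and `renderAsText` are hypothetical placeholders rather than anything from the PR or elastic/observability-docs#1018.

```typescript
// One step of the inline remediation instructions (hypothetical shape).
interface RemediationStep {
  text: string;
  command?: string; // rendered as a copyable command when present
}

// Placeholder URL: the dedicated docs page is still being tracked separately.
const DOCS_URL = 'https://example.com/heartbeat-mapping-docs';

/** Steps paraphrased from this thread; `heartbeatIndex` is the index to delete. */
function remediationSteps(heartbeatIndex: string): RemediationStep[] {
  return [
    { text: 'Stop Heartbeat.' },
    {
      text: 'Delete the Heartbeat indices that have the wrong mappings.',
      command: `curl -u elastic:{PASSWORD} -X DELETE "http://localhost:9200/${heartbeatIndex}"`,
    },
    { text: 'Install the Heartbeat index template and mappings.', command: 'heartbeat setup' },
    { text: 'Start Heartbeat again.' },
  ];
}

/** Plain-text rendering, e.g. for a copy-to-clipboard button next to the prompt. */
function renderAsText(steps: RemediationStep[], docsUrl: string = DOCS_URL): string {
  const lines = steps.map((step, i) =>
    step.command === undefined
      ? `${i + 1}. ${step.text}`
      : `${i + 1}. ${step.text}\n   $ ${step.command}`
  );
  return [...lines, `See ${docsUrl} for step-by-step details.`].join('\n');
}

console.log(renderAsText(remediationSteps('{HEARTBEAT_INDEX}')));
```

Keeping the steps as data lets the same list feed both the inline prompt and a future docs link without duplicating copy.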

@justinkambic justinkambic changed the title Initial PoC of redirect on mapping error is working. [Uptime] Redirect to error page when mappings are wrong Sep 1, 2021
@justinkambic justinkambic changed the title [Uptime] Redirect to error page when mappings are wrong [Uptime] Redirect to error page when Heartbeat mappings are missing Sep 1, 2021
@justinkambic
Contributor Author

@elasticmachine merge upstream

@justinkambic justinkambic force-pushed the 359/detect-mapping-failure branch 2 times, most recently from 8ae8789 to 8c58cc5 September 8, 2021 17:03
@justinkambic justinkambic marked this pull request as ready for review September 9, 2021 18:47
@justinkambic justinkambic requested a review from a team as a code owner September 9, 2021 18:47
@elasticmachine
Contributor

Pinging @elastic/uptime (Team:uptime)

@justinkambic justinkambic force-pushed the 359/detect-mapping-failure branch from d69da0b to 1cb42f8 September 9, 2021 19:12
@justinkambic justinkambic enabled auto-merge (squash) September 10, 2021 19:30
@justinkambic
Contributor Author

@elasticmachine merge upstream

@justinkambic
Contributor Author

@elasticmachine merge upstream

Comment on lines 45 to 70
try {
return await libs.requests.getFilterBar({
uptimeEsClient,
dateRangeStart,
dateRangeEnd,
search: parsedSearch,
filterOptions: objectValuesToArrays<string>({
locations,
ports,
schemes,
tags,
}),
});
} catch (e) {
/**
* This particular error is usually indicative of a mapping problem within the user's
* indices. It's relevant for the UI because we will be able to provide the user with a
* tailored message to help them remediate this problem on their own with minimal effort.
*/
if (e.name === 'ResponseError') {
return response.badRequest({ body: e });
}
throw e;
}
},
});
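In the Elasticsearch JavaScript client, `ResponseError` is raised for any error response, so a check on `e.name` alone would also send unrelated failures (for example, a malformed query) to the mapping error page. A narrower check could inspect the error body first. This is a sketch under stated assumptions, not the PR's logic: it assumes the failure is Elasticsearch refusing to aggregate or sort on a field mapped as `text`, reported as an `illegal_argument_exception` whose reason mentions `fielddata` (the exact wording varies between versions), and that the thrown error exposes the response JSON as `body`. `ResponseErrorLike` and `isLikelyMappingError` are hypothetical.

```typescript
// Assumed subset of an Elasticsearch error response body.
interface EsErrorCause {
  type?: string;
  reason?: string;
  caused_by?: EsErrorCause;
}
type EsErrorBody = EsErrorCause & { root_cause?: EsErrorCause[] };

// Structural stand-in for the client's ResponseError (assumption: it exposes `name` and `body`).
interface ResponseErrorLike {
  name: string;
  body?: { error?: EsErrorBody };
}

/** Flattens the root causes plus the caused_by chain into one list. */
function causesOf(error: EsErrorBody): EsErrorCause[] {
  const causes: EsErrorCause[] = [...(error.root_cause ?? [])];
  for (let cause: EsErrorCause | undefined = error; cause !== undefined; cause = cause.caused_by) {
    causes.push(cause);
  }
  return causes;
}

/**
 * Hypothetical narrower check: true only when Elasticsearch refused an
 * aggregation or sort because a field (such as monitor.id) is mapped as text.
 */
function isLikelyMappingError(e: unknown): boolean {
  if (typeof e !== 'object' || e === null) {
    return false;
  }
  const candidate = e as ResponseErrorLike;
  const esError = candidate.body?.error;
  if (candidate.name !== 'ResponseError' || esError === undefined) {
    return false;
  }
  return causesOf(esError).some(
    (cause) =>
      cause.type === 'illegal_argument_exception' &&
      (cause.reason ?? '').toLowerCase().includes('fielddata')
  );
}

// In a route's catch block this could replace the broader name check:
//   if (isLikelyMappingError(e)) return response.badRequest({ body: e });
//   throw e;
```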
Contributor

This route has been removed; in fact, this whole API has been removed as part of our filters API changes, so I wonder whether this is the correct approach for handling this kind of error.

Contributor Author

Yes, I overlooked that - a dedicated route may be the way to go. The reason I leaned toward this approach is that we don't want to slow down the happy path, which 99% of users will experience, for the sake of this one other case. It's a commonly reported error, but only a small share of users ever run into it.

Perhaps we should depend on the filters PR and adjust the logic in this PR to fit the error in that one, since the error should still occur with the new fetch procedure.
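The trade-off described in this thread, keeping the happy path fast while still detecting the rarer mapping case, can be sketched as a wrapper that runs the normal query first and does extra work only after it fails. Both callbacks are hypothetical stand-ins: `runQuery` for the existing fetch (for example through the new filters API) and `mappingsLookValid` for a check such as a call to a dedicated route. `QueryOutcome` and `queryWithLazyMappingCheck` are likewise illustrative names.

```typescript
// Result of a query whose failure may be explained by missing mappings.
type QueryOutcome<T> = { kind: 'ok'; data: T } | { kind: 'mappingError'; error: unknown };

/**
 * Runs the normal query first. Only if it fails is `mappingsLookValid` called
 * (for example, a request to a hypothetical dedicated route), so successful
 * requests pay nothing extra. Unrelated failures are rethrown unchanged.
 */
async function queryWithLazyMappingCheck<T>(
  runQuery: () => Promise<T>,
  mappingsLookValid: () => Promise<boolean>
): Promise<QueryOutcome<T>> {
  try {
    return { kind: 'ok', data: await runQuery() };
  } catch (e) {
    // If the check itself fails we can't tell, so fall back to the original error.
    const valid = await mappingsLookValid().catch(() => true);
    if (!valid) {
      return { kind: 'mappingError', error: e };
    }
    throw e;
  }
}
```

If the filters PR surfaces the error differently, only the two callbacks would need to change; the wrapper does not depend on the error's shape.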

@justinkambic
Contributor Author

@elasticmachine merge upstream

@shahzad31
Contributor

@elasticmachine merge upstream

Contributor

@shahzad31 shahzad31 left a comment

LGTM. Great work, it's really going to be useful for our users!

@shahzad31 shahzad31 added the auto-backport label Sep 23, 2021
@kibanamachine
Contributor

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules lead to a faster build time

id before after diff
uptime 640 642 +2

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
uptime 558.7KB 560.8KB +2.1KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
uptime 24.2KB 24.2KB +53.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @justinkambic

@justinkambic justinkambic merged commit 26d19e7 into elastic:master Sep 23, 2021
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Sep 23, 2021
…lastic#110857)

* Initial PoC of redirect on mapping error is working.

* Update copy. Add comments.

* Include headline element for page title.

* Create mappings for failing functional tests.

* Add functional test for mappings error page.

* Add mapping for certs check.
@kibanamachine
Contributor

💚 Backport successful

Status Branch Result
7.x

This backport PR will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Sep 23, 2021
…110857) (#112951)

* Initial PoC of redirect on mapping error is working.

* Update copy. Add comments.

* Include headline element for page title.

* Create mappings for failing functional tests.

* Add functional test for mappings error page.

* Add mapping for certs check.

Co-authored-by: Justin Kambic <[email protected]>
@justinkambic justinkambic deleted the 359/detect-mapping-failure branch September 27, 2021 15:05
Labels
auto-backport, enhancement, release_note:enhancement, Team:Uptime - DEPRECATED, v7.16.0, v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Detect when mapping is missing in Heartbeat indices and report in Uptime app
5 participants