[Synthetics] Document changing retention on a per-data-stream basis #1944

Merged
merged 4 commits into from
Aug 22, 2022
Changes from 2 commits
2 changes: 2 additions & 0 deletions docs/en/observability/index.asciidoc
@@ -124,6 +124,8 @@ include::inspect-uptime-duration-anomalies.asciidoc[leveloffset=+3]

include::configure-uptime-settings.asciidoc[leveloffset=+2]

include::manage-synthetics-retention.asciidoc[leveloffset=+2]

include::troubleshoot-uptime-mapping-issues.asciidoc[leveloffset=+2]
colleenmcginnis marked this conversation as resolved.

// User experience
41 changes: 41 additions & 0 deletions docs/en/observability/manage-synthetics-retention.asciidoc
@@ -0,0 +1,41 @@
[[manage-synthetics-retention]]

= Managing data retention

[discrete]
== Overview

Synthetics browser monitors can require large amounts of storage. This document explains how Synthetics stores data and how to tune retention
to control storage use. This usually means altering the retention of browser `network` and `screenshot` documents.

All types of checks record 'core' metadata, such as which URL was checked, what the status of the check was, and which errors occurred. This document
focuses on browser checks, which tend to use much more storage than lightweight checks.
Browser-based checks store two additional types of data beyond the core metadata: `network` documents and `screenshot` documents.
The indices holding these documents are usually many times larger than those holding the core metadata. The relative sizes vary depending on the sites being
checked, with network data usually being the larger of the two by a significant factor.

`network` document data consists of detailed metadata about requests for resources required by the pages being checked, such as an image,
a JavaScript file, or a font referenced and loaded by the page under test.
For each of these requests, the URL, timing, headers, and other metadata are stored. While the metadata for a single request is small,
modern websites often request hundreds of additional resources per page load, which adds up to a significant amount of storage.
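
To get a feel for how quickly this adds up, the following is a rough sketch of the arithmetic. It assumes one `network` document per resource request and a single page load per run. The request count and schedule are hypothetical figures, not measurements, so substitute values observed for your own monitors.

```python
def network_docs_per_day(requests_per_page_load: int, runs_per_day: int) -> int:
    """Estimate the `network` documents one browser monitor writes per day.

    Assumes one document per resource request and one page load per run.
    """
    return requests_per_page_load * runs_per_day


# Hypothetical monitor: runs every 10 minutes (144 runs per day) against a
# page that loads 300 resources.
print(network_docs_per_day(requests_per_page_load=300, runs_per_day=144))  # 43200
```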

Screenshot data consists of binary image data used to construct a screenshot, plus metadata used to de-duplicate that data. De-duplication
makes screenshot data less burdensome than network data, especially for sites that are mostly visually unchanged across test runs.
De-duplication works by splitting each captured screenshot into an 8x8 grid of chunks, each stored as a separate document keyed by a hash of its pixels.
This means that, across checks of a given site, only changes to the visual representation of the site require significant additional storage. If a site has not changed
visually across runs, the only additional storage required is the small `screenshot_ref` document for that image, which points to the relevant image blocks.
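
The de-duplication idea can be illustrated with a small, self-contained sketch. This is not the Synthetics implementation: the in-memory `block_store`, the SHA-256 hash, and the field names in the reference document are all assumptions made for illustration. Only the 8x8 grid and the hash-keyed blocks come from the description above.

```python
import hashlib

GRID = 8  # the 8x8 grid described above


def split_into_blocks(pixels, grid=GRID):
    """Split a 2D list of pixel values into grid x grid rectangular blocks.

    Assumes the image height and width are both divisible by `grid`.
    """
    height, width = len(pixels), len(pixels[0])
    bh, bw = height // grid, width // grid
    blocks = []
    for by in range(grid):
        for bx in range(grid):
            block = tuple(
                tuple(pixels[y][bx * bw:(bx + 1) * bw])
                for y in range(by * bh, (by + 1) * bh)
            )
            blocks.append(block)
    return blocks


def block_hash(block):
    """Key a block by a hash of its pixels (SHA-256 is an assumption)."""
    return hashlib.sha256(repr(block).encode("utf-8")).hexdigest()


def store_screenshot(pixels, block_store):
    """Store unseen blocks and return (reference document, new block count)."""
    hashes = []
    new_blocks = 0
    for block in split_into_blocks(pixels):
        key = block_hash(block)
        if key not in block_store:
            block_store[key] = block
            new_blocks += 1
        hashes.append(key)
    # Hypothetical shape for the small reference document.
    return {"type": "screenshot_ref", "blocks": hashes}, new_blocks


store = {}
# A 16x16 "screenshot" in which every pixel value is unique.
page = [[y * 16 + x for x in range(16)] for y in range(16)]
_, first = store_screenshot(page, store)    # every block is new
_, second = store_screenshot(page, store)   # identical run: nothing new
changed = [row[:] for row in page]
changed[0][0] = 999                         # one pixel changes
_, third = store_screenshot(changed, store)
print(first, second, third)  # 64 0 1
```

An unchanged run adds only the reference document, and a small visual change adds only the blocks it touches.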

[discrete]
=== Managing the lifecycles of synthetics data streams

Synthetics data is recorded in Elasticsearch https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html[data streams], an append-only
structure in Elasticsearch. Synthetics data streams can be managed via the Elasticsearch API or via https://www.elastic.co/guide/en/elasticsearch/reference/current/index-mgmt.html[Kibana index management].

If Synthetics browser data streams are storing data longer than necessary, you can retain the `screenshot` and `network_info` datasets for a shorter period than the core metadata.
To do so, first navigate to https://www.elastic.co/guide/en/elasticsearch/reference/current/index-mgmt.html[Kibana index management], then filter the list of data streams for
those containing the term 'synthetics'. The UI lists three kinds of data stream: `synthetics-browser-*`, `synthetics-browser.network-*`, and `synthetics-browser.screenshot-*`. From this page you can see the size of each data stream on disk, as well as which https://www.elastic.co/guide/en/elasticsearch/reference/current/set-up-lifecycle-policy.html[index lifecycle policy] is associated with it.
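
The same check can be made against the Elasticsearch API. The sketch below assumes the response of `GET _data_stream/synthetics-*` has already been fetched and parsed into a dictionary, and that each entry carries an `ilm_policy` field. The sample response is trimmed, and its data stream and policy names are invented for illustration. Grouping data streams by policy shows at a glance whether a policy is shared.

```python
from collections import defaultdict

# Hypothetical, trimmed response from `GET _data_stream/synthetics-*`.
# The namespace ("default") and policy name ("synthetics") are assumptions.
sample_response = {
    "data_streams": [
        {"name": "synthetics-browser-default", "ilm_policy": "synthetics"},
        {"name": "synthetics-browser.network-default", "ilm_policy": "synthetics"},
        {"name": "synthetics-browser.screenshot-default", "ilm_policy": "synthetics"},
    ]
}


def streams_by_policy(response):
    """Group data stream names by the ILM policy attached to them."""
    grouped = defaultdict(list)
    for stream in response.get("data_streams", []):
        grouped[stream.get("ilm_policy")].append(stream["name"])
    return dict(grouped)


def shared_policies(response):
    """Return only the policies used by more than one data stream."""
    return {
        policy: names
        for policy, names in streams_by_policy(response).items()
        if policy is not None and len(names) > 1
    }


for policy, names in shared_policies(sample_response).items():
    print(f"{policy} is shared by: {', '.join(names)}")
```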

To change the retention period of a given data stream, simply edit the life cycle policy associated with it, after ensuring that the policy does not apply to additional data streams whose
Contributor

@dominiqueclarke dominiqueclarke Jun 24, 2022

> simply edit the life cycle policy associated with it

Since our ILM policy defaults will be based on the integration package, we have to be careful with this. If the user edits an existing ILM policy managed by the policy, their changes will be overwritten on the next update to the integration policy.

retention you do not want to change. If the data stream you wish to change shares a life cycle policy with another, https://www.elastic.co/guide/en/elasticsearch/reference/current/set-up-lifecycle-policy.html[create a new ILM policy], then edit the relevant component template's
index settings. When editing the component template, change the `index.lifecycle.name` value to point to your new ILM policy. For `screenshot` documents, edit the `synthetics-browser.screenshot@package` component template; for `network_info` documents, edit the `synthetics-browser.network@package` template. Check that the settings have taken effect by visiting the data stream management
Contributor


Some other documentation includes more specifics about how to edit the component template, showing exactly what to change with an example configuration. I think we should include that here.

Contributor


I am concerned about instructing customers to update the @package templates rather than the @custom templates for setting retention across package updates.

page and confirming the attached life cycle policy is correct.
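
As a rough illustration of the two steps described above, the sketch below builds the JSON bodies for creating a new ILM policy (`PUT _ilm/policy/<name>`) and for re-saving a component template (`PUT _component_template/<name>`) with `index.lifecycle.name` pointing at it. The policy name, retention values, and the sample template content are hypothetical. Because saving a component template replaces it entirely, the helper modifies a copy of the existing template body rather than building one from scratch. The review comments above also raise a concern that changes to `@package` templates may not survive package updates, so treat this as a sketch rather than a recommended procedure.

```python
import copy
import json


def retention_policy_body(delete_after: str, rollover_max_age: str = "1d") -> dict:
    """Body for `PUT _ilm/policy/<name>`: roll over, then delete old indices.

    Both durations are illustrative assumptions, not recommended values.
    """
    return {
        "policy": {
            "phases": {
                "hot": {"actions": {"rollover": {"max_age": rollover_max_age}}},
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


def with_lifecycle_policy(template_body: dict, policy_name: str) -> dict:
    """Return a copy of a component template body using `policy_name`.

    Everything else in the template (mappings, other settings) is preserved.
    """
    updated = copy.deepcopy(template_body)
    settings = updated.setdefault("template", {}).setdefault("settings", {})
    # Drop the flat form of the key, if present, to avoid a duplicate setting.
    settings.pop("index.lifecycle.name", None)
    settings.setdefault("index", {}).setdefault("lifecycle", {})["name"] = policy_name
    return updated


# Hypothetical existing content of `synthetics-browser.screenshot@package`.
existing = {
    "template": {
        "settings": {"index": {"lifecycle": {"name": "synthetics"}}},
        "mappings": {"properties": {}},
    }
}
policy_name = "synthetics-screenshots-7d"  # hypothetical policy name

print(json.dumps(retention_policy_body("7d"), indent=2))
print(json.dumps(with_lifecycle_policy(existing, policy_name), indent=2))
```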