From 03c479c7d57324e31ca96653ce09af69130b47be Mon Sep 17 00:00:00 2001
From: Andrew Cholakian
Date: Wed, 22 Jun 2022 22:23:59 -0500
Subject: [PATCH 1/4] [Synthetics] Document changing retention on a per-data-stream basis

This patch adds a new docs page for the commonly requested task of
altering retention periods for the space-hungry
`synthetics-browser.network` and `synthetics-browser.screenshot`
indices.

While we will have better defaults in
https://github.com/elastic/uptime/issues/462 we need this doc before
that change makes it into 8.4. Additionally, we will always need some
level of documentation here, though we can simplify it a little after
that.

I've tried to keep the docs screenshot free for kibana index
management, since I don't think we'll be able to track visual changes
there well.

It _is_ a complex process, but I've tried my best to describe it. Once
the better defaults are merged it should be as simple as clicking on
the default lifecycle policy and editing it, a lot of the complexity
now is due to the fact that we share one policy across all dataset
types
---
 docs/en/observability/index.asciidoc               |  2 +
 .../manage-synthetics-retention.asciidoc           | 41 +++++++++++++++++++
 2 files changed, 43 insertions(+)
 create mode 100644 docs/en/observability/manage-synthetics-retention.asciidoc

diff --git a/docs/en/observability/index.asciidoc b/docs/en/observability/index.asciidoc
index 9a4f12094c..84397186a0 100644
--- a/docs/en/observability/index.asciidoc
+++ b/docs/en/observability/index.asciidoc
@@ -124,6 +124,8 @@ include::inspect-uptime-duration-anomalies.asciidoc[leveloffset=+3]
 
 include::configure-uptime-settings.asciidoc[leveloffset=+2]
 
+include::manage-synthetics-retention.asciidoc[leveloffset=+2]
+
 include::troubleshoot-uptime-mapping-issues.asciidoc[leveloffset=+2]
 
 // User experience
diff --git a/docs/en/observability/manage-synthetics-retention.asciidoc b/docs/en/observability/manage-synthetics-retention.asciidoc
new file mode 100644
index 0000000000..df98455473
--- /dev/null
+++ b/docs/en/observability/manage-synthetics-retention.asciidoc
@@ -0,0 +1,41 @@
+[[manage-synthetics-retention]]
+
+= Managing data retention
+
+[discrete]
+== Overview
+
+Synthetics browser monitors can require large amounts of storage, this document provides information on how synthetics stores data and how to optimize retention
+to control storage utilization. This usually means altering the retention of browser `network` and `screenshot` documents.
+
+All types of checks record 'core' metadata, such as which URL was checked, what the status of the check was, and errors that occurred. This document
+focuses on browser checks, which tend to use much more storage than lightweight checks.
+Browser based checks store two additional types of data beyond the core metadata; `network` documents and `screenshot` documents.
+These indices are usually many times larger than the core metadata. Nhe relative sizes of each vary depending on the sites being
+checked with network data usually being the larger of the two by a significant factor.
+
+`network` documents data consists of detailed metadata around requests for resources required by the pages being checked. An example would be an image,
+a javascript file, or a font referenced and loaded by the page under test.
+For each of these requests the URL, timing, headers, and other metadata are stored. While individually this metadata is small,
+modern websites often request hundreds of additional resources per page load, adding up to a significant amount of storage.
+
+Screenshot data consists of binary image data used to construct a screenshot and metadata with information related to de-duplicating this data. De-duplication
+efficiency makes screenshot data less burdensome than network data, especially for sites that are mostly visually unchanged across test runs.
+De-duplication works by splitting each captured screenshot into an 8x8 grid of chunks each stored as a separate document keyed by a hash of its pixels.
+This means that across checks to a given site only changes to the visual representation of the site require significant additional storage. If a site has not changed
+visually across runs the only additional storage required is the tiny `screenshot_ref` document for that image, which points to the relevant image blocks.
+
+[discrete]
+=== Managing the lifecycles of synthetics data streams
+
+Synthetics data is recorded in Elasticsearch [data streams](https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html), an append-only
+structure in Elasticsearch. Synthetics data streams can be managed via the Elasticsearch API or via [Kibana index management](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-mgmt.html).
+
+If Synthetics browser data streams are storing data longer than necessary, users can opt to retain `screenshot` and `network_info` datasets for a shorter period than the core metadata.
+To do so, first navigate to [Kibana index management](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-mgmt.html), then filter the list of data streams for
+those containing the term 'synthetics'. In the UI there will be three sorts of data stream present `synthetics-browser-*`, `synthetics-browser.network-*`, and `synthetics-browser.screenshot-*`. From this page you can retrieve the size of each data stream on disk, as well as which [index life cycle](https://www.elastic.co/guide/en/elasticsearch/reference/current/set-up-lifecycle-policy.html) is associated with it.
+
+To change the retention period of a given data stream simply edit the life cycle policy associated with it after ensuring that policy does not apply to addition data streams whose
+retention you do not want to change. If the data stream you wish to change shares a life cycle policy with another, [create a new ILM policy](https://www.elastic.co/guide/en/elasticsearch/reference/current/set-up-lifecycle-policy.html), then edit the relevant component template's
+index settings. When editing the component template change the `index.lifecycle.name` value to point toward your new ILM policy. For `screenshot` documents edit the `synthetics-browser.screenshot@package` component template, for `network_info` documents edit the `synthetics-browser.network@package` template. Check that the settings have taken effect by visiting the data stream management
+page and confirming the the attached life cycle policy is correct.
\ No newline at end of file

From 836652042968ff3621d0c9d0660371b1e97d2ac2 Mon Sep 17 00:00:00 2001
From: Andrew Cholakian
Date: Thu, 23 Jun 2022 16:19:58 -0500
Subject: [PATCH 2/4] PR feedback

---
 docs/en/observability/manage-synthetics-retention.asciidoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/en/observability/manage-synthetics-retention.asciidoc b/docs/en/observability/manage-synthetics-retention.asciidoc
index df98455473..b1d24b12ed 100644
--- a/docs/en/observability/manage-synthetics-retention.asciidoc
+++ b/docs/en/observability/manage-synthetics-retention.asciidoc
@@ -11,7 +11,7 @@ to control storage utilization. This usually means altering the retention of bro
 All types of checks record 'core' metadata, such as which URL was checked, what the status of the check was, and errors that occurred. This document
 focuses on browser checks, which tend to use much more storage than lightweight checks.
 Browser based checks store two additional types of data beyond the core metadata; `network` documents and `screenshot` documents.
-These indices are usually many times larger than the core metadata. Nhe relative sizes of each vary depending on the sites being
+These indices are usually many times larger than the core metadata. The relative sizes of each vary depending on the sites being
 checked with network data usually being the larger of the two by a significant factor.
 
 `network` documents data consists of detailed metadata around requests for resources required by the pages being checked. An example would be an image,
@@ -38,4 +38,4 @@ those containing the term 'synthetics'. In the UI there will be three sorts of d
 To change the retention period of a given data stream simply edit the life cycle policy associated with it after ensuring that policy does not apply to addition data streams whose
 retention you do not want to change. If the data stream you wish to change shares a life cycle policy with another, [create a new ILM policy](https://www.elastic.co/guide/en/elasticsearch/reference/current/set-up-lifecycle-policy.html), then edit the relevant component template's
 index settings. When editing the component template change the `index.lifecycle.name` value to point toward your new ILM policy. For `screenshot` documents edit the `synthetics-browser.screenshot@package` component template, for `network_info` documents edit the `synthetics-browser.network@package` template. Check that the settings have taken effect by visiting the data stream management
-page and confirming the the attached life cycle policy is correct.
\ No newline at end of file
+page and confirming the attached life cycle policy is correct.
\ No newline at end of file

From 0d925cc5d690c8ac5bc13cb0ac869b46291b708b Mon Sep 17 00:00:00 2001
From: Colleen McGinnis
Date: Tue, 16 Aug 2022 17:39:37 -0500
Subject: [PATCH 3/4] restructure to give context then hand off to fleet doc

---
 .../manage-synthetics-retention.asciidoc           | 81 ++++++++++++-------
 1 file changed, 51 insertions(+), 30 deletions(-)

diff --git a/docs/en/observability/manage-synthetics-retention.asciidoc b/docs/en/observability/manage-synthetics-retention.asciidoc
index b1d24b12ed..fec4b045ce 100644
--- a/docs/en/observability/manage-synthetics-retention.asciidoc
+++ b/docs/en/observability/manage-synthetics-retention.asciidoc
@@ -1,41 +1,62 @@
 [[manage-synthetics-retention]]
 
-= Managing data retention
+= Manage data retention
 
-[discrete]
-== Overview
+When you set up a synthetic monitor, data from the monitor is saved in
+https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html[{es} data streams],
+an append-only structure in {es}.
 
-Synthetics browser monitors can require large amounts of storage, this document provides information on how synthetics stores data and how to optimize retention
-to control storage utilization. This usually means altering the retention of browser `network` and `screenshot` documents.
+There are six data streams recorded by synthetic monitors: `http`, `tcp`, `icmp`, `browser`, `browser.network`, `browser.screenshot`.
+Elastic will retain data from each data stream for some time period,
+and the default time period varies by data stream.
+If you want to reduce the amount of storage required or store data for longer,
+you can customize how long to retain data for each data stream.
 
-All types of checks record 'core' metadata, such as which URL was checked, what the status of the check was, and errors that occurred. This document
-focuses on browser checks, which tend to use much more storage than lightweight checks.
-Browser based checks store two additional types of data beyond the core metadata; `network` documents and `screenshot` documents.
-These indices are usually many times larger than the core metadata. The relative sizes of each vary depending on the sites being
+[discrete]
+== Synthetics data streams
+
+There are six data streams recorded by synthetic monitors:
+
+[options="header"]
+|===
+| Data stream | Data includes | Default retention period |
+| `http` | The URL that was checked, the status of the check, and any errors that occurred | 1 year |
+| `tcp` | The URL that was checked, the status of the check, and any errors that occurred | 1 year |
+| `icmp` | The URL that was checked, the status of the check, and any errors that occurred | 1 year |
+| `browser` | The URL that was checked, the status of the check, and any errors that occurred | 1 year |
+| `browser.screenshot` | Binary image data used to construct a screenshot and metadata with information related to de-duplicating this data | 14 days |
+| `browser.network` | Detailed metadata around requests for resources required by the pages being checked | 14 days |
+|===
+
+// preserving the text below in case we want to expand on the definitions above
+
+// `network` documents data consists of detailed metadata around requests for resources required by the pages being checked. An example would be an image,
+// a javascript file, or a font referenced and loaded by the page under test.
+// For each of these requests the URL, timing, headers, and other metadata are stored. While individually this metadata is small,
+// modern websites often request hundreds of additional resources per page load, adding up to a significant amount of storage.
+
+// Screenshot data consists of binary image data used to construct a screenshot and metadata with information related to de-duplicating this data. De-duplication
+// efficiency makes screenshot data less burdensome than network data, especially for sites that are mostly visually unchanged across test runs.
+// De-duplication works by splitting each captured screenshot into an 8x8 grid of chunks each stored as a separate document keyed by a hash of its pixels.
+// This means that across checks to a given site only changes to the visual representation of the site require significant additional storage. If a site has not changed
+// visually across runs the only additional storage required is the tiny `screenshot_ref` document for that image, which points to the relevant image blocks.
+
+All types of checks record core metadata.
+Browser-based checks store two additional types of data: network and screenshot documents.
+These browser-specific indices are usually many times larger than the core metadata.
+The relative sizes of each vary depending on the sites being
 checked with network data usually being the larger of the two by a significant factor.
 
-`network` documents data consists of detailed metadata around requests for resources required by the pages being checked. An example would be an image,
-a javascript file, or a font referenced and loaded by the page under test.
-For each of these requests the URL, timing, headers, and other metadata are stored. While individually this metadata is small,
-modern websites often request hundreds of additional resources per page load, adding up to a significant amount of storage.
-
-Screenshot data consists of binary image data used to construct a screenshot and metadata with information related to de-duplicating this data. De-duplication
-efficiency makes screenshot data less burdensome than network data, especially for sites that are mostly visually unchanged across test runs.
-De-duplication works by splitting each captured screenshot into an 8x8 grid of chunks each stored as a separate document keyed by a hash of its pixels.
-This means that across checks to a given site only changes to the visual representation of the site require significant additional storage. If a site has not changed
-visually across runs the only additional storage required is the tiny `screenshot_ref` document for that image, which points to the relevant image blocks.
-
 [discrete]
-=== Managing the lifecycles of synthetics data streams
+== Customize data stream lifecycles
+
+If Synthetics browser data streams are storing data longer than necessary,
+you can opt to retain data for a shorter period.
 
-Synthetics data is recorded in Elasticsearch [data streams](https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html), an append-only
-structure in Elasticsearch. Synthetics data streams can be managed via the Elasticsearch API or via [Kibana index management](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-mgmt.html).
+To find Synthetics data streams:
 
-If Synthetics browser data streams are storing data longer than necessary, users can opt to retain `screenshot` and `network_info` datasets for a shorter period than the core metadata.
-To do so, first navigate to [Kibana index management](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-mgmt.html), then filter the list of data streams for
-those containing the term 'synthetics'. In the UI there will be three sorts of data stream present `synthetics-browser-*`, `synthetics-browser.network-*`, and `synthetics-browser.screenshot-*`. From this page you can retrieve the size of each data stream on disk, as well as which [index life cycle](https://www.elastic.co/guide/en/elasticsearch/reference/current/set-up-lifecycle-policy.html) is associated with it.
+. Navigate to https://www.elastic.co/guide/en/elasticsearch/reference/current/index-mgmt.html[{kib} index management].
+. Filter the list of data streams for those containing the term `synthetics`.
+.. In the UI there will be three types of browser data streams: `synthetics-browser-*`, `synthetics-browser.network-*`, and `synthetics-browser.screenshot-*`.
 
-To change the retention period of a given data stream simply edit the life cycle policy associated with it after ensuring that policy does not apply to addition data streams whose
-retention you do not want to change. If the data stream you wish to change shares a life cycle policy with another, [create a new ILM policy](https://www.elastic.co/guide/en/elasticsearch/reference/current/set-up-lifecycle-policy.html), then edit the relevant component template's
-index settings. When editing the component template change the `index.lifecycle.name` value to point toward your new ILM policy. For `screenshot` documents edit the `synthetics-browser.screenshot@package` component template, for `network_info` documents edit the `synthetics-browser.network@package` template. Check that the settings have taken effect by visiting the data stream management
-page and confirming the attached life cycle policy is correct.
\ No newline at end of file
+Then, you can refer to https://www.elastic.co/guide/en/fleet/current/data-streams.html#data-streams-ilm-tutorial[Tutorial: Customize data retention for integrations] to learn how to apply a custom {ilm-init} policy to the browser data streams.
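
Patch 1/4 describes screenshot de-duplication, and patch 3/4 keeps that description as a comment. Each screenshot is split into an 8x8 grid, each block is keyed by a hash of its pixels, and a small `screenshot_ref` points at the blocks. The toy Python sketch below shows why visually unchanged runs add almost no storage. It is an illustration under stated assumptions, not the Synthetics implementation:

- SHA-256 stands in for "a hash of its pixels".
- In-memory dicts stand in for Elasticsearch documents.
- `split_blocks` and `ScreenshotStore` are hypothetical names.

```python
import hashlib
from typing import Dict, Iterator, List, Tuple

GRID = 8  # the patch describes an 8x8 grid of chunks per screenshot


def split_blocks(rows: List[bytes], grid: int = GRID) -> Iterator[Tuple[str, bytes]]:
    """Yield (hash, block bytes) for each cell of a grid x grid split of an image.

    `rows` is a toy image: a list of equal-length byte strings, one per pixel row.
    Cells are visited in row-major order; edges use integer division, so sizes
    that are not a multiple of `grid` still split cleanly.
    """
    height, width = len(rows), len(rows[0])
    ys = [i * height // grid for i in range(grid + 1)]
    xs = [i * width // grid for i in range(grid + 1)]
    for r in range(grid):
        for c in range(grid):
            data = b"".join(row[xs[c]:xs[c + 1]] for row in rows[ys[r]:ys[r + 1]])
            # SHA-256 is an assumption; the patch only says "a hash of its pixels".
            yield hashlib.sha256(data).hexdigest(), data


class ScreenshotStore:
    """Hypothetical in-memory stand-in for block documents plus per-screenshot refs."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}  # hash -> block pixels, stored once
        self.refs: List[List[str]] = []     # per screenshot: ordered block hashes

    def record(self, rows: List[bytes], grid: int = GRID) -> int:
        """Store one screenshot and return how many new blocks had to be written."""
        new_blocks = 0
        hashes = []
        for digest, data in split_blocks(rows, grid):
            if digest not in self.blocks:
                self.blocks[digest] = data
                new_blocks += 1
            hashes.append(digest)
        self.refs.append(hashes)  # the small "ref" that points at the blocks
        return new_blocks


# 16x16 toy image with all-distinct pixel values, so its 64 blocks are unique.
first = [bytes(y * 16 + x for x in range(16)) for y in range(16)]
store = ScreenshotStore()
print(store.record(first))  # 64: every block is new
print(store.record(first))  # 0: visually unchanged, only a new ref is stored

changed = [bytearray(row) for row in first]
changed[0][0] = 255  # alter a single pixel in the top-left block
print(store.record([bytes(row) for row in changed]))  # 1: only that block is new
```

The same mechanism explains the patch's claim that screenshot data grows mainly with visual change: a screenshot with no visual change costs one ref and zero blocks.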

From e4fce65236a95b5f13a3919cc036f2d8350bc966 Mon Sep 17 00:00:00 2001
From: Colleen McGinnis
Date: Mon, 22 Aug 2022 11:00:35 -0500
Subject: [PATCH 4/4] clean up

---
 docs/en/observability/index.asciidoc               |  2 +-
 ...oc => synthetics-manage-retention.asciidoc}     | 18 +++---------------
 2 files changed, 4 insertions(+), 16 deletions(-)
 rename docs/en/observability/{manage-synthetics-retention.asciidoc => synthetics-manage-retention.asciidoc} (65%)

diff --git a/docs/en/observability/index.asciidoc b/docs/en/observability/index.asciidoc
index 84397186a0..960df87014 100644
--- a/docs/en/observability/index.asciidoc
+++ b/docs/en/observability/index.asciidoc
@@ -124,7 +124,7 @@ include::inspect-uptime-duration-anomalies.asciidoc[leveloffset=+3]
 
 include::configure-uptime-settings.asciidoc[leveloffset=+2]
 
-include::manage-synthetics-retention.asciidoc[leveloffset=+2]
+include::synthetics-manage-retention.asciidoc[leveloffset=+2]
 
 include::troubleshoot-uptime-mapping-issues.asciidoc[leveloffset=+2]
 
diff --git a/docs/en/observability/manage-synthetics-retention.asciidoc b/docs/en/observability/synthetics-manage-retention.asciidoc
similarity index 65%
rename from docs/en/observability/manage-synthetics-retention.asciidoc
rename to docs/en/observability/synthetics-manage-retention.asciidoc
index fec4b045ce..200c137906 100644
--- a/docs/en/observability/manage-synthetics-retention.asciidoc
+++ b/docs/en/observability/synthetics-manage-retention.asciidoc
@@ -1,5 +1,4 @@
-[[manage-synthetics-retention]]
-
+[[synthetics-manage-retention]]
 = Manage data retention
 
 When you set up a synthetic monitor, data from the monitor is saved in
@@ -13,6 +12,7 @@ If you want to reduce the amount of storage required or store data for longer,
 you can customize how long to retain data for each data stream.
 
 [discrete]
+[[synthetics-manage-retention-data-streams]]
 == Synthetics data streams
 
 There are six data streams recorded by synthetic monitors:
@@ -28,19 +28,6 @@ There are six data streams recorded by synthetic monitors:
 | `browser.network` | Detailed metadata around requests for resources required by the pages being checked | 14 days |
 |===
 
-// preserving the text below in case we want to expand on the definitions above
-
-// `network` documents data consists of detailed metadata around requests for resources required by the pages being checked. An example would be an image,
-// a javascript file, or a font referenced and loaded by the page under test.
-// For each of these requests the URL, timing, headers, and other metadata are stored. While individually this metadata is small,
-// modern websites often request hundreds of additional resources per page load, adding up to a significant amount of storage.
-
-// Screenshot data consists of binary image data used to construct a screenshot and metadata with information related to de-duplicating this data. De-duplication
-// efficiency makes screenshot data less burdensome than network data, especially for sites that are mostly visually unchanged across test runs.
-// De-duplication works by splitting each captured screenshot into an 8x8 grid of chunks each stored as a separate document keyed by a hash of its pixels.
-// This means that across checks to a given site only changes to the visual representation of the site require significant additional storage. If a site has not changed
-// visually across runs the only additional storage required is the tiny `screenshot_ref` document for that image, which points to the relevant image blocks.
-
 All types of checks record core metadata.
 Browser-based checks store two additional types of data: network and screenshot documents.
 These browser-specific indices are usually many times larger than the core metadata.
@@ -48,6 +35,7 @@ The relative sizes of each vary depending on the sites being
 checked with network data usually being the larger of the two by a significant factor.
 
 [discrete]
+[[synthetics-manage-retention-customize]]
 == Customize data stream lifecycles
 
 If Synthetics browser data streams are storing data longer than necessary,
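
The "Customize data stream lifecycles" section hands off to the Fleet tutorial. Patch 1/4 described the underlying steps:

1. Create a dedicated ILM policy.
2. Point the dataset's component template at it through `index.lifecycle.name`.

The Python sketch below only builds and prints the two Elasticsearch requests in Kibana Dev Tools console form, and sends nothing. It rests on these assumptions:

- The policy name pattern is hypothetical.
- The 1-day rollover is an illustrative choice.
- The target is the user-owned `<dataset>@custom` component template that the Fleet tutorial describes, rather than the `@package` template patch 1/4 mentions. Check that this template exists in your stack before relying on it.

```python
import json
from typing import List, Tuple

# (HTTP method, path, JSON body) for one Elasticsearch API call.
Request = Tuple[str, str, dict]


def ilm_policy(delete_after_days: int, rollover_max_age: str = "1d") -> dict:
    """Body for PUT _ilm/policy/<name>: roll over in hot, delete after N days."""
    return {
        "policy": {
            "phases": {
                "hot": {"actions": {"rollover": {"max_age": rollover_max_age}}},
                "delete": {
                    "min_age": f"{delete_after_days}d",  # counted from rollover
                    "actions": {"delete": {}},
                },
            }
        }
    }


def retention_requests(dataset: str, delete_after_days: int) -> List[Request]:
    """Requests that give one dataset its own, unshared ILM policy."""
    policy_name = f"{dataset}-{delete_after_days}d"  # hypothetical naming scheme
    return [
        ("PUT", f"_ilm/policy/{policy_name}", ilm_policy(delete_after_days)),
        (
            "PUT",
            # Assumes Fleet's user-owned `<dataset>@custom` component template.
            f"_component_template/{dataset}@custom",
            {"template": {"settings": {"index.lifecycle.name": policy_name}}},
        ),
    ]


def to_console(requests: List[Request]) -> str:
    """Render requests in the Kibana Dev Tools console format."""
    return "\n\n".join(
        f"{method} {path}\n{json.dumps(body, indent=2)}" for method, path, body in requests
    )


print(to_console(retention_requests("synthetics-browser.network", 7)))
```

`PUT _component_template` replaces the whole template. That is why the sketch targets the `@custom` template and leaves the integration-managed `@package` template, which carries the mappings, untouched. Index template settings apply only to backing indices created afterwards. The new policy therefore takes effect at the data stream's next rollover, which you can trigger with `POST <data-stream>/_rollover`. Afterwards, confirm the attached policy in index management, as patch 1/4 suggests.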