
[Endpoint] Fix endpoint tests with data streams #68794

Merged
merged 8 commits from es-archiver-ds into elastic:master on Jun 11, 2020

Conversation

@jonathan-buttner commented Jun 10, 2020

This PR addresses the endpoint test issue: #68584

Background

The endpoint code relies on the Ingest Manager, which leverages the new data streams feature in Elasticsearch. Data streams enforce a couple of constraints that cause issues with the way es archiver is used.

Data streams

Data streams aren't normal indices; they're more like a group of indices. The backing index for a data stream is named `.ds-<data stream name>-<rollover number>`. So when es archiver puts data into an index, say `events-endpoint`, the actual backing index will be `.ds-events-endpoint-000001`.
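For reference, the naming scheme can be sketched as a small helper (illustrative only, not part of this PR):

```typescript
// Illustrative helper (not part of the PR): compute the backing index name
// Elasticsearch generates for a data stream. In 7.x the rollover generation
// is zero-padded to six digits.
function backingIndexName(dataStream: string, generation: number): string {
  return `.ds-${dataStream}-${String(generation).padStart(6, '0')}`;
}

console.log(backingIndexName('events-endpoint-1', 1)); // ".ds-events-endpoint-1-000001"
```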

The Issues

Inserting documents using ES Archiver

When inserting documents into a data stream you have to use the `create` bulk operation, otherwise an error is returned:

```json
{
  "name": "ResponseError",
  "meta": {
    "body": {
      "error": {
        "root_cause": [
          {
            "type": "illegal_argument_exception",
            "reason": "The provided expression [events-endpoint-1] matches a data stream, specify the corresponding concrete indices instead."
          }
        ],
        "type": "illegal_argument_exception",
        "reason": "The provided expression [events-endpoint-1] matches a data stream, specify the corresponding concrete indices instead."
      },
      "status": 400
    },
    "statusCode": 400
  }
}
```
To get around this I've updated es archiver to always use `create` instead of `index`.
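A minimal sketch of what that change amounts to (simplified, not the exact PR code; the real change is in `createIndexDocRecordsStream`):

```typescript
// Simplified sketch of the es archiver change: bulk bodies are built with
// `create` actions, which data streams accept, instead of `index` actions,
// which trigger the illegal_argument_exception shown above.
interface DocRecord {
  index: string;
  id?: string;
  source: Record<string, unknown>;
}

function buildBulkBody(docs: DocRecord[]): object[] {
  const body: object[] = [];
  for (const doc of docs) {
    body.push({ create: { _index: doc.index, _id: doc.id } });
    body.push(doc.source);
  }
  return body;
}
```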

I tried to solve this by having our archives put data directly into the backing index of a data stream (`.ds-...`). The problem is that the data stream does not get created until we send data to an index that matches the data stream template. So if we create or modify the backing index before the data stream is created, it will not be part of the data stream and our queries will fail.

Another solution would be to post a document that causes the stream to be created, then have es archiver modify the backing index with the documents we actually want in the tests. I don't love the idea of having to hard-code the backing indices everywhere because there's a chance the names could change.

ES Archiver creating indices with a specific mapping

Data streams do not allow updating the mapping of the stream as a whole; instead you have to update the mapping of the specific backing index. The normal flow for es archiver is to create the index with a specific mapping before indexing the documents. If es archiver tries to put a mapping for `events-endpoint` it will fail because this is not allowed for data streams.
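To make the constraint concrete, here is a sketch of the two request shapes involved (illustrative helper only, not real client code; the index names are examples):

```typescript
// Illustrative only: in 7.x a PUT mapping request must target a concrete
// backing index; targeting the data stream name itself is rejected.
function putMappingRequest(target: string, properties: Record<string, unknown>) {
  return { method: 'PUT', path: `/${target}/_mapping`, body: { properties } };
}

// Rejected: 'events-endpoint' names the data stream, not a concrete index.
const rejected = putMappingRequest('events-endpoint', {});
// Works: target the backing index instead.
const accepted = putMappingRequest('.ds-events-endpoint-000001', {});
```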

Our app leverages the endpoint package to install the templates for data streams. The Ingest Manager handles constructing the templates and installing them for us. When our tests run, these templates are already installed (either explicitly before the tests run, or for functional tests on startup of Kibana).

Since the templates are already there, we don't need es archiver to send a mapping for us, which is why I have deleted them. I believe we could get this to work by pointing the mapping file at the backing index and having the data.json.gz indices point to the data stream name, but in my opinion that adds more maintenance burden.

Deleting data streams

To delete a data stream you have to issue a `DELETE _data_stream/<stream name or pattern>` request. Es archiver cannot do this: it uses the index API to delete indices, which fails because a data stream is not an index. To get around this I've created a helper function to delete the data streams for each event type. I don't love this solution either, because we will have to update the index patterns whenever they change in the package, but we can iterate on that later.
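The helper approach boils down to issuing that request per event type; a sketch (the helper names and index patterns here are assumptions, not the exact PR code — the real patterns come from the endpoint package):

```typescript
// Illustrative sketch of the per-event-type cleanup helpers.
function deleteDataStreamRequest(pattern: string) {
  // Data streams are deleted via the _data_stream API, not the index API.
  return { method: 'DELETE', path: `/_data_stream/${pattern}` };
}

// Example per-event-type helpers; patterns are assumptions for illustration.
const deleteEventsStream = () => deleteDataStreamRequest('events-endpoint-*');
const deleteMetadataStream = () => deleteDataStreamRequest('metrics-endpoint-*');
```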

@jonathan-buttner jonathan-buttner added v8.0.0 release_note:skip Skip the PR/issue when compiling release notes Team:Endpoint Data Visibility Team managing the endpoint resolver Feature:Endpoint Elastic Endpoint feature v7.9.0 labels Jun 10, 2020
@jonathan-buttner jonathan-buttner requested review from a team as code owners June 10, 2020 17:03
@elasticmachine

Pinging @elastic/endpoint-data-visibility-team (Team:Endpoint Data Visibility)

@elasticmachine

Pinging @elastic/endpoint-app-team (Feature:Endpoint)

```diff
@@ -30,7 +30,7 @@ export function createIndexDocRecordsStream(client: Client, stats: Stats, progre
       stats.indexedDoc(doc.index);
       body.push(
         {
-          index: {
+          create: {
```
@jonathan-buttner (author):
Data streams will fail if using index

```diff
@@ -66,26 +67,27 @@ export default function ({ getService }: FtrProviderContext) {
   const nextPrevPrefixOrder = 'order=desc';
   const nextPrevPrefixPageSize = 'page_size=10';
   const nextPrevPrefix = `${nextPrevPrefixQuery}&${nextPrevPrefixDateRange}&${nextPrevPrefixSort}&${nextPrevPrefixOrder}&${nextPrevPrefixPageSize}`;
-  const alertIndex = 'events-endpoint-1';
+  const alertIndex = '.ds-events-endpoint-1-000001';
```
@jonathan-buttner (author):
The alert tests need the exact backing index for a couple of the tests. I don't love this, another option would be to just remove those tests I suppose.

A reviewer commented:
I noticed the event is quite different from the other indices, I am guessing this is the best right now?

@jonathan-buttner (author):
Yeah, it's going to change soon. It depends on the conclusion of this discussion: https://github.com/elastic/endpoint-app-team/issues/102

@spalger commented Jun 10, 2020

It sounds like when esArchiver is asked to archive a datastream it should write the docs a little differently to our archives and maybe drop a special record into mappings.json that will clear the data stream rather than try to delete the index. Basically, we should probably add full support to esArchiver (maybe in a follow up PR since this gets tests working again which were failing) rather than leave the test code so coupled to the internal indexing procedure that the esArchiver uses and rely on people manually deleting the mappings.json files.

```js
await esArchiver.unload('endpoint/alerts/host_api_feature');
// the endpoint uses data streams and es archiver does not support deleting them at the moment so we need
// to do it manually
await deleteEventsStream(getService);
```
A reviewer commented:
To help it run a bit faster, you can wrap these two calls in a `Promise.all`.
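The suggestion, sketched with stand-in async functions (the real calls are `esArchiver.unload(...)` and the `deleteEventsStream` helper, which hit Elasticsearch):

```typescript
// Stand-ins for the two cleanup calls so the sketch is self-contained.
async function unloadArchive(): Promise<string> {
  return 'unloaded'; // stands in for esArchiver.unload(...)
}
async function deleteEventsStream(): Promise<string> {
  return 'deleted'; // stands in for the data stream cleanup helper
}

// Running both concurrently instead of awaiting them one after the other
// saves a round trip; cleanup resolves once both complete.
function cleanup(): Promise<string[]> {
  return Promise.all([unloadArchive(), deleteEventsStream()]);
}
```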

@jonathan-buttner

@spalger

> It sounds like when esArchiver is asked to archive a datastream it should write the docs a little differently to our archives and maybe drop a special record into mappings.json that will clear the data stream rather than try to delete the index. Basically, we should probably add full support to esArchiver (maybe in a follow up PR since this gets tests working again which were failing) rather than leave the test code so coupled to the internal indexing procedure that the esArchiver uses and rely on people manually deleting the mappings.json files.

Yep, that sounds good to me. Although, because we rely on the Ingest Manager to create the templates for us, we actually don't need the mapping files. It might actually be better not to have them for our tests, because every time we update the Endpoint package we'd have to update the mappings as well. In the tests we should rely specifically on the template/mapping that the endpoint package and Ingest Manager install rather than creating our own. This would allow us to find bugs in the endpoint package's mapping definition.

It might be nice to have an option in es archiver to not save the mappings.

@jonathan-buttner

@elasticmachine merge upstream

@jonathan-buttner

@elasticmachine retest this

@spalger commented Jun 10, 2020

> because we rely on the Ingest manager to create the templates for us, we actually don't need the mapping files. It might actually be better to not have them for our tests because every time we update the Endpoint package we'd have to update the mappings as well. In the tests we should probably specifically rely on the template/mapping that the endpoint package and ingest manager install rather than creating our own. This would allow us to find bugs in the endpoint package's mapping definition.
>
> It might be nice to have an option in es archiver to not save the mappings.

Sorry for the dumb question, but is the Ingest Manager part of Kibana? If the Ingest Manager changes the way the mapping is set up, will that require a PR to Kibana where we would see the breaks? They wouldn't just start breaking master, right?

@jonathan-buttner

> because we rely on the Ingest manager to create the templates for us, we actually don't need the mapping files. It might actually be better to not have them for our tests because every time we update the Endpoint package we'd have to update the mappings as well. In the tests we should probably specifically rely on the template/mapping that the endpoint package and ingest manager install rather than creating our own. This would allow us to find bugs in the endpoint package's mapping definition.
>
> It might be nice to have an option in es archiver to not save the mappings.

> Sorry for the dumb question, but is the ingest manager a part of Kibana? If the ingest manager changes the way the mapping is set up will that require a PR to Kibana where we would see the breaks? They wouldn't just start breaking master right?

Not a dumb question. Yeah, it's an app in Kibana: https://github.com/elastic/kibana/blob/master/x-pack/plugins/ingest_manager/README.md

Yes and no. The Ingest Manager builds the mapping here: https://github.com/elastic/kibana/blob/master/x-pack/plugins/ingest_manager/server/services/epm/elasticsearch/template/template.ts#L61

So that change would require a PR that would fail if it broke our tests. But the other part is the endpoint package, which the Ingest Manager uses to build the template. The endpoint package is not stored in Kibana, so if the package changed it would start breaking our tests. That's actually related to the work you're doing with the docker service: #68173 and #68172

Once that's in we can pin our endpoint package for the tests and avoid that issue.

Although it would be useful to test kibana with the package that is deployed in the cloud in some automated way. We still need to work that out though.

@spalger commented Jun 10, 2020

In that case I think it makes sense for the esArchiver to not store the mappings and instead store a record that we're dealing with a data stream so that it can still take care of emptying the underlying indexes before writing the docs in data.json
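A hypothetical shape for such a record (purely illustrative, not implemented in this PR; esArchiver's mappings.json records currently use `type: 'index'`):

```typescript
// Hypothetical record shape (assumption, not implemented here): alongside
// the existing { type: 'index', ... } records, an archive could mark a data
// stream so load/unload uses the _data_stream APIs instead of the index APIs.
const dataStreamRecord = {
  type: 'data_stream', // assumed new record type
  value: {
    name: 'events-endpoint-1', // stream to clear before writing the data.json docs
  },
};
```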

@jonathan-buttner commented Jun 10, 2020

> In that case I think it makes sense for the esArchiver to not store the mappings and instead store a record that we're dealing with a data stream so that it can still take care of emptying the underlying indexes before writing the docs in data.json

Yeah, sounds good. That's mainly our use case. In the scenario where someone wants to use data streams and wants esArchiver to create the data stream for them, we'd want to save the mapping and some indication that it should be a data stream. Then on a load it would use the mapping with this API: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/indices-create-data-stream.html to create the data stream and then write the data.json.
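That load flow can be sketched as a sequence of request shapes (assumed behavior for a hypothetical esArchiver feature, not implemented in this PR):

```typescript
// Assumed esArchiver "load" flow for a data stream archive (illustrative only):
// explicitly create the stream, then bulk-write the archived docs with
// `create` actions.
function dataStreamLoadRequests(name: string, docs: Array<Record<string, unknown>>) {
  const bulkBody: object[] = [];
  for (const doc of docs) {
    bulkBody.push({ create: { _index: name } }); // `create`, never `index`
    bulkBody.push(doc);
  }
  return [
    // 1. explicitly create the stream (requires a matching index template)
    { method: 'PUT', path: `/_data_stream/${name}` },
    // 2. write the archived documents
    { method: 'POST', path: '/_bulk', body: bulkBody },
  ];
}
```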

@jonathan-buttner

@spalger do these changes look good to you? Any concerns with merging this?

@spalger left a comment:

LGTM

@kibanamachine

💚 Build Succeeded


@jonathan-buttner jonathan-buttner merged commit 60f4a80 into elastic:master Jun 11, 2020
@jonathan-buttner jonathan-buttner deleted the es-archiver-ds branch June 11, 2020 16:35
jonathan-buttner added a commit that referenced this pull request Jun 12, 2020
* Temporary fix to get tests working again with data streams

* Removing mappings and re-enabling tests

* Optionally using create for our tests only as a stop gap

* Adding default for internal function

* Removing tests that could fail if backing index changes

* Removing unused import

Co-authored-by: Elastic Machine <[email protected]>

@charlie-pichette

This also addresses #68638.
