Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ingest Manager] Fix package upgrade breaking after first rollover before new data has arrived #79887

Merged
merged 6 commits into from
Oct 12, 2020

Conversation

neptunian
Copy link
Contributor

@neptunian neptunian commented Oct 7, 2020

Summary

Resolves #79370

If the user has rolled over a datastream in the past and tries to upgrade a package before any new data has arrived in the current write index, a failure will occur because the data_stream constant field mappings do not yet have values which are relied on to create the data stream name.

Change

Instead of relying on constant values of the datastream mappings in the current write index to compose the name of the datastream, query for the data_stream constant values amongst all indices

@neptunian neptunian added v8.0.0 v7.10.0 Team:Fleet Team label for Observability Data Collection Fleet team labels Oct 7, 2020
@neptunian neptunian requested a review from a team October 7, 2020 17:52
@neptunian neptunian self-assigned this Oct 7, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/ingest-management (Team:Ingest Management)

@neptunian neptunian changed the title [Ingest Manager] build datastream name from index name [Ingest Manager] Fix upgrade breaking after first rollover before new data has arrived Oct 7, 2020
@neptunian neptunian changed the title [Ingest Manager] Fix upgrade breaking after first rollover before new data has arrived [Ingest Manager] Fix package upgrade breaking after first rollover before new data has arrived Oct 7, 2020
@ruflin
Copy link
Contributor

ruflin commented Oct 8, 2020

Is there also an option for us to ensure the values are always there / already set?

throw new Error(`data_stream values are missing from the index template ${indexName}`);
const dataStreamName = `${dataStream.type.value}-${dataStream.dataset.value}-${dataStream.namespace.value}`;

const [, dsType, dsTemplateName, dsNamespace] = indexName.split('-');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly indexName is a only a string vs some other type. Can we at least add some assertions to guard against undefined values or anything to give us more confidence it's what we expect?

Copy link
Contributor

@jonathan-buttner jonathan-buttner Oct 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's probably worth checking. If the backing index prefix changes to something like .ds_logs-... we'd also have an invalid type. Maybe we should check that the type is either logs or metrics? I'm not sure if it's valid for it to be anything else. Or maybe we can use the enums that we use to create the kibana index patterns and make sure it at least is one of those that way we'd only have to change a single place to add new valid types.

@jonathan-buttner
Copy link
Contributor

Is there also an option for us to ensure the values are always there / already set?

@ruflin do you mean avoid performing an upgrade until those values are present aka data has come in to populate those fields? For indices that don't receive data very often (metrics-endpoint.metadata-default only gets events like once a day per host I think), it could take a while before we'd allow an upgrade.

@neptunian
Copy link
Contributor Author

A bit more complicated but we should be able update the component template {type-templateName}-mappings for eg logs-endpoint.events.library-mappings right before rolling over. This would require 2 requests to get the mappings of the current write index which has these values, update the component template with the constants' values, and then trigger the rollover.

@jonathan-buttner
Copy link
Contributor

A bit more complicated but we should be able update the component template {type-templateName}-mappings for eg logs-endpoint.events.library-mappings right before rolling over. This would require 2 requests to get the mappings of the current write index which has these values, update the component template with the constants' values, and then trigger the rollover.

Oh interesting. What about the scenario where the ILM did the rollover so the empty backing index already exists and then we try to do an upgrade?

@neptunian
Copy link
Contributor Author

neptunian commented Oct 8, 2020

Oh interesting. What about the scenario where the ILM did the rollover so the empty backing index already exists and then we try to do an upgrade?

Good point. Could always fetch the first backing index or perform a query on it:

GET .ds-metrics-system.socket_summary-default-000001/_search
{
  "size": 1,
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "data_stream.namespace"
          }
        }
      ]
    }
  }
}

or use a wildcard if we don't want to assume the name GET *metrics-system.socket_summary*/_search. And get the namespace this way. I think this is a good alternative if we're worried about using the index name.

@jonathan-buttner
Copy link
Contributor

Oh interesting. What about the scenario where the ILM did the rollover so the empty backing index already exists and then we try to do an upgrade?

Good point. Could always fetch the first backing index or perform a query on it:

GET .ds-metrics-system.socket_summary-default-000001/_search
{
  "size": 1,
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "data_stream.namespace"
          }
        }
      ]
    }
  }
}

or use a wildcard if we don't want to assume the name GET *metrics-system.socket_summary*/_search. And get the namespace this way. I think this is a good alternative if we're worried about using the index name.

Yeah I think you can just search the data stream name: GET metrics-system.socket_summary*/_search instead of the backing index right?

@neptunian
Copy link
Contributor Author

Yeah I think you can just search the data stream name: GET metrics-system.socket_summary*/_search instead of the backing index right?

Yes, you're right.

@neptunian neptunian force-pushed the 79370-datastream-name-from-index branch from 8773a00 to edb3cc3 Compare October 8, 2020 18:39
namespace: string;
type: DataType;
} = searchDataStreamFieldsResponse.hits.hits[0]._source.data_stream;
const dataStreamName = `${type}-${dataset}-${namespace}`;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we want to check that type is a valid DataType? I suppose if they ingested data initially with an invalid data type then we should just continue doing that 🤷

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, i'm not sure this is the place to be validating that and whether we'd have to throw an error and stop the upgrade because of that.

Copy link
Contributor

@jfsiii jfsiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add any tests?

@neptunian
Copy link
Contributor Author

Can we add any tests?

These existing tests would cover that it was rolled over with the expected name https://github.com/elastic/kibana/blob/master/x-pack/test/ingest_manager_api_integration/apis/epm/data_stream.ts

@jfsiii
Copy link
Contributor

jfsiii commented Oct 9, 2020

But didn't our existing tests not identify the original issue? Can we add a test which prevents a regression?

@neptunian
Copy link
Contributor Author

neptunian commented Oct 9, 2020

But didn't our existing tests not identify the original issue? Can we add a test which prevents a regression?

It's true our existing test did not identify the condition that caused this bug, but the condition no longer exists where the data_stream constants would not exist in a particular index because we are no longer getting it from the current index, but checking any index in the data stream. The data stream would not exist if at least one document didn't exist and every document sends the value. So the current integration tests cover the case because it could not have rolled over if i didn't have the constants. Testing for the previous scenario would not cover the case, currently. I'd have to perform a 2nd rollover to see if it has the constants in the current write index but we no longer depend on subsequent rollovers to carry constants over.

@jonathan-buttner
Copy link
Contributor

jonathan-buttner commented Oct 9, 2020

But didn't our existing tests not identify the original issue? Can we add a test which prevents a regression?

It's true our existing test did not identify the condition that caused this bug, but the condition no longer exists where the data_stream constants would not exist in a particular index because we are no longer getting it from the current index, but checking any index in the data stream. The data stream would not exist if at least one document didn't exist and every document sends the value. So the current integration tests cover the case because it could not have rolled over if i didn't have the constants. Testing for the previous scenario would not cover the case, currently. I'd have to perform a 2nd rollover to see if it has the constants in the current write index but we no longer depend on subsequent rollovers to carry constants over.

It'd probably still be worth creating a test that does:

  1. Install a package
  2. Add a document that populates the data_stream fields
  3. Do a manual roll over
  4. Ensure that an upgrade of the package succeeds

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

page load bundle size

id before after diff
ingestManager 391.5KB 391.8KB +327.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@neptunian neptunian merged commit 57413f8 into elastic:master Oct 12, 2020
neptunian added a commit to neptunian/kibana that referenced this pull request Oct 12, 2020
…fore new data has arrived (elastic#79887)

* build datastream name from index name

* query for data_stream constants to create data stream name

* simply datastream tests and add a test to upgrade after a datastream rolls over

* improve query

* remove dup
neptunian added a commit to neptunian/kibana that referenced this pull request Oct 12, 2020
…fore new data has arrived (elastic#79887)

* build datastream name from index name

* query for data_stream constants to create data stream name

* simply datastream tests and add a test to upgrade after a datastream rolls over

* improve query

* remove dup
neptunian added a commit that referenced this pull request Oct 13, 2020
…fore new data has arrived (#79887) (#80232)

* build datastream name from index name

* query for data_stream constants to create data stream name

* simply datastream tests and add a test to upgrade after a datastream rolls over

* improve query

* remove dup
neptunian added a commit that referenced this pull request Oct 13, 2020
…fore new data has arrived (#79887) (#80231)

* build datastream name from index name

* query for data_stream constants to create data stream name

* simply datastream tests and add a test to upgrade after a datastream rolls over

* improve query

* remove dup
jloleysens added a commit to jloleysens/kibana that referenced this pull request Oct 13, 2020
…a-tier-migration

* 'master' of github.com:elastic/kibana: (34 commits)
  Line Visualization improper scaling can result in gaps elastic#79663 (elastic#80135)
  [Security Solution][Timeline] Fix SelectableTimeline search (elastic#80128)
  move field_mapping util to saved_objects plugin (elastic#79918)
  [ML] Datagrid: Ensure column content with 'boolean' schema is not capitalized (elastic#80041)
  [CI] Correctly resolve repository root for JUnit reports (elastic#80226)
  [Ingest Manager] Fix package upgrade breaking after first rollover before new data has arrived (elastic#79887)
  [Security Solution] Fix positioning of Kibana banners list inside Sec… (elastic#80124)
  add missing await to fix test (elastic#80202)
  Revert test data changed in previous commit. (elastic#79479)
  [Security Solution] [Sourcerer] Jest beef up (elastic#79907)
  Re-enable transaction duration alert story (elastic#80187)
  [npm] remove canvas dep (elastic#80185)
  [DOCS] Redirects ILM docs to Elasticsearch reference (elastic#79798)
  [APM] Catch health status error from ML (elastic#80131)
  Fix layout and remove title for add alert popover. (elastic#77633)
  [Discover] Loading spinner cleanup (elastic#79819)
  [Security Solution] [Resolver] Remove related events api (elastic#79036)
  [Ingest Manager] Remove fields from index pattern during package uninstall (elastic#80082)
  do not refetch license if signature header absents from a response (elastic#79645)
  Only send agent data for non-opentelemetry agents (elastic#79587)
  ...

# Conflicts:
#	x-pack/plugins/index_lifecycle_management/server/routes/api/nodes/register_list_route.ts
gmmorris added a commit to gmmorris/kibana that referenced this pull request Oct 13, 2020
* upstream/master: (34 commits)
  Improve vis editor typings (elastic#80004)
  Line Visualization improper scaling can result in gaps elastic#79663 (elastic#80135)
  [Security Solution][Timeline] Fix SelectableTimeline search (elastic#80128)
  move field_mapping util to saved_objects plugin (elastic#79918)
  [ML] Datagrid: Ensure column content with 'boolean' schema is not capitalized (elastic#80041)
  [CI] Correctly resolve repository root for JUnit reports (elastic#80226)
  [Ingest Manager] Fix package upgrade breaking after first rollover before new data has arrived (elastic#79887)
  [Security Solution] Fix positioning of Kibana banners list inside Sec… (elastic#80124)
  add missing await to fix test (elastic#80202)
  Revert test data changed in previous commit. (elastic#79479)
  [Security Solution] [Sourcerer] Jest beef up (elastic#79907)
  Re-enable transaction duration alert story (elastic#80187)
  [npm] remove canvas dep (elastic#80185)
  [DOCS] Redirects ILM docs to Elasticsearch reference (elastic#79798)
  [APM] Catch health status error from ML (elastic#80131)
  Fix layout and remove title for add alert popover. (elastic#77633)
  [Discover] Loading spinner cleanup (elastic#79819)
  [Security Solution] [Resolver] Remove related events api (elastic#79036)
  [Ingest Manager] Remove fields from index pattern during package uninstall (elastic#80082)
  do not refetch license if signature header absents from a response (elastic#79645)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_note:fix Team:Fleet Team label for Observability Data Collection Fleet team v7.10.0 v7.11.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Ingest Manager] Updating a package immediately after a rollover fails
7 participants