Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle index.mapping.ignore_malformed in downsampling #119134

Merged
merged 4 commits into from
Dec 23, 2024

Conversation

lkts
Copy link
Contributor

@lkts lkts commented Dec 19, 2024

Currently ignore_malformed parameter of @timestamp field is not propagated to downsampled index. This is problematic when index.mapping.ignore_malformed is set to true because it leads to a validation error for data streams (ignoring timestamp does not make logical sense in a data stream). This PR fixes it.

Additionally we need to set index.mapping.ignore_malformed index setting earlier in the downsampling logic. Currently it is applied after the index is created with initial mapping. This leads to hitting this assert

assert mapping.toCompressedXContent().equals(source) || isSyntheticSourceMalformed(source, version)
When index.mapping.ignore_malformed is not set it defaults to false which matches the value of ignore_malformed on @timestamp and so ignore_malformed is not serialized in mapping but it is present in mapping source.

Fixes #119075.

@lkts lkts added >bug auto-backport Automatically create backport pull requests when merged :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data v8.17.1 v8.18.0 labels Dec 19, 2024
@lkts lkts requested review from martijnvg and kkrik-es December 19, 2024 19:14
@lkts lkts changed the title Handle index.mapping.ignore_malformed in downsampling Handle index.mapping.ignore_malformed in downsampling Dec 19, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Collaborator

Hi @lkts, I've created a changelog YAML for you.


if (randomBoolean()) {
settings.put(IndexMetadata.SETTING_INDEX_HIDDEN, randomBoolean());
}

XContentBuilder mapping = jsonBuilder().startObject().startObject("_doc").startObject("properties");
mapping.startObject(FIELD_TIMESTAMP).field("type", "date").endObject();
mapping.startObject(FIELD_TIMESTAMP).field("type", "date").field("ignore_malformed", false).endObject();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be set when randomBoolean( ) above is also set (and left unset otherwise), for coverage.

FieldMapper.IGNORE_MALFORMED_SETTING.getKey(),
sourceIndexMetadata.getSettings().get(FieldMapper.IGNORE_MALFORMED_SETTING.getKey())
);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm somewhat puzzled by this, so how does @timestamp pick up ignore_above:true if this is not set? I assume it's passed somewhere but not everywhere..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't fully understand this. Maybe the setting came from an index template?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two sets of index settings:

  1. Settings applied when index is created (what you are looking at).
  2. Setting applied after the index is created - everything from the original index is copied (except settings in 1).

I have to admit i don't understand the reason for the split but this is the code structure. FieldMapper.IGNORE_MALFORMED_SETTING was in 2 but now is in 1. The reason is that by the time 2 is applied it's too late to "fix" mapping of @timestamp with it in mind.

Copy link
Contributor

@kkrik-es kkrik-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

@kkrik-es
Copy link
Contributor

Backport to 8.16.x, too?

@martijnvg
Copy link
Member

Backport to 8.16.x, too?

No more 8.16 releases will happen, so back porting this to 8.x and 8.17 branches is sufficient.

@lkts lkts merged commit c16bcb6 into elastic:main Dec 23, 2024
16 checks passed
@lkts lkts deleted the fix_downsampling_timestamp branch December 23, 2024 18:48
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.17
8.x

lkts added a commit to lkts/elasticsearch that referenced this pull request Dec 23, 2024
lkts added a commit to lkts/elasticsearch that referenced this pull request Dec 23, 2024
@kkrik-es
Copy link
Contributor

No more 8.16 releases will happen, so back porting this to 8.x and 8.17 branches is sufficient.

I see a FFreeze 8.16.3 on the cal?

@martijnvg
Copy link
Member

I see a FFreeze 8.16.3 on the cal?

I don't see a dev issue for it and given that 8.17.0 has been released, it means no 8.16.x releases will happen.

navarone-feekery pushed a commit to navarone-feekery/elasticsearch that referenced this pull request Dec 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged >bug :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data Team:StorageEngine v8.17.1 v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Downsampling fails for indexes configured with [index.mapping.ignore_malformed:true]
4 participants