-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce experimental pass-through field type #103648
Conversation
`PassthoughObjectMapper` extends `ObjectMapper` to create a container for fields that also need to be referenced as if they were at the root level. This is done by creating aliases for all its subfields. It also supports an option of annotating all its subfields as dimensions. This will be leveraged in TSDB, where dimension fields can be dynamically defined as nested under a passthrough object - and still referenced directly (i.e. without prefixes) in aggregation queries. Related to elastic#103567
Hi @kkrik-es, I've created a changelog YAML for you. |
This looks great already 👏 Some ideas for potential improvements:
|
# Conflicts: # server/src/main/java/org/elasticsearch/index/mapper/DynamicFieldsBuilder.java
@elasticsearchmachine run elasticsearch-ci/docs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few more comments.
...ata-streams/src/main/java/org/elasticsearch/datastreams/DataStreamIndexSettingsProvider.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/NumberFieldMapper.java
Outdated
Show resolved
Hide resolved
/** | ||
* Are these field mappings being built dimensions? | ||
*/ | ||
public boolean contaisDimensions() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename to isParentFieldDimension()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm this won't be accurate.. Parent field would be a pass-through object that is not technically a dimension but can be marked to contain dimension fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to parentFieldContainsDimensions, wdyt?
if (internalMapper instanceof FieldMapper fieldMapper) { | ||
// If there's a conflicting alias with the same name at the root level, we don't want to throw an error | ||
// to avoid indexing disruption. | ||
// TODO: record an error without affecting document indexing, so that it can be investigated later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's log a warning for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a HeaderWarning, is this what you had in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like warning header as well. But I was actually thinking of logging a warning, the warning header isn't around when we do debugging. For example when we need to investigate why an alias points to the wrong field, this log warning could help us with debugging. Maybe also log a warning here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, ptal.
server/src/main/java/org/elasticsearch/index/mapper/RootObjectMapper.java
Show resolved
Hide resolved
test/framework/src/main/java/org/elasticsearch/index/mapper/TestDocumentParserContext.java
Outdated
Show resolved
Hide resolved
...ata-streams/src/main/java/org/elasticsearch/datastreams/DataStreamIndexSettingsProvider.java
Show resolved
Hide resolved
# Conflicts: # server/src/test/java/org/elasticsearch/index/mapper/RootObjectMapperTests.java
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. I think we're almost there.
if (internalMapper instanceof FieldMapper fieldMapper) { | ||
// If there's a conflicting alias with the same name at the root level, we don't want to throw an error | ||
// to avoid indexing disruption. | ||
// TODO: record an error without affecting document indexing, so that it can be investigated later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like warning header as well. But I was actually thinking of logging a warning, the warning header isn't around when we do debugging. For example when we need to investigate why an alias points to the wrong field, this log warning could help us with debugging. Maybe also log a warning here?
...ata-streams/src/main/java/org/elasticsearch/datastreams/DataStreamIndexSettingsProvider.java
Show resolved
Hide resolved
...ata-streams/src/main/java/org/elasticsearch/datastreams/DataStreamIndexSettingsProvider.java
Show resolved
Hide resolved
* Are these field mappings being built dimensions? | ||
*/ | ||
public boolean parentFieldContainsDimensions() { | ||
return parentFieldContainsDimensions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for another naming comment :)
What about parentFieldEnablesDimension
? If time_series_dimension
is configured on a passthrough about then this automatically makes any number, keyword, ip field a dimension too. So a parent passthrough field essentially enabled the whether a field is a time series dimension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to parentObjectContainsDimension
, matching the containsDimension
param of PassThroughObject
. parentFieldEnablesDimension
is not accurate imho, this doesn't just enable but enforce defining subfields as dimensions.
server/src/main/java/org/elasticsearch/index/mapper/PassThroughObjectMapper.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
I think the only improvement that we need before we can use this would be support at arbitrary levels, not only the root. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed this post-merge and left some comments.
new TextFieldMapper.Builder(name, context.indexAnalyzers()).addMultiField( | ||
new KeywordFieldMapper.Builder("keyword", context.indexSettings().getIndexVersionCreated()).ignoreAbove(256) | ||
), | ||
context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how relevant is this change to the passthrough object mapper? This affects how fields are dynamically mapped and looks to me like a separate change. Also, I am not sure what the impact of it is for existing users. Namely, should the new behaviour only be activated for indices created from the version that the change was introduced in? Otherwise fields are mapped differently for the same index before and after the upgrade, which seems unexpected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR indeed both introduces the passthrough mapper and includes changes to support its functionality for defining dimension fields underneath. It could definitely be split, but it did help with testing - and ensuring that the implementation addressed the requirements around dynamic dimension support.
Since parentObjectContainsDimensions()
is introduced in this PR and the else
branch is the old logic, I'd think this is bwc?
+ objBuilder.name() | ||
+ "], passthrough is not supported as a subobject" | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this because the passthrough type can only be used under the root object? Does it mean it should rather become part of the RootObjectMapper explicitly then? This check within ObjectMapper is not great, because the base class needs to know of its own subclass impl? I think we should find a different way to perform this check. I also read that we want to support it in an arbitrary level in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed! I think these comments were mostly addressed in #105062. Ptal, I'm happy to iterate on feedback there too.
Mapper.TypeParser typeParser = parserContext.typeParser(type); | ||
if (typeParser == null) { | ||
throw new MapperParsingException("No handler for type [" + type + "] declared on field [" + fieldName + "]"); | ||
} | ||
Mapper.Builder fieldBuilder; | ||
if (objBuilder.subobjects.value() == false) { | ||
if (objBuilder.subobjects.value() == false || type.equals(FieldAliasMapper.CONTENT_TYPE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why was this change required? Is this about "ObjectMapper was modified to use the full name for FieldAliasMappers"? Is it a bug around field aliases, or an enhancement around them? How does it affect existing users? I think that it's a change that should probably have been reviewed separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -106,19 +110,54 @@ public RootObjectMapper.Builder addRuntimeFields(Map<String, RuntimeField> runti | |||
|
|||
@Override | |||
public RootObjectMapper build(MapperBuilderContext context) { | |||
Map<String, Mapper> mappers = buildMappers(context); | |||
mappers.putAll(getAliasMappers(mappers, context)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this way won't the field aliases become visible in the resulting mappings? Was that the intention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want the aliases to be visible, but maybe I misunderstood the question. What would be the alternative here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that the intention was to have the passthrough mapper, and field aliases hidden somehow in the resulting mappings. Maybe I misunderstood.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe that the alias field mappers would be visible in the mappings. That's because the get mapping request doesn't return a serialized version of the RootObjectMapper
. Instead, we're returning the source
as a blob, as it has been provided in the put mapping request.
} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this functionality is misplaced. It should not be RootObjectMapper becoming aware of a specific object mapper subclass and special case on it. Ideally, this could be all embedded into the custom field mapper we are adding. I can see some challenges that don't make this directly possible with the existing infra: an object mapper cannot emit mapped field types, and a field mapper can only emit a single mapped field type. We do have more flexibility in the runtime section though. Yet I don't think we have a precedent for a mapper being able to emit fields that are outside of its own path (it's something we have discussed as part of the composite runtime field impl but have decided not to implement - meaning it can only produce fields that are part of its path).
If I am not mistaken, all we'd like is to expose additional runtime behaviour. Namely what the field alias mapper does is take the MappedFieldType registered at the target, and register it also with the path of the alias. Ideally, the passthrough mapper would be able to do just that, without special casing outside of it. If it could emit mapped field types, similar to how CompositeRuntimeField does through its asMappedFieldTypes method.
This may address two additional concerns I have with the current implementation:
- it relies on field aliases, which have some limitations, and I am thinking that if we could generate runtime fields or mapped field types instead of full-blown field alias mapper, that would give us more flexibility
- it subclasses ObjectMapper, which is maybe still necessary because of the need to accept sub-fields, but I'd like to consider moving away from it. We have been discussing for quite a while to not have subclasses of ObjectMapper, and trying to remove special casing all over the codebase. This PR introduces more special casing and reliance on ObjectMapper which we should reconsider.
This is a bit of a braindump, and not a complete solution yet, but something that we can take as a starting point to improve the current design.
Another idea, possibly less of a redesign, would be to move the instanceof check to MappingLookup/FieldTypeLookup where we already check for aliases and we special case them. That's where we could do what field aliases do, without explicitly adding field alias mappers to mappings (which I cannot tell if it's needed or more of a side-effect of the current impl). I am mentioning this as a stop-gap measure to improve things, as the redesign I have described above requires considerable changes and additional thinking, but I still think that would be a good discussion to have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the long and detailed writeup, Luca.
@felixbarny @martijnvg fyi, my understanding is that it'd help users or Kibana to see both the nested objects under pass-through and the aliased fields at the root level. If this is not strictly needed, I can look into removing the aliases and update the MappingLookup/FieldTypeLookup logic instead, as suggested above.
Fwiw, I think we still need the Passthrough object as it provides a good solution to the problem of supporting dynamic dimensions in TSDB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also thinking of doing a special case in MappingLookup/FieldTypeLookup instead of adding individual alias field mappers to the root object mapper. IINM, field caps uses this internally, so we'd still be able have field suggestions for this in Kibana.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fwiw, I think we still need the Passthrough object as it provides a good solution to the problem of supporting dynamic dimensions in TSDB.
I agree and I didn't get the impression that Luca was questioning the existence of the new field type, just how it's implemented. For example, the passthrough field type doesn't necessarily need to extend ObjectMapper, especially since it doesn't support subobjects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting, I see that field caps does indeed use FieldTypeLookup. If this suffices, it definitely looks cleaner to just update these classes and skip adding aliases at the root level (so many issues with the latter part already..).
I'll put together a PR for this first, then see if it's better to make PassThrough a standalone mapper instead of an ObjectMapper subclass.
PassThoughObjectMapper
extendsObjectMapper
to create a container for fields that also need to be referenced as if they were at the root level. This is done by creating aliases for all its subfields.It also supports an option of annotating all its subfields as dimensions. This will be leveraged in TSDB, where dimension fields can be dynamically defined as nested under a passthrough object - and still referenced directly (i.e. without prefixes) in aggregation queries. More so, the pass-through object is added as a path match to the routing path, so that dynamically-defined subfields get including in routing and
_tsid
calculations.The initial implementation is experimental, missing the following:
Related to #103567