Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Construct dynamic updates directly via object builders #81449

Merged

Conversation

romseygeek
Copy link
Contributor

Currently, dynamic updates are built in the DocumentParser using a
stack of possibly-dynamic object mappers. This logic, spread across
a number of static methods, frequently assumes that the parents of
a mapper can be found by splitting its name on dots, an assumption
that will fail to hold once we allow objects to hold fields that have dots
in their names.

As a pre-requisite for the dots in field names effort, this commit refactors
the construction of dynamic updates into object mapper builders. Now,
to build an update, we start with a new dynamic root builder, and then
call addUpdate on it with each dynamically built mapper in turn. The
builder will examine the mapper and see if it can just add it to its own
set of mappers directly; and if not, it will retrieve or build an appropriate
intermediate object mapper and recursively call addUpdate on it with
the original mapper.

As a side-effect of this change, ObjectMapper itself no longer updates
its map of child mappers except during construction via merging, and so
we can safely replace CopyOnWriteHashMap here.

@romseygeek romseygeek added :Search Foundations/Mapping Index mappings, including merging and defining field types >refactoring v8.1.0 labels Dec 7, 2021
@romseygeek romseygeek self-assigned this Dec 7, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Dec 7, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@romseygeek
Copy link
Contributor Author

@elasticmachine update branch

@nik9000
Copy link
Member

nik9000 commented Dec 8, 2021

As a side-effect of this change, ObjectMapper itself no longer updates
its map of child mappers except during construction via merging, and so
we can safely replace CopyOnWriteHashMap here.

That sounds lovely.

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first pass of this and left a few notes mostly for my own understanding. I didn't really go deep into the old logic in DocumentParser#createDynamicUpdate since it seems to be replaced by a completely different bottom-up-approach, but instead focused on that and the changes in the builders, which make sense to me so far. I still would like to give this a second pass and take another look at the test changes and would also appreciate a second pair of eyes here since this changes a lot of complex logic where I hope we cover most of the edge cases in tests already.

protected volatile Dynamic dynamic;

protected volatile CopyOnWriteHashMap<String, Mapper> mappers;
protected Map<String, Mapper> mappers;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious to understand why this changed...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see, probably because its not mutable any more now? Could the map be it be final then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately not, it gets updated in doMerge(). Which is a whole 'nother can of worms.

this.runtimeFields.put(runtimeField.name(), runtimeField);
return this;
}

public RootObjectMapper.Builder setRuntime(Map<String, RuntimeField> runtimeFields) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if this method now adds all input fields, could we rename it to "addRuntime" to make it clearer this doesn't overwrite the existing map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I'll update.

String fullChildName = prefix == null ? childName : prefix + "." + childName;
ObjectMapper.Builder childBuilder = findChild(childName, fullChildName, context);
childBuilder.addDynamic(name.substring(firstDotIndex + 1), fullChildName, mapper, context);
mappersBuilders.add(childBuilder);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably confused by the recursion here, but would it make sense to add the new childBuilder before the recursive step? Or maybe it doesn't matter, I now see that "findChild" only looks up previous dynamic updates in 'context.getObjectMapper(fullChildName)' and that wouldn't be affected by the mappersBuilders here? Just a bit confused, maybe you can explain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, findChild doesn't actually consult the existing mappersBuilders at all. In fact, I think the final branch of findChild, where we just return a new builder, won't ever be called now, because of #79922 - a dynamically built mapper will always have a parent that is either in the existing mappings, or has also been dynamically added. Maybe worth keeping it as a safety fall-back though.

In terms of adding the childBuilder to mappersBuilders before/after the recursion, it doesn't actually matter. The recursive step knows nothing about its parent (apart from using the passed-in prefix to build a full child name for lookup purposes), and the childBuilder itself is mutable so we can add a new mapper to it either before or after it has been added to mappersBuilders, it's the same object with the same result.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the thing the feels weird to me is that we're merging stuff from the tree of updates in the context and the tree under the object itself. It's, like, how you have to do it, I think. I left a note about the method itself below. Maybe I've found something? If I have maybe it'd give us something to pry on to simplify this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not quite under the object itself, it's under a new, empty, objectbuilder that has been generated by the main object. I've tried to explain the recursion more fully in a comment below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, if I remove the third branch and throw an exception, it causes failures in tests that are checking how fields starting or ending with a dot behave. I'll see if I can pull field name validation out into a separate PR.

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a second pass with a few more questions, other than that LGTM. Still worth checking from another point of view I think.

@@ -87,6 +86,64 @@ public Builder add(Mapper.Builder builder) {
return this;
}

public Builder addMappers(Map<String, Mapper> mappers) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this can be package private, seems only be used by ObjectMapperMergeTests

b.endObject();
b.field("field", 10);
b.endObject();
}));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your opinion, does changing these tests from directly testing DocumentParser.createDynamicUpdate() to parsing full json objects also change their coverage? As far as I understand, we previously sorted all mapper updates from a parse alphabetically (thus the sorting in the test setups). Now we don't need to do that anymore due to the new recursive strategy. However, I wonder if my re-ordering things in the json we might create a different update order and maybe different results/edge cases? I'm not sure this can happen so just wondering if that might be a concern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main difference I think is if we have two incompatible dynamic mappers for the same field - let's say you have something that looks like this:

{ "foo" : { "bar" : 1 }, "foo.bar" : "baz" }

We'll generate two mappings for 'foo.bar' here, one long and one text+keyword. Previously they would have been merged together and thrown an exception; now one will overwrite the other, but we'll still get an exception when the document is re-parsed if the long update is seen second and therefore 'wins'. I think this is OK though?

This also opens up possibilities for automatic widening of dynamic types - we could in the future have some kind of merge detection that says if you have a long mapper and a text mapper, you should always accept the text mapper when you combine the two in a dynamic context.

@romseygeek
Copy link
Contributor Author

Another update, taking advantage of the changes in #82359.

This does now in fact cause a change in behaviour, due to #28948. Previously if you send in a document that looked like {"a.":{"b":{"c":"d"}}} you'd get an error about ambiguous object paths, but only if there wasn't already an object mapped called 'a' in your mappings. Due to how paths get split up and combined, the extra '.' on the end of a. would only matter when things were smooshed together to build a dynamic mapper; during parsing the trailing dots were stripped off by a call to String.split() and ignored. With the new way of building dynamic mappings, these trailing dots are always ignored - you'll only get an error if you send in something like {"a..b":{"c":"d"}}.

Given all the weird corner cases that #28948 causes I wonder if we should address it as part of this? It's an easy change to make in the document parsing code, although it may be a bit trickier as part of mapper parsing.

@romseygeek
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/bwc

@romseygeek
Copy link
Contributor Author

@elasticmachine update branch

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I still don't fully understand it, but I'm closer to understand it than before. And a lot closer to understanding it than the old code. And we have tests.

@romseygeek
Copy link
Contributor Author

@elasticmachine update branch

@romseygeek romseygeek merged commit 002f506 into elastic:master Jan 25, 2022
@romseygeek romseygeek deleted the mapper/dynamic-update-without-merging branch January 25, 2022 12:21
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Jan 26, 2022
* upstream/master: (762 commits)
  [DOCS] Add note to that log4j customization is outside the support scope (elastic#82668)
  Batch Index Settings Update Requests (elastic#82896)
  [DOCS] Delete pipeline containing stored script (elastic#83102)
  Try again to fix changelog areas after reorg (elastic#83100)
  Bind to non-localhost for transport in some cases (elastic#82973)
  [DOCS] Reuse multi-level `join` warning (elastic#82976)
  Remove unnecessary CopyOnWriteHashMap class (elastic#83040)
  Adjust changelog categories after reorg (elastic#83087)
  [DOCS] Fix typo in `action.destructive_requires_name` breaking change (elastic#83085)
  Stack Monitoring: Add Enterprise Search monitoring index templates (elastic#82743)
  [DOCS] Fix stored script example snippet (elastic#83056)
  [DOCS] Re-add network traffic para to `term` query (elastic#83047)
  [DOCS] Rename example stored script (elastic#83054)
  [ML][DOCS] Add Trained model APIs to the REST APIs index (elastic#82791)
  [ML] Update running process when global calendar changes (elastic#83044)
  [Transform] Fix condition on which the transform stops processing buckets (elastic#82852)
  [DOCS] Fixes field names in ML sum functions. (elastic#83048)
  [ML] fix NLP tokenization never_split handling around punctuation (elastic#82982)
  Construct dynamic updates directly via object builders (elastic#81449)
  Emit trace.id into audit logs (elastic#82849)
  ...

# Conflicts:
#	client/rest-high-level/src/test/java/org/elasticsearch/client/IndicesClientIT.java
#	client/rest-high-level/src/test/java/org/elasticsearch/client/documentation/ILMDocumentationIT.java
#	server/src/main/java/org/elasticsearch/action/admin/indices/rollover/Condition.java
#	server/src/test/java/org/elasticsearch/action/admin/indices/rollover/ConditionTests.java
#	x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/RolloverActionTests.java
#	x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/TimeseriesLifecycleTypeTests.java
#	x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/WaitForRolloverReadyStepTests.java
javanna added a commit to javanna/elasticsearch that referenced this pull request Jun 14, 2022
When we apply dynamic mappings, we iterate over all the dynamic mappers retrieved from the DocumentParserContext, which are registered while parsing the document, and for each field look up their parent (going back multiple levels if necessary), and add it to the dynamic mapping update with the added/modified sub-field.

Dynamic mappers that are iterated through consist of a flat set of mappers, which can be both object mappers or field mappers. Most times, the object mappers from such set have no sub-fields as they are mapped as a result of parsing a document where the object appears for the first time, which also has sub-fields that are going to be added to the set of dynamic mappers themselves once they are parsed.

There are scenarios though where a dynamic template matches an object, and defines its structure including its subobjects as well as sub-fields. In that case the dynamically mapped object may hold sub-fields as well as define non-default values for dynamic, enabled or subobjects. The described situation was not well covered in tests so far, and is currently affected by a bug introduced with elastic#81449. When an object mapper is dynamically mapped, it is added to the map of dynamic object mappers, which makes it discoverable for sub-fields that will need to be added to it later, as well as to the set of dynamic mappers so that it becomes part of the mappings in case the document being parsed defines no sub-fields for it. What is missing is to recursively add its sub-fields to the dynamic object mappers. As a result of missing this step, intermediate objects that were dynamically mapped are not made discoverable which causes a cryptic "Missing intermediate object" error.

This commit fixed the problem by recursively registering inner sub-objects to the dynamic mappers whenever an object mapper is added to the dynamic mappers. It also changes the "missing intermediate object" error to make it more evident that it's an internal error and not a user error: it is now an IllegalStateException instead of an IllegalArgumentException.

Closes elastic#87513
javanna added a commit that referenced this pull request Jun 17, 2022
…87622)

When we apply dynamic mappings, we iterate over all the dynamic mappers retrieved from the DocumentParserContext, which are registered while parsing the document, and for each field look up their parent (going back multiple levels if necessary), and add it to the dynamic mapping update with the added/modified sub-field.

Dynamic mappers that are iterated through consist of a flat set of mappers, which can be both object mappers or field mappers. Most times, the object mappers from such set have no sub-fields as they are mapped as a result of parsing a document where the object appears for the first time, which also has sub-fields that are going to be added to the set of dynamic mappers themselves once they are parsed.

There are scenarios though where a dynamic template matches an object, and defines its structure including its subobjects as well as sub-fields. In that case the dynamically mapped object may hold sub-fields as well as define non-default values for dynamic, enabled or subobjects. The described situation was not well covered in tests so far, and is currently affected by a bug introduced with #81449. When an object mapper is dynamically mapped, it is added to the map of dynamic object mappers, which makes it discoverable for sub-fields that will need to be added to it later, as well as to the set of dynamic mappers so that it becomes part of the mappings in case the document being parsed defines no sub-fields for it. What is missing is to recursively add its sub-fields to the dynamic object mappers. As a result of missing this step, intermediate objects that were dynamically mapped are not made discoverable which causes a cryptic "Missing intermediate object" error.

This commit fixed the problem by recursively registering inner sub-objects to the dynamic mappers whenever an object mapper is added to the dynamic mappers. It also changes the "missing intermediate object" error to make it more evident that it's an internal error and not a user error: it is now an IllegalStateException instead of an IllegalArgumentException.

Closes #87513
javanna added a commit to javanna/elasticsearch that referenced this pull request Jun 17, 2022
…lastic#87622)

When we apply dynamic mappings, we iterate over all the dynamic mappers retrieved from the DocumentParserContext, which are registered while parsing the document, and for each field look up their parent (going back multiple levels if necessary), and add it to the dynamic mapping update with the added/modified sub-field.

Dynamic mappers that are iterated through consist of a flat set of mappers, which can be both object mappers or field mappers. Most times, the object mappers from such set have no sub-fields as they are mapped as a result of parsing a document where the object appears for the first time, which also has sub-fields that are going to be added to the set of dynamic mappers themselves once they are parsed.

There are scenarios though where a dynamic template matches an object, and defines its structure including its subobjects as well as sub-fields. In that case the dynamically mapped object may hold sub-fields as well as define non-default values for dynamic, enabled or subobjects. The described situation was not well covered in tests so far, and is currently affected by a bug introduced with elastic#81449. When an object mapper is dynamically mapped, it is added to the map of dynamic object mappers, which makes it discoverable for sub-fields that will need to be added to it later, as well as to the set of dynamic mappers so that it becomes part of the mappings in case the document being parsed defines no sub-fields for it. What is missing is to recursively add its sub-fields to the dynamic object mappers. As a result of missing this step, intermediate objects that were dynamically mapped are not made discoverable which causes a cryptic "Missing intermediate object" error.

This commit fixed the problem by recursively registering inner sub-objects to the dynamic mappers whenever an object mapper is added to the dynamic mappers. It also changes the "missing intermediate object" error to make it more evident that it's an internal error and not a user error: it is now an IllegalStateException instead of an IllegalArgumentException.

Closes elastic#87513
javanna added a commit that referenced this pull request Jun 17, 2022
…87622)

When we apply dynamic mappings, we iterate over all the dynamic mappers retrieved from the DocumentParserContext, which are registered while parsing the document, and for each field look up their parent (going back multiple levels if necessary), and add it to the dynamic mapping update with the added/modified sub-field.

Dynamic mappers that are iterated through consist of a flat set of mappers, which can be both object mappers or field mappers. Most times, the object mappers from such set have no sub-fields as they are mapped as a result of parsing a document where the object appears for the first time, which also has sub-fields that are going to be added to the set of dynamic mappers themselves once they are parsed.

There are scenarios though where a dynamic template matches an object, and defines its structure including its subobjects as well as sub-fields. In that case the dynamically mapped object may hold sub-fields as well as define non-default values for dynamic, enabled or subobjects. The described situation was not well covered in tests so far, and is currently affected by a bug introduced with #81449. When an object mapper is dynamically mapped, it is added to the map of dynamic object mappers, which makes it discoverable for sub-fields that will need to be added to it later, as well as to the set of dynamic mappers so that it becomes part of the mappings in case the document being parsed defines no sub-fields for it. What is missing is to recursively add its sub-fields to the dynamic object mappers. As a result of missing this step, intermediate objects that were dynamically mapped are not made discoverable which causes a cryptic "Missing intermediate object" error.

This commit fixed the problem by recursively registering inner sub-objects to the dynamic mappers whenever an object mapper is added to the dynamic mappers. It also changes the "missing intermediate object" error to make it more evident that it's an internal error and not a user error: it is now an IllegalStateException instead of an IllegalArgumentException.

Closes #87513
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>refactoring :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants