-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reinterpret dots in field names as object structure #79922
Reinterpret dots in field names as object structure #79922
Conversation
Pinging @elastic/es-search (Team:Search) |
After this has been merged, handling dots in field names becomes much simpler, as we just check to see if the current parent object has |
assertEquals( | ||
"Cannot add a value for field [field.bar] since one of the intermediate objects is mapped as a nested object: [field]", | ||
e.getMessage() | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nice upside of this change is that we can now handle dynamically mapped fields with dots that lead into a nested object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XContentParser bits look sensible. Left a suggestion for an extra thing to test.
After looking through the DocumentParser bit it looks like I was misunderstanding what this was doing the last time I read it. I'm actually a big fan now. I think I'll need to give it another look on Monday though before I can decide if it's, like, right. I mean, if the tests pass that's a huge thing, because this is mostly about replacing tricky code with a sensible abstraction. But I'd like to give it a look with a more clearerer head.
assertEquals(XContentParser.Token.VALUE_STRING, parser.nextToken()); | ||
assertEquals("value2", parser.text()); | ||
assertEquals(XContentParser.Token.END_OBJECT, parser.nextToken()); | ||
assertNull(parser.nextToken()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to test this with copyCurrentStructure
and XContentHelper.convertToMap
- those might make the tests a bit more readable. And might find extra fun sneaky stuff.
Er. I'm off on Monday. Wednesday. Or, someone else who understand document parsing can have it. I'm not picky. |
@elasticmachine run elasticsearch-ci/part-1 |
* upstream/master: (29 commits) Fix typo (elastic#80925) Increase docker compose timeouts for CI builds TSDB: fix error without feature flag (elastic#80945) [DOCS] Relocate `index.mapping.dimension_fields.limit` setting docs (elastic#80964) Explicit write methods for always-missing values (elastic#80958) TSDB: move TimeSeriesModeIT to yaml tests (elastic#80933) [ML] Removing temporary debug (elastic#80956) Remove unused ConnectTransportException#node (elastic#80944) Reinterpret dots in field names as object structure (elastic#79922) Remove obsolete typed legacy index templates (elastic#80937) Remove unnecessary shuffle in unassigned shards allocation. (elastic#65172) TSDB: Tests for nanosecond timeprecision timestamp just beyond the limit (elastic#80932) Cleanup SLM History Item .equals (elastic#80938) Rework breaking changes for new structure (elastic#80907) [DOCS] Fix elasticsearch-reset-password typo (elastic#80919) [ML] No need to use parent task client when internal infer delegates (elastic#80905) Fix shadowed vars pt6 (elastic#80899) add ignore info (elastic#80924) Fix several potential circuit breaker leaks in Aggregators (elastic#79676) Extract more standard metadata from binary files (elastic#78754) ...
* upstream/master: (319 commits) Fix typo (elastic#80925) Increase docker compose timeouts for CI builds TSDB: fix error without feature flag (elastic#80945) [DOCS] Relocate `index.mapping.dimension_fields.limit` setting docs (elastic#80964) Explicit write methods for always-missing values (elastic#80958) TSDB: move TimeSeriesModeIT to yaml tests (elastic#80933) [ML] Removing temporary debug (elastic#80956) Remove unused ConnectTransportException#node (elastic#80944) Reinterpret dots in field names as object structure (elastic#79922) Remove obsolete typed legacy index templates (elastic#80937) Remove unnecessary shuffle in unassigned shards allocation. (elastic#65172) TSDB: Tests for nanosecond timeprecision timestamp just beyond the limit (elastic#80932) Cleanup SLM History Item .equals (elastic#80938) Rework breaking changes for new structure (elastic#80907) [DOCS] Fix elasticsearch-reset-password typo (elastic#80919) [ML] No need to use parent task client when internal infer delegates (elastic#80905) Fix shadowed vars pt6 (elastic#80899) add ignore info (elastic#80924) Fix several potential circuit breaker leaks in Aggregators (elastic#79676) Extract more standard metadata from binary files (elastic#78754) ...
Hi, Will this be ported also to 7.x versions? |
With elastic#79922 we have introduced a parser that expands dots in fields names on the fly, so that the expansion no longer needs to be handled by consumers. The currentName method of such parser was not returning all the time the expected name, compared to the corresponding parser that receives as input the document with expanded fields. This commit expands testing and addresses the issues that were found.
With #79922 we have introduced a parser that expands dots in fields names on the fly, so that the expansion no longer needs to be handled by consumers. The currentName method of such parser was not returning all the time the expected name, compared to the corresponding parser that receives as input the document with expanded fields. This commit expands testing and addresses the issues that were found.
With elastic#79922 we have introduced a parser that expands dots in fields names on the fly, so that the expansion no longer needs to be handled by consumers. The token location exposed by such parser can be confusing to interpret: consumers are parsing the expanded version which requires jumping ahead reading tokens and exposing additional field names and start objects, while users have sent the unexpanded version and would like errors to refer to the original content. This commit adds a test for this scenario and tweaks the DotExpandingXContentParser to cache the token location before jumping ahead to expand dots in field names.
) With #79922 we have introduced a parser that expands dots in fields names on the fly, so that the expansion no longer needs to be handled by consumers. The token location exposed by such parser can be confusing to interpret: consumers are parsing the expanded version which requires jumping ahead reading tokens and exposing additional field names and start objects, while users have sent the unexpanded version and would like errors to refer to the original content. This commit adds a test for this scenario and tweaks the DotExpandingXContentParser to cache the token location before jumping ahead to expand dots in field names.
…stic#84970) With elastic#79922 we have introduced a parser that expands dots in fields names on the fly, so that the expansion no longer needs to be handled by consumers. The token location exposed by such parser can be confusing to interpret: consumers are parsing the expanded version which requires jumping ahead reading tokens and exposing additional field names and start objects, while users have sent the unexpanded version and would like errors to refer to the original content. This commit adds a test for this scenario and tweaks the DotExpandingXContentParser to cache the token location before jumping ahead to expand dots in field names.
) With #79922 we have introduced a parser that expands dots in fields names on the fly, so that the expansion no longer needs to be handled by consumers. The token location exposed by such parser can be confusing to interpret: consumers are parsing the expanded version which requires jumping ahead reading tokens and exposing additional field names and start objects, while users have sent the unexpanded version and would like errors to refer to the original content. This commit adds a test for this scenario and tweaks the DotExpandingXContentParser to cache the token location before jumping ahead to expand dots in field names.
Previously, when using dynamic: false, an array field with a dot in its name, whose suffix matched a mapped field’s name, had its values merged with the mapped field unexpectedly. This has been fixed by elastic#79922 This commit adds a test for that scenario and verifies that the bug is fixed. Closes elastic#65333
Previously, when using dynamic: false, an array field with a dot in its name, whose suffix matched a mapped field’s name, had its values merged with the mapped field unexpectedly. This has been fixed by #79922 This commit adds a test for that scenario and verifies that the bug is fixed. Closes #65333
We changed how copy_to is implemented in #79922, which moved the handling of dots in field names into a specialised parser. Unfortunately, while doing this we added a bug whereby every time a copy_to directive is processed for a nested field, the nested field's include_in_parent logic would be run, meaning that the parent would end up with multiple copies of the nested child's fields. This commit fixes this by only running include_in_parent when the parser is not in a copy_to context. It also fixes another bug that meant the parent document would contain multiple copies of the ID field. Fixes #87036
We changed how copy_to is implemented in elastic#79922, which moved the handling of dots in field names into a specialised parser. Unfortunately, while doing this we added a bug whereby every time a copy_to directive is processed for a nested field, the nested field's include_in_parent logic would be run, meaning that the parent would end up with multiple copies of the nested child's fields. This commit fixes this by only running include_in_parent when the parser is not in a copy_to context. It also fixes another bug that meant the parent document would contain multiple copies of the ID field. Fixes elastic#87036
We changed how copy_to is implemented in #79922, which moved the handling of dots in field names into a specialised parser. Unfortunately, while doing this we added a bug whereby every time a copy_to directive is processed for a nested field, the nested field's include_in_parent logic would be run, meaning that the parent would end up with multiple copies of the nested child's fields. This commit fixes this by only running include_in_parent when the parser is not in a copy_to context. It also fixes another bug that meant the parent document would contain multiple copies of the ID field. Fixes #87036
DocumentParser parses documents by following their object hierarchy, and
using a parallel hierarchy of ObjectMappers to work out how to map leaf fields.
Field names that contain dots complicate this, meaning that many methods
need to reverse-engineer the object hierarchy to check that the current parent
object mapper is the correct one; this is particularly complex when objects
are being created dynamically.
To simplify this logic, this commit introduces a DotExpandingXContentParser,
which will wrap another XContentParser and re-interpret a field name containing
dots as a series of objects. So for example,
"foo.bar.baz":{ ... }
is passedto the DocumentParser as
"foo":{"bar":{"baz":{...}}}
. The central parsinglogic of DocumentParser detects when a fieldname has this form and temporarily
swaps out its context for a context with a wrapped parser.