[[parent-child-performance]]
=== Practical Considerations
Parent-child joins can be a useful technique for managing relationships when index-time performance((("parent-child relationship", "performance and"))) is more important than search-time performance, but they come at a significant cost. Parent-child queries can be 5 to 10 times slower than the equivalent nested query!
==== Memory Use
At the time of going to press, the parent-child ID map is still held in
memory.((("parent-child relationship", "memory usage")))((("memory usage", "parent-child ID map"))) There are plans to change the map to use doc values instead, which
would save a lot of memory. Until that happens, you need to be aware of the
following: the string `_id` field of every parent document has to be held in
memory, and every child document requires 8 bytes (a long value) of memory.
In practice, compression brings this down somewhat, but these figures give you
a rough idea. For example, 10 million child documents would need up to about
80 MB for the child entries alone, before counting the parent IDs.
You can check how much memory is being used by the parent-child cache by
consulting ((("indices-stats API")))the `indices-stats` API (for a summary at
the index level) or the `node-stats` API (for a summary at the node level).
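A minimal sketch of both requests, assuming a 1.x-era cluster in which the stats APIs still expose an `id_cache` metric (the metric name is an assumption and may be absent in other versions). The first request summarizes the `company` index used elsewhere in this chapter; the second summarizes by node:

[source,js]
--------------------------------------------------
GET /company/_stats/id_cache?human
GET /_nodes/stats/indices/id_cache?human <1>
--------------------------------------------------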
<1> Returns memory use of the ID cache summarized by node in a human-friendly format.
==== Global Ordinals and Latency
Parent-child uses <<global-ordinals,global ordinals>> to speed((("global ordinals")))((("parent-child relationship", "global ordinals and latency"))) up joins. Regardless of whether the parent-child map uses an in-memory cache or on-disk doc values, global ordinals still need to be rebuilt after any change to the index.
The more parents in a shard, the longer global ordinals will take to build. Parent-child is best suited to situations where there are many children for each parent, rather than many parents and few children.
Global ordinals, by default, are built lazily: the first parent-child query or
aggregation after a refresh will trigger building of global ordinals. This
can introduce a significant latency spike for your users. You can use
<<eager-global-ordinals,eager_global_ordinals>> to((("eager global ordinals"))) shift the cost of
building global ordinals from query time to refresh time, by mapping the
`_parent` field as follows:
[source,js]
--------------------------------------------------
PUT /company
{
  "mappings": {
    "branch": {},
    "employee": {
      "_parent": {
        "type": "branch",
        "fielddata": {
          "loading": "eager_global_ordinals" <1>
        }
      }
    }
  }
}
--------------------------------------------------
<1> Global ordinals for the `_parent` field will be built before a new segment
    becomes visible to search.
With many parents, global ordinals can take several seconds to build. In this
case, it makes sense to increase the `refresh_interval` so((("refresh_interval setting"))) that refreshes
happen less often and global ordinals remain valid for longer. This will
greatly reduce the CPU cost of rebuilding global ordinals every second.
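As a rough sketch, the refresh interval of the `company` index could be raised through the update-settings API. The `30s` value is only an illustrative assumption; pick an interval that matches how stale your search results are allowed to be:

[source,js]
--------------------------------------------------
PUT /company/_settings
{
  "index": {
    "refresh_interval": "30s" <1>
  }
}
--------------------------------------------------
<1> Illustrative value, not a recommendation. The default is `1s`.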
==== Multigenerations and Concluding Thoughts
The ability to join multiple generations (see <>) sounds attractive until ((("grandparents and grandchildren")))((("parent-child relationship", "multi-generations")))you think of the costs involved:
- The more joins you have, the worse performance will be.
- Each generation of parents needs to have its string `_id` fields stored in memory, which can consume a lot of RAM.
As you consider your relationship schemes and whether parent-child is right for you, consider this advice ((("parent-child relationship", "guidelines for using")))about parent-child relationships:
- Use parent-child relationships sparingly, and only when there are many more children than parents.
- Avoid using multiple parent-child joins in a single query.
- Avoid scoring by using the `has_child` filter, or the `has_child` query with `score_mode` set to `none`.
- Keep the parent IDs short, so that they require less memory.
Above all: think about the other relationship techniques that we have discussed before reaching for parent-child.
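To illustrate the advice about scoring, here is a minimal sketch of a `has_child` query with scoring disabled. It assumes the `company` index with `branch` parents and `employee` children from the mapping shown earlier; the `name` field and the search text are hypothetical:

[source,js]
--------------------------------------------------
GET /company/branch/_search
{
  "query": {
    "has_child": {
      "type": "employee",
      "score_mode": "none", <1>
      "query": {
        "match": {
          "name": "Alice Smith" <2>
        }
      }
    }
  }
}
--------------------------------------------------
<1> Matching children do not contribute to the score of the parent, so child scores do not need to be combined.
<2> Hypothetical field and value, used only for illustration.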