From d0faf354c6fbbdf7e98807b8b5ec819466c72ec7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?= Date: Fri, 27 Sep 2019 10:34:30 +0200 Subject: [PATCH] [DOCS] Reformats Profile API (#47168) * [DOCS] Reformats Profile API. * [DOCS] Fixes failing docs test. --- docs/reference/search/profile.asciidoc | 332 +++++++++++++++---------- 1 file changed, 197 insertions(+), 135 deletions(-) diff --git a/docs/reference/search/profile.asciidoc b/docs/reference/search/profile.asciidoc index 0d83d29305210..0b959f87e0e84 100644 --- a/docs/reference/search/profile.asciidoc +++ b/docs/reference/search/profile.asciidoc @@ -1,20 +1,30 @@ [[search-profile]] === Profile API -WARNING: The Profile API is a debugging tool and adds significant overhead to search execution. +WARNING: The Profile API is a debugging tool and adds significant overhead to search execution. -The Profile API provides detailed timing information about the execution of individual components -in a search request. It gives the user insight into how search requests are executed at a low level so that -the user can understand why certain requests are slow, and take steps to improve them. -Note that the Profile API, <>, doesn't measure -network latency, time spent in the search fetch phase, time spent while the requests spends -in queues or while merging shard responses on the coordinating node. +Provides detailed timing information about the execution of individual +components in a search request. -The output from the Profile API is *very* verbose, especially for complicated requests executed across -many shards. Pretty-printing the response is recommended to help understand the output -[float] -==== Usage +[[search-profile-api-desc]] +==== {api-description-title} + +The Profile API gives the user insight into how search requests are executed at +a low level so that the user can understand why certain requests are slow, and +take steps to improve them. 
Note that the Profile API,
+<>, doesn't measure network latency,
+time spent in the search fetch phase, time the request spends in queues, or
+time spent merging shard responses on the coordinating node.
+
+The output from the Profile API is *very* verbose, especially for complicated
+requests executed across many shards. Pretty-printing the response is
+recommended to help understand the output.
+
+
+[[search-profile-api-example]]
+==== {api-examples-title}
+
 Any `_search` request can be profiled by adding a top-level `profile` parameter:

@@ -31,9 +41,10 @@ GET /twitter/_search
 // TEST[setup:twitter]

 <1> Setting the top-level `profile` parameter to `true` will enable profiling
-for the search
+for the search.
+

-This will yield the following result:
+The API returns the following result:

 [source,console-result]
 --------------------------------------------------
@@ -167,12 +178,13 @@ This will yield the following result:
 // TESTRESPONSE[s/(?<=[" ])\d+(\.\d+)?/$body.$_path/]
 // TESTRESPONSE[s/\[2aE02wS1R8q_QFnYu6vDVQ\]\[twitter\]\[0\]/$body.$_path/]

-<1> Search results are returned, but were omitted here for brevity
+<1> Search results are returned, but were omitted here for brevity.

-Even for a simple query, the response is relatively complicated. Let's break it down piece-by-piece before moving
-to more complex examples.
+Even for a simple query, the response is relatively complicated. Let's break it
+down piece-by-piece before moving to more complex examples.
-First, the overall structure of the profile response is as follows: + +The overall structure of the profile response is as follows: [source,console-result] -------------------------------------------------- @@ -200,27 +212,37 @@ First, the overall structure of the profile response is as follows: // TESTRESPONSE[s/"query": \[...\]/"query": $body.$_path/] // TESTRESPONSE[s/"collector": \[...\]/"collector": $body.$_path/] // TESTRESPONSE[s/"aggregations": \[...\]/"aggregations": []/] -<1> A profile is returned for each shard that participated in the response, and is identified -by a unique ID -<2> Each profile contains a section which holds details about the query execution -<3> Each profile has a single time representing the cumulative rewrite time -<4> Each profile also contains a section about the Lucene Collectors which run the search -<5> Each profile contains a section which holds the details about the aggregation execution - -Because a search request may be executed against one or more shards in an index, and a search may cover -one or more indices, the top level element in the profile response is an array of `shard` objects. -Each shard object lists its `id` which uniquely identifies the shard. The ID's format is +<1> A profile is returned for each shard that participated in the response, and +is identified by a unique ID. +<2> Each profile contains a section which holds details about the query +execution. +<3> Each profile has a single time representing the cumulative rewrite time. +<4> Each profile also contains a section about the Lucene Collectors which run +the search. +<5> Each profile contains a section which holds the details about the +aggregation execution. + +Because a search request may be executed against one or more shards in an index, +and a search may cover one or more indices, the top level element in the profile +response is an array of `shard` objects. Each shard object lists its `id` which +uniquely identifies the shard. 
The ID's format is `[nodeID][indexName][shardID]`. -The profile itself may consist of one or more "searches", where a search is a query executed against the underlying -Lucene index. Most search requests submitted by the user will only execute a single `search` against the Lucene index. -But occasionally multiple searches will be executed, such as including a global aggregation (which needs to execute -a secondary "match_all" query for the global context). +The profile itself may consist of one or more "searches", where a search is a +query executed against the underlying Lucene index. Most search requests +submitted by the user will only execute a single `search` against the Lucene +index. But occasionally multiple searches will be executed, such as including a +global aggregation (which needs to execute a secondary "match_all" query for the +global context). Inside each `search` object there will be two arrays of profiled information: -a `query` array and a `collector` array. Alongside the `search` object is an `aggregations` object that contains the profile information for the aggregations. In the future, more sections may be added, such as `suggest`, `highlight`, etc. +a `query` array and a `collector` array. Alongside the `search` object is an +`aggregations` object that contains the profile information for the +aggregations. In the future, more sections may be added, such as `suggest`, +`highlight`, etc. -There will also be a `rewrite` metric showing the total time spent rewriting the query (in nanoseconds). +There will also be a `rewrite` metric showing the total time spent rewriting the +query (in nanoseconds). NOTE: As with other statistics apis, the Profile API supports human readable outputs. This can be turned on by adding `?human=true` to the query string. In this case, the output contains the additional `time` field containing rounded, @@ -245,10 +267,11 @@ the `advance` phase of that query is the cause, for example. 
[[query-section]] ===== `query` Section -The `query` section contains detailed timing of the query tree executed by Lucene on a particular shard. -The overall structure of this query tree will resemble your original Elasticsearch query, but may be slightly -(or sometimes very) different. It will also use similar but not always identical naming. Using our previous -`match` query example, let's analyze the `query` section: +The `query` section contains detailed timing of the query tree executed by +Lucene on a particular shard. The overall structure of this query tree will +resemble your original Elasticsearch query, but may be slightly (or sometimes +very) different. It will also use similar but not always identical naming. +Using our previous `match` query example, let's analyze the `query` section: [source,console-result] -------------------------------------------------- @@ -279,25 +302,30 @@ The overall structure of this query tree will resemble your original Elasticsear // TESTRESPONSE[s/]$/],"rewrite_time": $body.$_path, "collector": $body.$_path}], "aggregations": []}]}}/] // TESTRESPONSE[s/(?<=[" ])\d+(\.\d+)?/$body.$_path/] // TESTRESPONSE[s/"breakdown": \{...\}/"breakdown": $body.$_path/] -<1> The breakdown timings are omitted for simplicity +<1> The breakdown timings are omitted for simplicity. -Based on the profile structure, we can see that our `match` query was rewritten by Lucene into a BooleanQuery with two -clauses (both holding a TermQuery). The `type` field displays the Lucene class name, and often aligns with -the equivalent name in Elasticsearch. The `description` field displays the Lucene explanation text for the query, and -is made available to help differentiating between parts of your query (e.g. both `message:search` and `message:test` -are TermQuery's and would appear identical otherwise. 
+Based on the profile structure, we can see that our `match` query was rewritten
+by Lucene into a BooleanQuery with two clauses (both holding a TermQuery). The
+`type` field displays the Lucene class name, and often aligns with the
+equivalent name in Elasticsearch. The `description` field displays the Lucene
+explanation text for the query, and is made available to help differentiate
+between parts of your query (e.g. both `message:search` and `message:test` are
+TermQueries and would appear identical otherwise).

-The `time_in_nanos` field shows that this query took ~1.8ms for the entire BooleanQuery to execute. The recorded time is inclusive
-of all children.
+The `time_in_nanos` field shows that this query took ~1.8ms for the entire
+BooleanQuery to execute. The recorded time is inclusive of all children.

-The `breakdown` field will give detailed stats about how the time was spent, we'll look at
-that in a moment. Finally, the `children` array lists any sub-queries that may be present. Because we searched for two
-values ("search test"), our BooleanQuery holds two children TermQueries. They have identical
-information (type, time, breakdown, etc). Children are allowed to have their own children.
+The `breakdown` field will give detailed stats about how the time was spent;
+we'll look at that in a moment. Finally, the `children` array lists any
+sub-queries that may be present. Because we searched for two values ("search
+test"), our BooleanQuery holds two children TermQueries. They have identical
+information (type, time, breakdown, etc). Children are allowed to have their
+own children.
-====== Timing Breakdown +===== Timing Breakdown -The `breakdown` component lists detailed timing statistics about low-level Lucene execution: +The `breakdown` component lists detailed timing statistics about low-level +Lucene execution: [source,console-result] -------------------------------------------------- @@ -326,10 +354,12 @@ The `breakdown` component lists detailed timing statistics about low-level Lucen // TESTRESPONSE[s/}$/},\n"children": $body.$_path}],\n"rewrite_time": $body.$_path, "collector": $body.$_path}], "aggregations": []}]}}/] // TESTRESPONSE[s/(?<=[" ])\d+(\.\d+)?/$body.$_path/] -Timings are listed in wall-clock nanoseconds and are not normalized at all. All caveats about the overall -`time_in_nanos` apply here. The intention of the breakdown is to give you a feel for A) what machinery in Lucene is -actually eating time, and B) the magnitude of differences in times between the various components. Like the overall time, -the breakdown is inclusive of all children times. +Timings are listed in wall-clock nanoseconds and are not normalized at all. All +caveats about the overall `time_in_nanos` apply here. The intention of the +breakdown is to give you a feel for A) what machinery in Lucene is actually +eating time, and B) the magnitude of differences in times between the various +components. Like the overall time, the breakdown is inclusive of all children +times. The meaning of the stats are as follows: @@ -399,13 +429,15 @@ The meaning of the stats are as follows: means the `nextDoc()` method was called on two different documents. This can be used to help judge how selective queries are, by comparing counts between different query components. + [[collectors-section]] ===== `collectors` Section -The Collectors portion of the response shows high-level execution details. Lucene works by defining a "Collector" -which is responsible for coordinating the traversal, scoring, and collection of matching documents. 
Collectors -are also how a single query can record aggregation results, execute unscoped "global" queries, execute post-query -filters, etc. +The Collectors portion of the response shows high-level execution details. +Lucene works by defining a "Collector" which is responsible for coordinating the +traversal, scoring, and collection of matching documents. Collectors are also +how a single query can record aggregation results, execute unscoped "global" +queries, execute post-query filters, etc. Looking at the previous example: @@ -430,15 +462,20 @@ Looking at the previous example: // TESTRESPONSE[s/]$/]}], "aggregations": []}]}}/] // TESTRESPONSE[s/(?<=[" ])\d+(\.\d+)?/$body.$_path/] -We see a single collector named `SimpleTopScoreDocCollector` wrapped into `CancellableCollector`. `SimpleTopScoreDocCollector` is the default "scoring and sorting" -`Collector` used by Elasticsearch. The `reason` field attempts to give a plain English description of the class name. The -`time_in_nanos` is similar to the time in the Query tree: a wall-clock time inclusive of all children. Similarly, `children` lists -all sub-collectors. The `CancellableCollector` that wraps `SimpleTopScoreDocCollector` is used by Elasticsearch to detect if the current -search was cancelled and stop collecting documents as soon as it occurs. -It should be noted that Collector times are **independent** from the Query times. They are calculated, combined, -and normalized independently! Due to the nature of Lucene's execution, it is impossible to "merge" the times -from the Collectors into the Query section, so they are displayed in separate portions. +We see a single collector named `SimpleTopScoreDocCollector` wrapped into +`CancellableCollector`. `SimpleTopScoreDocCollector` is the default "scoring and +sorting" `Collector` used by {es}. The `reason` field attempts to give a plain +English description of the class name. 
The `time_in_nanos` is similar to the +time in the Query tree: a wall-clock time inclusive of all children. Similarly, +`children` lists all sub-collectors. The `CancellableCollector` that wraps +`SimpleTopScoreDocCollector` is used by {es} to detect if the current search was +cancelled and stop collecting documents as soon as it occurs. + +It should be noted that Collector times are **independent** from the Query +times. They are calculated, combined, and normalized independently! Due to the +nature of Lucene's execution, it is impossible to "merge" the times from the +Collectors into the Query section, so they are displayed in separate portions. For reference, the various collector reasons are: @@ -489,20 +526,22 @@ For reference, the various collector reasons are: [[rewrite-section]] ===== `rewrite` Section -All queries in Lucene undergo a "rewriting" process. A query (and its sub-queries) may be rewritten one or -more times, and the process continues until the query stops changing. This process allows Lucene to perform -optimizations, such as removing redundant clauses, replacing one query for a more efficient execution path, -etc. For example a Boolean -> Boolean -> TermQuery can be rewritten to a TermQuery, because all the Booleans -are unnecessary in this case. +All queries in Lucene undergo a "rewriting" process. A query (and its +sub-queries) may be rewritten one or more times, and the process continues until +the query stops changing. This process allows Lucene to perform optimizations, +such as removing redundant clauses, replacing one query for a more efficient +execution path, etc. For example a Boolean -> Boolean -> TermQuery can be +rewritten to a TermQuery, because all the Booleans are unnecessary in this case. -The rewriting process is complex and difficult to display, since queries can change drastically. Rather than -showing the intermediate results, the total rewrite time is simply displayed as a value (in nanoseconds). 
This -value is cumulative and contains the total time for all queries being rewritten. +The rewriting process is complex and difficult to display, since queries can +change drastically. Rather than showing the intermediate results, the total +rewrite time is simply displayed as a value (in nanoseconds). This value is +cumulative and contains the total time for all queries being rewritten. ===== A more complex example - -To demonstrate a slightly more complex query and the associated results, we can profile the following query: +To demonstrate a slightly more complex query and the associated results, we can +profile the following query: [source,console] -------------------------------------------------- @@ -550,7 +589,8 @@ This example has: - A global aggregation - A post_filter -And the response: + +The API returns the following result: [source,console-result] -------------------------------------------------- @@ -660,47 +700,58 @@ And the response: // TESTRESPONSE[s/\.\.\.//] // TESTRESPONSE[s/(?<=[" ])\d+(\.\d+)?/$body.$_path/] // TESTRESPONSE[s/"id": "\[P6-vulHtQRWuD4YnubWb7A\]\[test\]\[0\]"/"id": $body.profile.shards.0.id/] -<1> The `"aggregations"` portion has been omitted because it will be covered in the next section +<1> The `"aggregations"` portion has been omitted because it will be covered in +the next section. -As you can see, the output is significantly more verbose than before. All the major portions of the query are -represented: +As you can see, the output is significantly more verbose than before. All the +major portions of the query are represented: -1. The first `TermQuery` (user:test) represents the main `term` query -2. The second `TermQuery` (message:some) represents the `post_filter` query +1. The first `TermQuery` (user:test) represents the main `term` query. +2. The second `TermQuery` (message:some) represents the `post_filter` query. 
-The Collector tree is fairly straightforward, showing how a single CancellableCollector wraps a MultiCollector
- which also wraps a FilteredCollector to execute the post_filter (and in turn wraps the normal scoring SimpleCollector),
- a BucketCollector to run all scoped aggregations.
+The Collector tree is fairly straightforward, showing how a single
+CancellableCollector wraps a MultiCollector, which in turn wraps both a
+FilteredCollector to execute the post_filter (itself wrapping the normal
+scoring SimpleCollector) and a BucketCollector to run all scoped aggregations.

 ===== Understanding MultiTermQuery output

-A special note needs to be made about the `MultiTermQuery` class of queries. This includes wildcards, regex, and fuzzy
-queries. These queries emit very verbose responses, and are not overly structured.
-
-Essentially, these queries rewrite themselves on a per-segment basis. If you imagine the wildcard query `b*`, it technically
-can match any token that begins with the letter "b". It would be impossible to enumerate all possible combinations,
-so Lucene rewrites the query in context of the segment being evaluated, e.g., one segment may contain the tokens
-`[bar, baz]`, so the query rewrites to a BooleanQuery combination of "bar" and "baz". Another segment may only have the
-token `[bakery]`, so the query rewrites to a single TermQuery for "bakery".
-
-Due to this dynamic, per-segment rewriting, the clean tree structure becomes distorted and no longer follows a clean
-"lineage" showing how one query rewrites into the next. At present time, all we can do is apologize, and suggest you
-collapse the details for that query's children if it is too confusing. Luckily, all the timing statistics are correct,
-just not the physical layout in the response, so it is sufficient to just analyze the top-level MultiTermQuery and
+A special note needs to be made about the `MultiTermQuery` class of queries.
+This includes wildcards, regex, and fuzzy queries. 
These queries emit very
+verbose responses, and are not overly structured.
+
+Essentially, these queries rewrite themselves on a per-segment basis. If you
+imagine the wildcard query `b*`, it technically can match any token that begins
+with the letter "b". It would be impossible to enumerate all possible
+combinations, so Lucene rewrites the query in context of the segment being
+evaluated, e.g., one segment may contain the tokens `[bar, baz]`, so the query
+rewrites to a BooleanQuery combination of "bar" and "baz". Another segment may
+only have the token `[bakery]`, so the query rewrites to a single TermQuery for
+"bakery".
+
+Due to this dynamic, per-segment rewriting, the clean tree structure becomes
+distorted and no longer follows a clean "lineage" showing how one query rewrites
+into the next. At present, all we can do is apologize, and suggest you
+collapse the details for that query's children if it is too confusing. Luckily,
+all the timing statistics are correct, just not the physical layout in the
+response, so it is sufficient to just analyze the top-level MultiTermQuery and
 ignore its children if you find the details too tricky to interpret.

-Hopefully this will be fixed in future iterations, but it is a tricky problem to solve and still in-progress :)
+Hopefully this will be fixed in future iterations, but it is a tricky problem to
+solve and still in progress. :)

 [[profiling-aggregations]]
-==== Profiling Aggregations
+===== Profiling Aggregations
+

 [[agg-section]]
-===== `aggregations` Section
+====== `aggregations` Section

-The `aggregations` section contains detailed timing of the aggregation tree executed by a particular shard.
-The overall structure of this aggregation tree will resemble your original Elasticsearch request. Let's
-execute the previous query again and look at the aggregation profile this time:
+The `aggregations` section contains detailed timing of the aggregation tree
+executed by a particular shard. 
The overall structure of this aggregation tree
+will resemble your original {es} request. Let's execute the previous query again
+and look at the aggregation profile this time:

 [source,console]
 --------------------------------------------------
@@ -741,6 +792,7 @@ GET /twitter/_search
 // TEST[s/_search/_search\?filter_path=profile.shards.aggregations/]
 // TEST[continued]

+
 This yields the following aggregation profile output:

 [source,console-result]
@@ -807,16 +859,20 @@ This yields the following aggregation profile output:
 // TESTRESPONSE[s/(?<=[" ])\d+(\.\d+)?/$body.$_path/]
 // TESTRESPONSE[s/"id": "\[P6-vulHtQRWuD4YnubWb7A\]\[test\]\[0\]"/"id": $body.profile.shards.0.id/]

-From the profile structure we can see that the `my_scoped_agg` is internally being run as a `LongTermsAggregator` (because the field it is
-aggregating, `likes`, is a numeric field). At the same level, we see a `GlobalAggregator` which comes from `my_global_agg`. That
-aggregation then has a child `LongTermsAggregator` which comes from the second term's aggregation on `likes`.
+From the profile structure we can see that the `my_scoped_agg` is internally
+being run as a `LongTermsAggregator` (because the field it is aggregating,
+`likes`, is a numeric field). At the same level, we see a `GlobalAggregator`
+which comes from `my_global_agg`. That aggregation then has a child
+`LongTermsAggregator` which comes from the second `terms` aggregation on `likes`.

-The `time_in_nanos` field shows the time executed by each aggregation, and is inclusive of all children. While the overall time is useful,
-the `breakdown` field will give detailed stats about how the time was spent.
+The `time_in_nanos` field shows the time executed by each aggregation, and is
+inclusive of all children. While the overall time is useful, the `breakdown`
+field will give detailed stats about how the time was spent. 
-====== Timing Breakdown +===== Timing Breakdown -The `breakdown` component lists detailed timing statistics about low-level Lucene execution: +The `breakdown` component lists detailed timing statistics about low-level +Lucene execution: [source,js] -------------------------------------------------- @@ -834,10 +890,11 @@ The `breakdown` component lists detailed timing statistics about low-level Lucen -------------------------------------------------- // NOTCONSOLE -Timings are listed in wall-clock nanoseconds and are not normalized at all. All caveats about the overall -`time` apply here. The intention of the breakdown is to give you a feel for A) what machinery in Elasticsearch is -actually eating time, and B) the magnitude of differences in times between the various components. Like the overall time, -the breakdown is inclusive of all children times. +Timings are listed in wall-clock nanoseconds and are not normalized at all. All +caveats about the overall `time` apply here. The intention of the breakdown is +to give you a feel for A) what machinery in {es} is actually eating time, and B) +the magnitude of differences in times between the various components. Like the +overall time, the breakdown is inclusive of all children times. The meaning of the stats are as follows: @@ -866,28 +923,33 @@ The meaning of the stats are as follows: means the `collect()` method was called on two different documents. [[profiling-considerations]] -==== Profiling Considerations - -===== Performance Notes +===== Profiling Considerations -Like any profiler, the Profile API introduces a non-negligible overhead to search execution. The act of instrumenting -low-level method calls such as `collect`, `advance`, and `next_doc` can be fairly expensive, since these methods are called -in tight loops. Therefore, profiling should not be enabled in production settings by default, and should not -be compared against non-profiled query times. Profiling is just a diagnostic tool. 
+Like any profiler, the Profile API introduces a non-negligible overhead to +search execution. The act of instrumenting low-level method calls such as +`collect`, `advance`, and `next_doc` can be fairly expensive, since these +methods are called in tight loops. Therefore, profiling should not be enabled +in production settings by default, and should not be compared against +non-profiled query times. Profiling is just a diagnostic tool. -There are also cases where special Lucene optimizations are disabled, since they are not amenable to profiling. This -could cause some queries to report larger relative times than their non-profiled counterparts, but in general should -not have a drastic effect compared to other components in the profiled query. +There are also cases where special Lucene optimizations are disabled, since they +are not amenable to profiling. This could cause some queries to report larger +relative times than their non-profiled counterparts, but in general should not +have a drastic effect compared to other components in the profiled query. [[profile-limitations]] ===== Limitations -- Profiling currently does not measure the search fetch phase nor the network overhead -- Profiling also does not account for time spent in the queue, merging shard responses on the coordinating node, or -additional work such as building global ordinals (an internal data structure used to speed up search) -- Profiling statistics are currently not available for suggestions, highlighting, `dfs_query_then_fetch` -- Profiling of the reduce phase of aggregation is currently not available -- The Profiler is still highly experimental. The Profiler is instrumenting parts of Lucene that were -never designed to be exposed in this manner, and so all results should be viewed as a best effort to provide detailed -diagnostics. We hope to improve this over time. If you find obviously wrong numbers, strange query structures, or -other bugs, please report them! 
+- Profiling currently does not measure the search fetch phase or the network
+overhead.
+- Profiling also does not account for time spent in the queue, merging shard
+responses on the coordinating node, or additional work such as building global
+ordinals (an internal data structure used to speed up search).
+- Profiling statistics are currently not available for suggestions,
+highlighting, or `dfs_query_then_fetch`.
+- Profiling of the reduce phase of aggregation is currently not available.
+- The Profiler is still highly experimental. The Profiler is instrumenting parts
+of Lucene that were never designed to be exposed in this manner, and so all
+results should be viewed as a best effort to provide detailed diagnostics. We
+hope to improve this over time. If you find obviously wrong numbers, strange
+query structures, or other bugs, please report them!
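
The reformatted docs above stress that `time_in_nanos` is inclusive of all children and that the nested `children` arrays make the output hard to eyeball. As a rough, standalone illustration of that structure (not part of this patch or of the documented API; the sample response dict is abbreviated and hand-written, and the numbers are invented), a small script can flatten a `query` profile tree and compute each node's self time:

```python
# Sketch: flatten the nested "query" profile tree and report each node's
# self time (its time_in_nanos minus its direct children's inclusive times,
# since documented times include all children). SAMPLE_PROFILE below is a
# hypothetical, abbreviated stand-in for a real profile response.

SAMPLE_PROFILE = {
    "profile": {
        "shards": [
            {
                "id": "[node][twitter][0]",
                "searches": [
                    {
                        "query": [
                            {
                                "type": "BooleanQuery",
                                "description": "message:search message:test",
                                "time_in_nanos": 1873811,
                                "children": [
                                    {
                                        "type": "TermQuery",
                                        "description": "message:search",
                                        "time_in_nanos": 391943,
                                        "children": [],
                                    },
                                    {
                                        "type": "TermQuery",
                                        "description": "message:test",
                                        "time_in_nanos": 210682,
                                        "children": [],
                                    },
                                ],
                            }
                        ]
                    }
                ],
            }
        ]
    }
}


def flatten(node, depth=0):
    """Yield (depth, type, description, self_time_nanos) for each query node."""
    children = node.get("children", [])
    self_time = node["time_in_nanos"] - sum(c["time_in_nanos"] for c in children)
    yield (depth, node["type"], node["description"], self_time)
    for child in children:
        yield from flatten(child, depth + 1)


def report(response):
    """Collect flattened rows for every shard, search, and root query node."""
    rows = []
    for shard in response["profile"]["shards"]:
        for search in shard["searches"]:
            for root in search["query"]:
                rows.extend(flatten(root))
    return rows


if __name__ == "__main__":
    for depth, qtype, desc, self_ns in report(SAMPLE_PROFILE):
        print(f"{'  ' * depth}{qtype} ({desc}): self time {self_ns} ns")
```

Subtracting child times like this is only an approximation of where time was spent; for authoritative per-phase numbers, use the `breakdown` stats described in the Timing Breakdown sections.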