Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chunked Encoding for NodeStatsResponse #90097

Merged

Conversation

original-brownbear
Copy link
Member

Turn this into a chunked response to some degree.
Only chunks per node for now, since deeper chunking needs larger changes downstream that don't fit in well with the current API.
The problem is that the "level" parameter that controls whether or not we return the very large indices or shard level responses is an x-content param so we don't have it when creating the iterator. I'd address this in a follow-up that changes the API a little.

As a result, I did not add a test here that validates the chunk count since I'd like to do more work on this anyway. I think it's a valuable change in its current form already and introduces a parent class that allows for turning other APIs into chunked encoding also.
For example, the indices level response for node stats in a 25k indices cluster across 6 data nodes is currently ~120M (and that is without pretty or human!). Without this change, each hit of the indices stats API will cause the coordinating node to allocate 120M for the response. With this change, we will only allocate ~20M for sending the same response. Serializing those 20M on the transport thread should be a non-issue from some quick benchmarking as even serializing the full 120M seems to well under one second.

relates #89838

Turn this into a chunked response to some degree.
Only chunks per node for now, since deeper chunking needs
larger changes downstream that don't fit in well with the
current API.
@original-brownbear original-brownbear added >non-issue :Distributed Coordination/Network Http and internode communication implementations v8.5.0 labels Sep 15, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Sep 15, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@original-brownbear
Copy link
Member Author

Thanks David!

@Dongzhenpu
Copy link

Hi there, I found that after 8.5(the version that this commit was merged into), the nodes/stats api with level=abc query params will raise exception instead of return the error message in versions before 8.5

# version after 8.5
Error: socket hang up

# version before 8.5
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "level parameter must be one of [cluster] or [indices] or [shards] but was [abc]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "level parameter must be one of [cluster] or [indices] or [shards] but was [abc]",
    "suppressed" : [
      {
        "type" : "illegal_state_exception",
        "reason" : "Failed to close the XContentBuilder",
        "caused_by" : {
          "type" : "i_o_exception",
          "reason" : "Unclosed object or array found"
        }
      }
    ]
  },
  "status" : 400
}

I see Armin Braun said that

The problem is that the "level" parameter that controls whether or not we return the very large indices or shard level responses is an x-content param so we don't have it when creating the iterator. I'd address this in a follow-up that changes the API a little.

Is this an unexpected exception to this commit or is it a bug?

@DaveCTurner
Copy link
Contributor

I think this is a bug, although it's just one instance of a much more general problem. I opened #93981.

@original-brownbear original-brownbear restored the chunked-nodes-stats branch April 18, 2023 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Network Http and internode communication implementations >non-issue Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants