Stats about bulk sizes #47345

Closed
jpountz opened this issue Oct 1, 2019 · 3 comments
Labels
:Data Management/Stats Statistics tracking and retrieval APIs >feature Team:Data Management Meta label for data/management team

Comments

@jpountz
Contributor

jpountz commented Oct 1, 2019

The size of bulk requests in bytes is one of the main factors in indexing performance. Yet we don't have any stats about it, and users don't always know the actual size of the bulk requests that are sent to Elasticsearch, since many shippers build them automatically. For instance, Logstash lets you configure the batch size as a number of documents, but pipeline.batch.delay may cause incomplete batches to be sent, so it would be good to know the size of the bulk requests that reach Elasticsearch in practice.

The size of bulk requests per shard is probably the most relevant metric, but we could work with the per-index size as well if that proves easier to expose.
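
Until such stats exist, a client can approximate the request size itself. The following is a minimal sketch, assuming the client builds the newline-delimited JSON body of a bulk request on its own; `bulk_body`, the `logs` index name, and the sample documents are hypothetical names used only for illustration, and the sketch makes no claim about how any particular shipper serializes its batches.

```python
import json


def bulk_body(index, docs):
    """Serialize docs as a newline-delimited JSON bulk body (hypothetical helper).

    Each document becomes two lines: an action line and a source line.
    The body ends with a trailing newline.
    """
    if not docs:
        return b""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return ("\n".join(lines) + "\n").encode("utf-8")


# A batch configured as a number of documents, not as a number of bytes.
batch = [{"message": f"log line {i}"} for i in range(125)]
body = bulk_body("logs", batch)

# The byte size is what matters for indexing performance.
size_bytes = len(body)
print(f"{len(batch)} docs -> {size_bytes} bytes")
```

Logging `size_bytes` per request on the client side gives a rough picture of the distribution, but it says nothing about how the request is later split across shards.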

@jpountz jpountz added >feature :Data Management/Stats Statistics tracking and retrieval APIs labels Oct 1, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features

@jbaiera
Member

jbaiera commented Oct 22, 2019

As a note here, it might make sense to track the average bulk size per shard as well as the average bulk request size, since a large bulk request may be split into much smaller shard-level bulk operations on an index with a high number of shards. This makes more sense to me than tracking only at the shard level, since most clients do not already partition by shard.
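
To illustrate the difference between the two metrics, here is a rough sketch that groups the documents of one request by a simulated shard. It assumes routing by a hash of the document id modulo the shard count; `zlib.crc32` is only a stand-in and is not the hash function Elasticsearch uses, and `split_by_shard` and the sample data are hypothetical.

```python
import json
import zlib
from collections import defaultdict


def split_by_shard(docs, num_shards):
    """Return bytes per simulated shard for a list of (doc_id, doc) pairs.

    Assumption: shard = hash(doc_id) % num_shards, with crc32 as a stand-in hash.
    """
    sizes = defaultdict(int)
    for doc_id, doc in docs:
        shard = zlib.crc32(doc_id.encode("utf-8")) % num_shards
        sizes[shard] += len(json.dumps(doc).encode("utf-8"))
    return dict(sizes)


docs = [(f"id-{i}", {"message": "x" * 100}) for i in range(1000)]

# Size of the whole request (document sources only, for simplicity).
request_size = sum(len(json.dumps(doc).encode("utf-8")) for _, doc in docs)

# Size of each shard-level piece of the same request.
per_shard = split_by_shard(docs, num_shards=50)
avg_shard_size = request_size / len(per_shard)

print(f"request: {request_size} bytes")
print(f"shards touched: {len(per_shard)}, average per shard: {avg_shard_size:.0f} bytes")
```

With many shards, the per-shard average is a small fraction of the request size, which is why reporting both numbers would be more informative than reporting either alone.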

probakowski pushed a commit that referenced this issue Apr 20, 2020
* Add Bulk stats track the bulk sizes per shard and the time spent on the bulk shard request (#50536)(#47345)
probakowski pushed a commit to probakowski/elasticsearch that referenced this issue Apr 20, 2020
* Add Bulk stats track the bulk sizes per shard and the time spent on the bulk shard request (elastic#50536)(elastic#47345)
@rjernst rjernst added the Team:Data Management Meta label for data/management team label May 4, 2020
@dakrone
Member

dakrone commented May 17, 2024

This has been open for quite a while, and we haven't made much progress on this due to focus in other areas. For now I'm going to close this as something we aren't planning on implementing. We can re-open it later if needed.

@dakrone dakrone closed this as not planned May 17, 2024
5 participants