Integrity checks for read-only indices #16162

Closed
jpountz opened this issue Jan 21, 2016 · 5 comments
Labels
:Data Management/Indices APIs APIs to create and manage indices and templates >feature help wanted adoptme Team:Data Management Meta label for data/management team

Comments

@jpountz
Contributor

jpountz commented Jan 21, 2016

The checksum verification that we perform on merge or relocation has proved very useful for detecting corruption, but it is only triggered on merge. So static indices, or segments that have already reached the maximum segment size, are never checked, even though they are just as likely to get corrupted.

It would be nice to have an API, a command-line tool, or even a way to run integrity checks automatically (anything that is easier than running CheckIndex on the shard directories).
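For context, the manual route mentioned above looks roughly like the sketch below. It assumes Lucene's `org.apache.lucene.index.CheckIndex` is run through its command-line entry point, once per shard directory. The jar path and the shard directories are hypothetical placeholders, and the helper names are made up for illustration. The sketch only builds and prints the commands; it does not run them.

```python
import shlex

# Hypothetical jar location; adjust to wherever lucene-core lives locally.
LUCENE_JAR = "/path/to/lucene-core.jar"


def check_index_command(shard_index_dir, lucene_jar=LUCENE_JAR, fast=False):
    """Build the argv for running Lucene's CheckIndex on one shard directory."""
    cmd = [
        "java",
        "-cp",
        lucene_jar,
        "org.apache.lucene.index.CheckIndex",
        shard_index_dir,
    ]
    if fast:
        # -fast only verifies file checksums and skips the deeper checks.
        cmd.append("-fast")
    return cmd


def check_index_commands(shard_index_dirs, **kwargs):
    """Build one CheckIndex command per shard directory."""
    return [check_index_command(d, **kwargs) for d in shard_index_dirs]


if __name__ == "__main__":
    # Hypothetical shard directories; the real layout depends on the node.
    dirs = ["/path/to/shard0/index", "/path/to/shard1/index"]
    for cmd in check_index_commands(dirs, fast=True):
        print(shlex.join(cmd))
```

Having to locate every shard directory on every node and repeat this by hand is the friction this issue is about.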

@clintongormley clintongormley added :Data Management/Indices APIs APIs to create and manage indices and templates help wanted adoptme and removed discuss labels Jan 22, 2016
@clintongormley
Contributor

+1 - should probably integrate with the task management api

@martijnvg
Member

When discussing in #19798 whether the experimental tag should be removed from the index.shard.check_on_startup index setting, we concluded that shard startup is probably not the best time to run CheckIndex, because it can slow down recovery significantly.

Instead, as this issue suggests, an API is better because it gives full control over when a check is performed. The API should mimic the CheckIndex tool as closely as possible, exposing its verbose, fast, segment and crossCheckTermVectors options:

POST /{index}/_check_index?verbose=true|false&fast=true|false&segment=[segment_id]&crossCheckTermVectors=true|false

The exorcise option should not be exposed as it modifies the index by removing segments that have issues.
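As a rough illustration of the request shape proposed above, here is a minimal sketch that builds the path and query string. The `_check_index` endpoint exists only as a proposal in this thread, not as an Elasticsearch API, and the function name, its defaults, and the single optional segment argument are assumptions made for the example.

```python
from urllib.parse import quote, urlencode


def check_index_path(index, verbose=False, fast=False, segment=None,
                     cross_check_term_vectors=False):
    """Build path and query for the proposed POST /{index}/_check_index call.

    There is deliberately no exorcise parameter: the proposal keeps the
    API read-only, so nothing here can remove segments.
    """
    params = [
        ("verbose", str(verbose).lower()),
        ("fast", str(fast).lower()),
    ]
    if segment is not None:
        # Restrict the check to one segment, as with the tool's segment option.
        params.append(("segment", segment))
    params.append(("crossCheckTermVectors", str(cross_check_term_vectors).lower()))
    return f"/{quote(index, safe='')}/_check_index?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical index and segment names.
    print("POST", check_index_path("logs-2016.01", fast=True, segment="_0"))
```

Keeping the parameter names identical to the CheckIndex options would let anyone familiar with the tool map one onto the other directly.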

The check index API should be executed on a closed index. Making it work on open indices is tricky because the underlying Lucene index can change while the check is running. However, I think we can make it work on read-only indices.

The check index api should replace the index.shard.check_on_startup index setting.

@s1monw If I remember correctly, you prefer a command-line utility over an API for things like check index. However, I think an API is preferable in this case. The current proposal doesn't modify shards, and checking the integrity of an index via an API can be useful for Curator-like tools. Do you agree?

@hub-cap hub-cap removed the :Data Management/Indices APIs APIs to create and manage indices and templates label Mar 21, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@hub-cap hub-cap added the :Data Management/Indices APIs APIs to create and manage indices and templates label Mar 21, 2018
@s1monw
Contributor

s1monw commented Mar 21, 2018

> @s1monw If I remember correctly, you prefer a command-line utility over an API for things like check index. However, I think an API is preferable in this case. The current proposal doesn't modify shards, and checking the integrity of an index via an API can be useful for Curator-like tools. Do you agree?

I was suggesting a disk utility for the case where you repair the index by dropping corrupted segments. I think an API is just fine for your use case.

@rjernst rjernst added the Team:Data Management Meta label for data/management team label May 4, 2020
@dakrone
Member

dakrone commented May 8, 2024

This has been open for quite a while, and hasn't had a lot of interest. For now I'm going to close this as something we aren't planning on implementing. We can re-open it later if needed.

@dakrone dakrone closed this as not planned May 8, 2024