[BUG] Searchable snapshot dependency on repository `chunk_size` #9676

andrross · 2023-08-31T20:57:39Z

Background

Every repository implementation accepts an optional chunk_size parameter at repository creation time. This property defines the maximum file size that will be uploaded to the repository. Any files larger than that will be broken into smaller files of chunk_size size (with the last chunk potentially smaller).
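To make the chunking rule concrete, here is a minimal sketch of the arithmetic. The class and method names are hypothetical and are not taken from the repository code; it only assumes that a file is split into consecutive parts of at most chunk_size bytes, as described above.

```java
// Illustrative sketch only: ChunkLayout is a hypothetical helper, not an OpenSearch class.
final class ChunkLayout {
    // Number of parts a non-empty file of fileLength bytes is split into.
    static long partCount(long fileLength, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        return (fileLength + chunkSize - 1) / chunkSize; // ceiling division
    }

    // Length of the part at partIndex; only the last part can be shorter than chunkSize.
    static long partLength(long fileLength, long chunkSize, long partIndex) {
        long start = partIndex * chunkSize;
        return Math.min(chunkSize, fileLength - start);
    }
}
```

For example, a 2500-byte file with a chunk_size of 1000 bytes would be stored as three parts of 1000, 1000 and 500 bytes.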

Searchable snapshots work by fetching partial index files on-demand at search time, and storing these parts as 8MiB (current hard-coded default) files on disk (unless the entire file is smaller than 8MiB). A virtual IndexInput wraps this logic, giving the appearance to the upper layers of a single file while the implementation fetches and reads what is needed from these 8MiB partial files. Some clever logic in this code relies on the partial file size being a power of 2 to leverage bit shifting techniques to convert "block number" into an actual byte offset.
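The bit shifting mentioned above can be sketched roughly as follows. This is not the actual OnDemandBlockSnapshotIndexInput code; the class, constant and method names are hypothetical, and the only assumption carried over from the description is the 8MiB (2^23 bytes) block size.

```java
// Illustrative sketch only: BlockMath and its members are hypothetical names.
final class BlockMath {
    // 8 MiB = 2^23 bytes, matching the hard-coded block size described above.
    static final int BLOCK_SIZE_SHIFT = 23;
    static final long BLOCK_SIZE = 1L << BLOCK_SIZE_SHIFT;
    static final long BLOCK_MASK = BLOCK_SIZE - 1;

    // Index of the block containing the given byte position of the virtual file.
    static int blockNumber(long pos) {
        return (int) (pos >>> BLOCK_SIZE_SHIFT);
    }

    // Offset of the given byte position inside its block.
    static int blockOffset(long pos) {
        return (int) (pos & BLOCK_MASK);
    }

    // Byte offset in the virtual file at which the given block starts.
    static long blockStart(int blockNumber) {
        return (long) blockNumber << BLOCK_SIZE_SHIFT;
    }
}
```

Because the block size is a power of 2, the division and modulo reduce to a shift and a mask. That shortcut applies only to the block size; it says nothing about where the repository's chunk boundaries fall.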

Bug

The searchable snapshot code does not handle the case where fetching an 8MiB section of a file crosses one of the snapshot chunk boundaries. This means that if the repository chunk_size parameter is not a multiple of 8MiB, then this code will fail. In practice, all default chunk sizes are in fact a multiple of 8MiB (fs: no chunking, s3: 1GiB, gcs: 5TiB, hdfs: no chunking). However, a user can configure any value. This was discovered while attempting to implement #9514, where some repositories choose a random value between 100 and 1000 bytes for the test case.
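A small sketch of the failing condition, using a hypothetical helper (not OpenSearch code): a block read is only safe for the current logic when its first and last byte fall into the same repository part.

```java
// Illustrative sketch only: BoundaryCheck is a hypothetical helper.
final class BoundaryCheck {
    // True if the byte range [start, start + length) spans more than one part
    // of a file that was split into parts of chunkSize bytes. Assumes length > 0.
    static boolean crossesChunkBoundary(long start, long length, long chunkSize) {
        long firstPart = start / chunkSize;
        long lastPart = (start + length - 1) / chunkSize;
        return firstPart != lastPart;
    }
}
```

With a 1GiB chunk_size, every 8MiB-aligned block stays inside one part, since 1GiB is exactly 128 blocks. With a hypothetical 10MiB chunk_size, the second block (bytes 8MiB to 16MiB) already straddles the boundary at 10MiB.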

Possible solution

Improve OnDemandBlockSnapshotIndexInput to always download the configured block size (i.e. 8MiB) even if that means downloading multiple file parts from the repository.
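One way to picture this is to split each block request into one ranged read per repository part. The sketch below is an assumption about how that could look, not the change that was eventually merged; BlockFetchPlan and its row layout are hypothetical.

```java
// Illustrative sketch only: BlockFetchPlan is a hypothetical helper.
final class BlockFetchPlan {
    // Splits the block [blockStart, blockStart + blockLength) into per-part reads.
    // Each row is {partIndex, offsetWithinPart, length}. Assumes blockLength > 0
    // and that the block lies entirely within the file.
    static long[][] plan(long blockStart, long blockLength, long chunkSize) {
        long firstPart = blockStart / chunkSize;
        long lastPart = (blockStart + blockLength - 1) / chunkSize;
        long[][] ranges = new long[(int) (lastPart - firstPart + 1)][];
        long pos = blockStart;
        long remaining = blockLength;
        for (int i = 0; i < ranges.length; i++) {
            long part = firstPart + i;
            long offsetInPart = pos - part * chunkSize;
            // Read up to the end of this part, or up to the end of the block.
            long len = Math.min(remaining, chunkSize - offsetInPart);
            ranges[i] = new long[] { part, offsetInPart, len };
            pos += len;
            remaining -= len;
        }
        return ranges;
    }
}
```

For the 10MiB example, the second 8MiB block would be served by 2MiB from the end of part 0 followed by 6MiB from the start of part 1. When chunk_size is a multiple of 8MiB, the plan always contains a single row, so the common case keeps one read per block.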


kkmr · 2023-09-06T22:10:16Z

I'm looking into this

kotwanikunal · 2024-01-16T21:18:24Z

@kkmr Are you still looking at it?

andrross added bug, untriaged, distributed framework and removed untriaged labels Aug 31, 2023

andrross assigned kkmr Sep 6, 2023

anasalkouz added Search:Searchable Snapshots and removed distributed framework labels Sep 19, 2023

andrross mentioned this issue Jan 4, 2024

[Enhancement] Verify the chunk size of the snapshot when restoring a searchable snapshot #11741

Closed

8 tasks

bugmakerrrrrr mentioned this issue Jan 5, 2024

[Enhancement] Verify the chunk size of the snapshot when restoring a searchable snapshot #11739

Closed

kotwanikunal assigned Rishikesh1159 Jan 16, 2024

andrross unassigned kkmr Jan 18, 2024

Rishikesh1159 mentioned this issue Feb 9, 2024

[Searchable Snapshot] Fix bug of Searchable Snapshot Dependency on repository chunk_size #12277

Merged

8 tasks

Rishikesh1159 closed this as completed in #12277 Feb 27, 2024
