Implement prototype remote store directory/index input for search #7431

Closed
yigithub opened this issue May 4, 2023 · 3 comments

@yigithub
Member

yigithub commented May 4, 2023

No description provided.

@yigithub yigithub converted this from a draft issue May 4, 2023
@neetikasinghal
Contributor

neetikasinghal commented May 5, 2023

Issue description:
Searchable snapshots implemented a RemoteSnapshotDirectory that provides access to files that are physically represented as a snapshot in a repository (the specific repository implementation is provided via a storage plugin). This task is to create a similar remote search-focused Directory implementation for searching remote-backed indexes stored in a repository. The bulk of the logic for the Lucene abstraction that implements the on-demand fetching and file caching is implemented in the class OnDemandBlockIndexInput and will be reused here.

The remote-backed storage feature has already implemented classes such as RemoteSegmentStoreDirectory and RemoteIndexInput. There is some overlap, as these classes must also understand the remote segment structure and metadata. Part of this prototype is to figure out how the pieces fit together. At the IndexInput level in particular, there will be distinct implementations: the remote search piece is built around on-demand partial file fetching, whereas the pieces used in the write path are designed to copy entire files to and from the remote object store.
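
To make the difference between partial fetching and whole-file copies concrete, here is a minimal Python sketch of the block arithmetic an on-demand IndexInput might perform. It is not taken from OnDemandBlockIndexInput: the function names, the power-of-two block size, and the 8 MiB value are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: blocks have a fixed power-of-two size, so block lookups reduce
# to shifts and masks. 2**23 bytes (8 MiB) is an arbitrary illustrative value.
BLOCK_SIZE_SHIFT = 23
BLOCK_SIZE = 1 << BLOCK_SIZE_SHIFT
BLOCK_MASK = BLOCK_SIZE - 1


def locate(position: int) -> Tuple[int, int]:
    """Map an absolute file position to (block index, offset inside the block)."""
    return position >> BLOCK_SIZE_SHIFT, position & BLOCK_MASK


def blocks_for_read(position: int, length: int, file_length: int) -> List[Tuple[int, int, int]]:
    """Return (block index, start byte, byte count) for each block touched by
    a read of `length` bytes starting at `position`.

    Only these byte ranges would need to be fetched from the remote store,
    instead of the whole file.
    """
    if position < 0 or length < 0 or position + length > file_length:
        raise ValueError("read is outside the file")
    if length == 0:
        return []
    first, _ = locate(position)
    last, _ = locate(position + length - 1)
    ranges = []
    for block in range(first, last + 1):
        start = block << BLOCK_SIZE_SHIFT
        # The final block of a file is usually shorter than BLOCK_SIZE.
        size = min(BLOCK_SIZE, file_length - start)
        ranges.append((block, start, size))
    return ranges
```

A two-byte read that straddles a block boundary touches exactly two blocks, while the write path would transfer the entire file regardless of how much of it is read.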

@neetikasinghal
Contributor

neetikasinghal commented May 5, 2023

High-level Design

For Searchable Snapshots, the FileInfo class (part of the BlobStoreIndexShardSnapshot class) contains the metadata for each segment/data file, such as the file name and part size. The FileInfo object is used to build a BlobFetchRequest, which in turn is used to fetch blocks of data from the remote store on the read path.
For Remote Search, we can leverage the UploadedSegmentMetadata class and use it to build a BlobFetchRequest in the same way as Searchable Snapshots.

In the current remote store upload flow, a listener registered for each refresh takes care of uploading/updating the metadata of each shard to the remote store. The metadata is currently stored under a RemoteDirectory:

  • Location - <IndexUUID>/<Shard ID>/segments/metadata/
  • File Name - metadata__<Primary Term>__<Commit Generation>__<UUID>
  • Content in the file for every segment file uploaded at each commit: <OriginalSegmentFilename>::<UploadedSegmentFilename>::<Checksum>
  • Length of the file name
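
As an illustration of the layout listed above, the following Python sketch parses a metadata file name and one content line. It is a sketch under assumptions, not the OpenSearch implementation: the class and function names are hypothetical, the primary term and commit generation are assumed to be decimal integers, and only the three `::`-separated fields shown in the format string are handled.

```python
from dataclasses import dataclass

NAME_SEPARATOR = "__"    # separates the parts of the metadata file name
FIELD_SEPARATOR = "::"   # separates the fields of one content line
METADATA_PREFIX = "metadata"


@dataclass(frozen=True)
class MetadataFileName:  # hypothetical helper type
    primary_term: int
    generation: int
    uuid: str


@dataclass(frozen=True)
class SegmentEntry:  # hypothetical helper type
    original_name: str
    uploaded_name: str
    checksum: str


def parse_metadata_file_name(name: str) -> MetadataFileName:
    """Parse metadata__<Primary Term>__<Commit Generation>__<UUID>."""
    # maxsplit=3 keeps the UUID intact even if it happens to contain "__".
    parts = name.split(NAME_SEPARATOR, 3)
    if len(parts) != 4 or parts[0] != METADATA_PREFIX:
        raise ValueError(f"unexpected metadata file name: {name!r}")
    # Assumption: primary term and commit generation are decimal integers.
    return MetadataFileName(int(parts[1]), int(parts[2]), parts[3])


def parse_segment_entry(line: str) -> SegmentEntry:
    """Parse <OriginalSegmentFilename>::<UploadedSegmentFilename>::<Checksum>."""
    parts = line.strip().split(FIELD_SEPARATOR)
    if len(parts) != 3:
        raise ValueError(f"unexpected metadata entry: {line!r}")
    return SegmentEntry(parts[0], parts[1], parts[2])
```

A search-side directory could use entries like these to map the file names Lucene asks for to the uploaded file names in the repository.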

Create a new Directory (RemoteSearchDirectory) and a new IndexInput that extends OnDemandBlockIndexInput (OnDemandBlockSearchIndexInput).

Initialization of the directory
The search-based directory can be initialized in RemoteSearchDirectoryFactory, similar to how RemoteDirectory is initialized.

Upload/Update/Delete of metadata to the directory
Metadata is uploaded to the remote store after every commit and can be updated after each refresh.
Stale commit metadata files are deleted.
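
The issue does not say how a metadata file is judged stale. One possible rule, sketched below in Python, is to order files by the primary term and commit generation encoded in their names and treat everything except the newest few as stale. The ordering rule, the `keep_latest` parameter, and the function names are assumptions for illustration only.

```python
from typing import Iterable, List, Tuple


def _sort_key(name: str) -> Tuple[int, int]:
    # File name layout: metadata__<Primary Term>__<Commit Generation>__<UUID>
    _, primary_term, generation, _ = name.split("__", 3)
    return int(primary_term), int(generation)


def stale_metadata_files(names: Iterable[str], keep_latest: int = 1) -> List[str]:
    """Return the metadata file names that could be deleted, oldest first.

    Assumption: a file is newer if it has a higher primary term, or the same
    primary term and a higher commit generation.
    """
    if keep_latest < 0:
        raise ValueError("keep_latest must be non-negative")
    ordered = sorted(names, key=_sort_key)
    return ordered[: max(len(ordered) - keep_latest, 0)]
```

With `keep_latest=1`, only the file with the highest (primary term, generation) pair survives; a larger value keeps a short history of commits.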

Content of metadata file
This contains the UploadedSegmentMetadata of each uploaded segment file.

Read/Loading the metadata file in the directory
An OnDemandBlockSearchIndexInput class will be created to read the files in the newly created directory.

End to end flow for testing
Create an index → remote backup → close the index → apply the remote search setting → reopen the index → perform a search on the index
Note: This testing setup will work only for immutable indexes.

@neetikasinghal
Contributor

Draft PR: #7417
