[Searchable Snapshot] Propose API #2922
Comments
@andrross thinking about APIs, would it make sense to consider a new attach/detach concept for the index? I believe one of the primary use cases will not be creating an index known ahead of time to be remote, but shelving existing indices (e.g. by data age or other compliance requirements) into remote ones. An explicit attach API might be helpful when the index in question needs to be fully available for a prolonged period of time, as opposed to being queried in an ad-hoc fashion (which would be handled by the cache). Curious to hear what you think.
Listing out scenarios, with some rough API sketches:
1. Index stored only on instance storage: This is the status quo. It is included here for completeness, but "false" is the implicit default.
2. Index stored on instance storage and backed up to remote: This is the work in flight. All index data is still completely stored on instance storage, and queries are handled the same way as they are today.
3. Index stored in remote storage, but queryable and writeable: This is an evolution of the case above, where the index data is stored in the remote store and queried from the remote store, with a local cache for performance benefits. This scenario would require
4. Index stored in a snapshot: In this case the data is stored in a snapshot, as snapshots exist today. The index is not writeable. This is meant as an intermediate goal as we ultimately build towards implementing scenario 3.
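To make the differences between the four scenarios easier to compare, here is a minimal Python sketch that models each one as a record. The `Scenario` and `LocalDisk` names and the `remote_copy` field are hypothetical and exist only for illustration; treating scenario 4's local disk as a cache is an assumption based on the `remote_snapshot` restore behavior proposed in the issue body.

```python
from dataclasses import dataclass
from enum import Enum


class LocalDisk(Enum):
    FULL_COPY = "full copy"  # all index data lives on instance storage
    CACHE = "cache"          # local disk only caches data fetched from remote


@dataclass(frozen=True)
class Scenario:
    number: int
    authoritative_store: str  # where the index data is authoritatively kept
    local_disk: LocalDisk     # role of the node's instance storage
    remote_copy: bool         # whether the data also exists in a remote location
    writeable: bool


# Hypothetical model of the four scenarios discussed above.
SCENARIOS = (
    Scenario(1, "instance storage", LocalDisk.FULL_COPY, remote_copy=False, writeable=True),
    Scenario(2, "instance storage", LocalDisk.FULL_COPY, remote_copy=True, writeable=True),
    Scenario(3, "remote store", LocalDisk.CACHE, remote_copy=True, writeable=True),
    Scenario(4, "snapshot", LocalDisk.CACHE, remote_copy=True, writeable=False),
)
```

Laid out this way, scenario 4 is the only read-only case, and scenarios 3 and 4 are the ones where local disk acts purely as a cache.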
In the fullness of time it should be possible to migrate indexes between scenarios 1, 2, and 3 as efficiently as possible, in order to solve the "shelving existing indices" problem you describe. Scenario 4 may go away in the long term if the current snapshotting mechanism is ultimately replaced by these new remote capabilities, but it is included as an incremental feature that can add value along the way. @reta What do you think? Did you have something specific in mind with the attach/detach APIs?
Thanks @andrross, I think my question could be rephrased like this: is it worth having an explicit API (which I called attach/detach for lack of a better name for now) versus manipulating index settings? Besides cleaner semantics, explicit APIs allow introducing dedicated security checks, for example.
Do we think we'd want different permissions for attaching a remote index? It feels like we are manipulating index properties (size, location, type, id), no? If we go with attach/detach, that should work for all indexes.
Yeah, that's what led me to propose it this way. Also, in any case we'd want the ability to get the current properties of an index, so index settings made sense for that. If we had imperative-style APIs (attach/detach), we'd also need APIs to describe the current attached or detached state, so it's not obvious to me that these properties are different from regular index settings.
I think what customers need is to create an index backed by an existing snapshot. Would they use the Create Index API for that?
Snapshot restore is essentially an API for creating an index. Extending snapshot restore to create a searchable snapshot-type index makes a lot of sense, as many of the options in the API are still relevant (such as the renaming options, restoring partial snapshots, etc.). The real unresolved question is whether there is a single API that can work with regular snapshots as well as the more forward-looking option of creating a remote index backed by the new remote store-type indexes that are being introduced.
@andrross which approach do you feel most strongly we should go with?
For the searchable snapshot phase of the storage roadmap, I think adding an option to the existing snapshot restore API is the right way to go, for the reasons listed above. We're going to start with that approach for the initial development, but we can always change it based on feedback. There are still many open questions about the API and user experience for later phases in the storage roadmap (#3739), but I intend to continue that discussion in separate issues. For the purposes of the searchable snapshot API, though, I'm going to go ahead and close this issue with the intent to get an experimental version behind a feature flag that implements "proposal 1".
Cluster Configuration
A "remote searcher" is a concept that should have broad applicability beyond searching remote snapshots, as it defines a node capable of searching (and caching) indexes hosted in a remote store, potentially in any number of formats beyond a snapshot. We propose a new node role to define this capability.
Node Configuration
A new node role will be introduced: `remote_searcher`. This role indicates that the node is capable of hosting "remote" shards, whose data is authoritatively stored in a remote store. The index data will not be permanently stored on the local instance storage; the local disk can be used as a cache. The cache size can be configured with a setting in `opensearch.yml`.
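The proposal does not say how a `remote_searcher` node manages its cache. As a rough illustration only, this Python sketch assumes a least-recently-used eviction policy bounded by a configured byte capacity, with blocks downloaded from the remote store on a cache miss. The `RemoteBlockCache` class, its `fetch` callback, and the block-key scheme are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class RemoteBlockCache:
    """Hypothetical on-demand cache for data hosted in a remote store.

    Blocks are fetched from the remote store on a miss and evicted in
    least-recently-used order once the configured byte capacity is exceeded.
    """

    def __init__(self, capacity_bytes: int, fetch: Callable[[str], bytes]):
        self.capacity_bytes = capacity_bytes
        self.fetch = fetch  # reads one block from the remote store
        self._blocks: "OrderedDict[str, bytes]" = OrderedDict()
        self._used = 0

    def get(self, key: str) -> bytes:
        if key in self._blocks:
            self._blocks.move_to_end(key)  # mark as most recently used
            return self._blocks[key]
        data = self.fetch(key)  # cache miss: download from the remote store
        self._blocks[key] = data
        self._used += len(data)
        # Evict least recently used blocks until under capacity,
        # but never evict the block that was just fetched.
        while self._used > self.capacity_bytes and len(self._blocks) > 1:
            _, evicted = self._blocks.popitem(last=False)
            self._used -= len(evicted)
        return data
```

Since the remote store remains authoritative, evicting a block only costs a re-download on the next access; nothing is lost.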
Index creation API
Several options exist for creating a remote index. This section details some of them.
Proposal 1: Extend Snapshot Restore
A new parameter will be introduced in the snapshot restore API: `storage_type`. Its accepted values are `local` or `remote_snapshot`. `local` is the default if not specified, and indicates that all snapshot metadata and index data will be downloaded to local instance storage. `remote_snapshot` indicates that snapshot metadata will be downloaded to the cluster, but the remote repository will remain the authoritative store of the index data. Data will be downloaded and cached as necessary to service queries. At least one node in the cluster must be configured with the `remote_searcher` role in order to restore a snapshot of type `remote_snapshot`.
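A minimal Python sketch of a client-side helper that builds a restore request using the proposed `storage_type` parameter and applies the `remote_searcher` requirement described above. The `POST /_snapshot/<repository>/<snapshot>/_restore` endpoint and the `indices` body field come from the existing restore API. The `build_restore_request` helper and its `node_roles` input (one list of role names per node) are hypothetical.

```python
import json
from typing import Optional, Sequence

# Values proposed for the new `storage_type` restore parameter.
VALID_STORAGE_TYPES = ("local", "remote_snapshot")


def build_restore_request(
    repository: str,
    snapshot: str,
    indices: str,
    storage_type: str = "local",
    node_roles: Optional[Sequence[Sequence[str]]] = None,
) -> tuple[str, str, str]:
    """Return (method, path, JSON body) for a snapshot restore call.

    `node_roles` is a hypothetical input: one list of role names per node.
    """
    if storage_type not in VALID_STORAGE_TYPES:
        raise ValueError(f"unsupported storage_type: {storage_type!r}")
    if storage_type == "remote_snapshot":
        # At least one node must carry the proposed remote_searcher role.
        if not any("remote_searcher" in roles for roles in (node_roles or [])):
            raise ValueError("remote_snapshot restore requires a remote_searcher node")
    body = {"indices": indices}
    if storage_type != "local":
        body["storage_type"] = storage_type  # omitted for `local`, the default
    path = f"/_snapshot/{repository}/{snapshot}/_restore"
    return "POST", path, json.dumps(body)
```

For example, `build_restore_request("my-repo", "snap-1", "logs-*", "remote_snapshot", [["data"], ["remote_searcher"]])` yields a `POST` to `/_snapshot/my-repo/snap-1/_restore` with `storage_type` set to `remote_snapshot`. Without a `remote_searcher` node, it raises `ValueError`.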
Proposal 2: Extend Create Index
New settings will be introduced to the create index API under the `index.remote` namespace. The remote flag in this namespace defaults to `false`, indicating the index will exclusively use local instance storage. If it is `true`, then the `index.remote.datastore` property must be specified. Only `snapshot` is supported initially as a datastore.
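A minimal Python sketch of the validation rules described for proposal 2. Because the exact key of the boolean flag is not given here, the sketch passes it in as a plain `remote_enabled` argument instead of reading it from the settings. The `resolve_index_storage` function is hypothetical. `index.remote.datastore` and the `snapshot` value come from the proposal.

```python
from typing import Any, Mapping

DATASTORE_KEY = "index.remote.datastore"
SUPPORTED_DATASTORES = frozenset({"snapshot"})  # only `snapshot` initially


def resolve_index_storage(remote_enabled: bool, settings: Mapping[str, Any]) -> str:
    """Return where an index's data lives under proposal 2.

    `remote_enabled` stands in for the boolean flag in the `index.remote`
    namespace, which defaults to False.
    """
    if not remote_enabled:
        return "local"  # exclusively local instance storage
    datastore = settings.get(DATASTORE_KEY)
    if datastore is None:
        raise ValueError(f"{DATASTORE_KEY} must be specified for a remote index")
    if datastore not in SUPPORTED_DATASTORES:
        raise ValueError(f"unsupported datastore: {datastore!r}")
    return datastore
```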