-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Remote Segment Store] Add RemoteDirectory interface to copy segment files to/from remote store #3102
Conversation
fa3a260
to
9a6812d
Compare
❌ Gradle Check failure fa3a260fcd8fe5c27a8227545fe743a3339b4210 |
❌ Gradle Check failure 9a6812d9447c580d2b11deda20db9a09614ad5d9 |
Checking the failures. |
9a6812d
to
91ebb0f
Compare
❌ Gradle Check failure 91ebb0f57a39d402ef5957b61496503744e1e36a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Sachin, this is a good start, few minor sugestions/clarifications
server/src/main/java/org/opensearch/index/store/RemoteDirectory.java
Outdated
Show resolved
Hide resolved
@Override | ||
public Lock obtainLock(String name) throws IOException { | ||
throw new UnsupportedOperationException(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to acquire a lock before segment uploads or can we support concurrent uploads as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as segments in the remote store are uploaded and downloaded for backup and restore part, I don't think locking is required. Once we start writing to remote store via IndexWriter, locking would be required.
Open to suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we extend BaseDirectory
, it will provide locking functionality with the provided LockFactory
.
@Override | ||
public void copyBytes(DataInput input, long numBytes) throws IOException { | ||
assert input instanceof IndexInput : "input should be instance of IndexInput"; | ||
blobContainer.writeBlob(getName(), new InputStreamIndexInput((IndexInput) input, numBytes), numBytes, false); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets say if we go with concurrent writes to local and remote store would we still need a copyBytes
or writeByte
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also use ByteBuffer
for leveraging I/O optimizations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets say if we go with concurrent writes to local and remote store would we still need a
copyBytes
orwriteByte
For concurrent writes, we will need writeByte/s
Do we also use
ByteBuffer
for leveraging I/O optimizations?
Right. Current implementation may not be optimized. It also depends on how a given repository implementation is handling writeBlob
.
We will be having a continuous performance test setup which will provide insight into the performance issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another case, let's say it's not concurrent, but we want writes to only happen on remote which can be directly searchable we might want to just do writeBytes
to remote directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the idea is to open up the methods as per the requirement. As you said, writeByte/writeBytes
methods will be required when IndexWriter
is using RemoteDirectory
to create the segments (concurrent writes or writing only to remote store).
public class RemoteIndexInput extends IndexInput { | ||
|
||
private final InputStream inputStream; | ||
private final long size; | ||
|
||
public RemoteIndexInput(String name, InputStream inputStream, long size) { | ||
super(name); | ||
this.inputStream = inputStream; | ||
this.size = size; | ||
} | ||
|
||
@Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also have a VerifyingIndexInput
we might want to consider that handles the checksums
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, either VerifyingIndexInput
or ChecksumIndexInput
can be used. I have added it as a ToDo item in the javadoc of the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue is created for Checksum. Tagging this comment on the issue.
Signed-off-by: Sachin Kale <[email protected]>
91ebb0f
to
d11d190
Compare
You have failing tests here, do you still want to merge? (feature branch) |
This seem to be like a flaky test. There was a fix provided for it as a part of #2110. As this feature-branch was created before the fix, test is failing. Changes in this PR should not impact any existing tests. |
start gradle check |
start gradle check |
@sachinpkale @dblock I have rebased the feature branch from main and tried triggering a fresh build |
start gradle check |
2 similar comments
start gradle check |
start gradle check |
start gradle check try one more time. |
Checked the logs of last run. The failing test is: |
start gradle check |
Does this needs to be backported to |
@peterzhuamazon This would be a 3.0 change. |
…ore (opensearch-project#3102) Signed-off-by: Sachin Kale <[email protected]> Co-authored-by: Sachin Kale <[email protected]>
…ore (opensearch-project#3102) Signed-off-by: Sachin Kale <[email protected]> Co-authored-by: Sachin Kale <[email protected]>
Description
As described in the #2700, all the read/write operations to remote segment store will happen via
RemoteDirectory
interface. This change adds the RemoteDirectory interface along with correspondingRemoteIndexInput
(extends IndexInput) andRemoteIndexOutput
(extends IndexOutput) classes.Issues Resolved
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.