-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Random Access API #889
Comments
Mixin interface is awkward given we favour building decoratora elsewhere. Eg logging timing etc |
I've learned to dislike proto or thrift methods using primitives for results. If you had two results to return you'd have a return type. Which could then be changed to 3-4 results without API breaking. Also from bytebuffer use cases with absolute and relative cases, its really awkward with setting position and then resetting. Why didn't you consider using an optional nullable cursor approach? Cursor could be closeable and maybe optionally writeable? Could also be useful if you design for a remote API like s3? |
On scope - useful to be clear what is in an out of scope. From description above S3 out of scope. Local memory or failed including compressed would be in scope. Make this clear in the type system. LocalCursor vs a field position on a very general API like Source. Scope becomes simpler if this is a peer of Source, not part of it. |
How does it sit with Source. a) 1:1, it is the instance of the Source. (current suggestion) I'm assuming a) makes sense because Source could come from a non cloneable InputStream. |
On API usage - ByteBuffer is the existing JDK API that feels related. It makes it really awkward with positions, flipping, relative and absolute methods. In contrast the recent read and write batch APIs from Okio filesystem worked really well within a session. Can we show some live examples from OkHttp, or Moshi to demonstrate why this API will be simpler for users. |
@yschimke I love your recommendation of using an object, not a primitive. It fixes many problems! We can make it clear when random access is not available – the cursor is null! And it creates a convenient place to add helper things like Here’s a new API sketch that uses the name
The motivating use case for all of this is
Here’s a quick, dirty sketch of how that might look. |
Also can you pull out a readonly filesystem base class? Can use it for a ClasspathFilesystem. |
@swankjesse looks good, I think there are still open questions about whether cursor is 1:1 or 1:n. Can some methods take a nullable Cursor as a parameter to become relative to the cursor instead of Source? Are there things to learn from the ByteBuffer API, like limits? So the Zip index telling you a range within the file for the cursor? Fine with no, but worth defining the meaning. |
This turned out well for read-only, terribly for read-write. Ultimately we went with FileHandle instead. |
Goal
An API for reading files such that accesses aren’t strictly sequential.
Use Cases
Like
Buffer.UnsafeCursor
, I expect that random access is a power feature that most users shouldn’t need. Instead, it is a baseline for building higher-level abstractions likeZipFileSystem
.API
Implementation
FileSystem.SYTEM
andFakeFileSystem
.ForwardingSource
base class because it’s hazardous for subclasses likeHashingSource
. (We’re fine forHashingSource
ourselves, but unsafe to introduce because it may introduce similar problems for user subclasses.)Buffer
. In theory we could support reading the position but not writing it; this would be the total number of bytes ever written to the buffer, independent of how many bytes have been consumed. But I think this is a bad idea; two empty buffers could have different read positions and this has weird consequences for equality.Design Options
RandomAccess
as a mixin interfaceWe could improve the API so that
position
support is included in the type system. For example:We would need to split
BufferedSource
intoBufferedSource
andBufferedRandomAccessSource
. AndForwardingSource
intoForwardingSource
andForwardingRandomAccessSource
.FileSystem.source()
cannot return aRandomAccessSource
because implementations likeZipFileSystem
cannot efficiently implement random access on their contents. In fact, different files within a single file system may have different support; presumably seeking on a UNIX pipe is not possible.I expect if we did this, users would still do runtime checks; they’d just use
if (source is RandomAccessSource)
rather thanif (source.position != -1L)
.Rename
position
toreadPosition
We keep the door open to later support independent
readPosition
andwritePosition
properties onBuffer
. (We may find a design to overcome the position + equality thing on buffers.)Modes
Rather than using a sentinel value,
-1L
to indicate that position is unsupported, we could make the sources self-describe that they support. Many random access modes are possible:skip()
.Size
We could add another field,
size
with the total number of bytes in the source. This could also be exposed in APIs like OkHttp’s response body. Returning a size of-1L
indicates that the size is unknown.The text was updated successfully, but these errors were encountered: