-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(io): implement missing IO runtime primitives #264
Conversation
|
||
/** | ||
* Represents an abstract stream of bytes | ||
* Represents an abstract read-only stream of bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that might be interesting on the comments or docs is - is this a "single read" stream or can I seek up and down / read it multiple times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That actually depends on the variant. If it's just a ByteArray
then of course it's "seekable". If it's a Reader
(aka SdkByteReadChannel
) then it is not seekable.
/** | ||
* Create a [ByteStream] from a file | ||
*/ | ||
fun ByteStream.Companion.fromFile(file: File): ByteStream = file.asByteStream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extension function on the Companion object is that even a thing? Why have this and the File
based extension function below? Is this a discoverability thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extension function on the Companion object is that even a thing?
is that a real question or were you just surprised?
Why have this and the File based extension function below? Is this a discoverability thing?
Consistency. We provide Bytestream.fromXYZ()
functions for a few other types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k
/** | ||
* Write the contents of this ByteStream to file and close it | ||
*/ | ||
suspend fun ByteStream.toFile(file: File): Long { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming nit: might want to name this writeToFile
to make it more obvious this is going to consume the stream - rather than just convert it to some File
object.
import io.ktor.utils.io.core.* | ||
|
||
@OptIn(ExperimentalIoApi::class) | ||
internal interface Allocator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wtf is all this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this has to do with ktor-io
's Memory
type that abstracts away dealing with various platform "memory". You can't instantiate the type directly in common
it has to be done per/platform.
This followed in ktor-io
's footsteps a bit. On JVM and JS there is no free()
function but on native
there is. I suspect this will go away though since they are changing the memory model.
/** | ||
* Writes all [src] bytes and suspends until all bytes written. | ||
*/ | ||
suspend fun writeFully(src: ByteArray, offset: Int = 0, length: Int = src.size - offset): Unit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
omg - how awesome are default parameters? previously this would have been 3 different methods :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right?
@Test | ||
fun testWriteFullyInsufficientSpace() { | ||
val buf = SdkBuffer(16) | ||
val contents = "is it morning or is it night, the software engineer doesn't know anymore" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dark
|
||
private val AddSuppressedMethod: Method? by lazy { | ||
try { | ||
Throwable::class.java.getMethod("addSuppressed", Throwable::class.java) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this done by reflection because we don't support JDK8 universally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is literally copied from ktor-io
which is copied into like 3 other kotlin repositories (including kotlinx-io
) and I don't understand why it isn't in the stdlib (there has been a ticket opened for a while that I linked to).
So to answer your question it's not clear to me why it's done this way but I chose to not modify it. This all seems related to if you hit a second exception while handling the first. Alternatively we could just ignore the secondary exception and only propagate the original.
AddSuppressedMethod?.invoke(this, other) | ||
} | ||
|
||
private val AddSuppressedMethod: Method? by lazy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why is this capitalized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't see any glaring bugs or anything. Left some thoughts for improvements inline.
/** | ||
* Write the contents of this ByteStream to file and close it | ||
*/ | ||
suspend fun ByteStream.toFile(file: File): Long { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: return value should probably be documented which I assume is the number of byte written
*/ | ||
fun rewind(count: Int = readPosition) { | ||
val size = minOf(count, readPosition) | ||
if (size <= 0) return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might consider adding internal APIs to modify the readHead/writeHead that take the beta unsigned int types to ensure that you never accidentally increase when you mean to decrease
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sort of considered this. I really wish they would stabilize unsigned types, it's weird not having them in your arsenal. I'll think about it some more and maybe play with it.
* If the total bytes available is less than [length] then as many bytes as are available will be read. | ||
* The total bytes read is returned or `-1` if no data is available. | ||
*/ | ||
fun SdkBuffer.readAvailable(dest: ByteArray, offset: Int = 0, length: Int = dest.size - offset): Int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Other methods that return the number of bytes written return Long
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah so this is sort of deliberate and also something that's bugs me about ByteArray
. Basically a ByteArray
can only be of size Int
bytes. There are no constructors taking a Long
. This is why there is a bit of a difference in the API in various spots. I'm not sure it makes sense to return Long
if it's impossible to ever read/write more than Int
bytes.
* Read from this buffer exactly [length] bytes and write to [dest] starting at [offset] | ||
* @throws IllegalArgumentException if there are not enough bytes available for read or the offset/length combination is invalid | ||
*/ | ||
fun SdkBuffer.readFully(dest: ByteArray, offset: Int = 0, length: Int = dest.size - offset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit/docs: even after reading some code, I think I know, but I'm not totally sure if offset
pertains to dest
or this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try and update the documentation but it applies to dest
. It will write length
bytes to dest
starting at offset
* Write [length] bytes of [src] to this buffer starting at [offset] | ||
* @throws IllegalArgumentException if there is insufficient space or the offset/length combination is invalid | ||
*/ | ||
fun SdkBuffer.writeFully(src: ByteArray, offset: Int = 0, length: Int = src.size - offset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit/naming: I personally prefer readToEnd
and writeAll
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like writeAll
but I'm not sure on readToEnd
. readToEnd
implies to me that you'll be reading to the end of the buffer but what readFully
does is try and populate the destination buffer "fully" (i.e. exactly of size length
bytes). Maybe readExact
would be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although if we go that route maybe writeExact
for symmetry....
/** | ||
* Read the available (unread) contents as a UTF-8 string | ||
*/ | ||
fun SdkBuffer.decodeToString() = bytes().decodeToString(0, readRemaining) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this accept an encoding argument that defaults to UTF-8
but admits other options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably but we don't have a great way to deal with different charsets in multiplatform. ByteArray.decodeToString()
is UTF-8
only. The JVM side of things yes we could probably do that but I think we'll have to punt for now. We can always go back and add it in as a default argument.
* This is a buffered **single-reader single writer channel**. | ||
* | ||
* Read operations can be invoked concurrently with write operations, but multiple reads or multiple writes | ||
* cannot be invoked concurrently with themselves. Exceptions are [close] and [flush] which can be invoked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like maybe a not-so-easy to maintain invariant? To me, this seems much more like a ring buffer than a channel (which to me implies at least MPSC if not MPMC)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These invariants come from ktor's implementation. I'd be happy to just describe it as a SPSC channel though and leave out the rest of this description. I can't really think of a scenario at the moment where I need multiple writers or readers and if I did there are other options that could be built on top of this type to handle that.
If you dive into the implementation your intuition isn't too far off though. It's not really a channel per say in the MPSC/MPMC sense. It's a buffer that coordinates suspension. Think of it as a buffer with a condition variable. Read requests suspend only if the read operation requested can't be fulfilled and get notified when it can. Writers suspend if the buffer is full and get notified when more room is available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I read this documentation comment just now again and the description makes sense to me.
It's just saying that you can have a single reader and a single writer invoking read/write operations at the same time but that you can't have multiple readers or multiple writers invoking read/write operations at same time (which is just a long winded way of saying SPSC).
What would improve it for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this exposed to customers or is this just internal? I'm mostly just worried about a customer causing a race in their own code because of the thread safety implied by the name "channel"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only the SdkByteReadChannel
type is exposed to customers (through the ByteStream
type).
Although I expect most customers to do one of two things (1) write it to file or (2) read it completely into memory (both of which we have provided convenience functions for).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mainly looking to see if we must api
coroutines, otherwise some questions and nits.
*/ | ||
fun Path.asByteStream(): ByteStream { | ||
val f = toFile() | ||
require(f.isFile) { "cannot create a ByteStream from a directory: $this" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion
If we're guarding dir/file why not also guard if the file exists? File.exists()
I think.
* Write the contents of this ByteStream to file and close it | ||
*/ | ||
suspend fun ByteStream.toFile(file: File): Long { | ||
require(file.isFile) { "cannot write contents of ByteStream to a directory: ${file.absolutePath}" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case the file doesn't need to exist since we may be creating it
@@ -0,0 +1,57 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question
In my experience InputStream
/OutputStream
is the most common way of dealing w/ I/O in Java. They are more flexibil than working directly with filetypes as they can be composed, among other features. Is there a reason you don't provide mappings to those in this file?
I realize we don't use these types in our SDK due to concurrency constraints but seems like at this level those concerns are not valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize we don't use these types in our SDK due to concurrency constraints but seems like at this level those concerns are not valid.
I'm not sure I follow what you mean, can you clarify?
I'll double check but I don't recall seeing any utilities for dealing with Input/Output
stream. The reason we can for files is asynchronous utilities have been built on top of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, this logic is for customers to work with data in or out of the SDK. As such we should consider thier general utility. To rephrase, in my experience in Javaland, java libraries often use Input/Output streams for handling this data. It's likely that customers will want to use JavaLibraryX
which has a function doSomethingWith(InputStream: data)
. Unless there is a simple way of performing this mapping that I've missed, it may be of limited utility.
It's not something that needs to be addressed as part of this PR, and customer feedback is certainly warranted here.
/** | ||
* ByteStream backed by a local [file] | ||
*/ | ||
public class LocalFileContent( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit/style
IMO the nature of type java.io.File
is sufficient to express the 'locality' of the behaviour of this class, and think 'Local' could be dropped. Maybe something like FileByteStreamReader
or FileReader
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can drop the Local
but to be consistent the rest of the types are all XyzContent
so it should probably be FileContent
to be consistent
client-runtime/io/build.gradle.kts
Outdated
implementation("io.ktor:ktor-io:$ktorVersion") | ||
|
||
// Dispatchers.IO | ||
api("org.jetbrains.kotlinx:kotlinx-coroutines-core:$coroutinesVersion") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question
As I understand it this provides access to the jetbrains coroutine library from within our SDK for customers to depend on from their programs. It would be safer perhaps to require customers to depend on that directly. Is there a reason why this cannot be implementation
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would love to make this implementation
however I think it has to be api
. The comment above indicates why, the file readers expose Dispatchers.IO
which is from kotlinx-coroutines-core
.
You are correct that it technically allows others to access the jetbrains library from our SDK but that's not the real purpose of api
:
The api configuration should be used to declare dependencies which are exported by the library API, whereas the implementation configuration should be used to declare dependencies which are internal to the component.
https://docs.gradle.org/current/userguide/java_library_plugin.html#sec:java_library_separation
So because we use a type in our API from the library it's an api
dependency to us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, appreciate the details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to make this implementation
for now by just hard coding the dispatcher to Dispatchers.IO
. We can revisit as needed.
if (limit == 0L) return 0L | ||
|
||
// delegate to ktor-io if possible which may have further optimizations based on impl | ||
val cnt = if (this is IsKtorReadChannel && dst is IsKtorWriteChannel) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit/style
I prefer count
to cnt
import software.aws.clientrt.testing.runSuspendTest | ||
import kotlin.test.* | ||
|
||
class SdkByteChannelOpsTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question
what's ops
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
operations
|
||
package software.aws.clientrt.io | ||
|
||
internal fun SdkBuffer.hasArray() = memory.buffer.hasArray() && !memory.buffer.isReadOnly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
@@ -609,6 +609,8 @@ abstract class HttpBindingProtocolGenerator : ProtocolGenerator { | |||
.map { it.member } | |||
|
|||
if (documentMembers.isNotEmpty()) { | |||
// FIXME - we should not be slurping the entire contents into memory, instead our deserializers | |||
// should work off of an SdkByteReadChannel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yesss
@@ -42,4 +42,30 @@ class MiddlewareTest { | |||
} | |||
assertEquals("Foo", handler.call("foo")) | |||
} | |||
|
|||
@Test | |||
fun testMapRequest() = runSuspendTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question
It's not clear to me what these tests are exercising. can you briefly explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are covering the utility middleware components MapRequest/MapResponse
. I noticed our code coverage missed these and they were easy to add so I added them.
Yeah I totally get this concern. I have the same concern / intuition but I'm not sure if we can do better unless we hard code the dispatcher to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nits here and there, nothing to block for.
@@ -0,0 +1,57 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, this logic is for customers to work with data in or out of the SDK. As such we should consider thier general utility. To rephrase, in my experience in Javaland, java libraries often use Input/Output streams for handling this data. It's likely that customers will want to use JavaLibraryX
which has a function doSomethingWith(InputStream: data)
. Unless there is a simple way of performing this mapping that I've missed, it may be of limited utility.
It's not something that needs to be addressed as part of this PR, and customer feedback is certainly warranted here.
client-runtime/io/build.gradle.kts
Outdated
implementation("io.ktor:ktor-io:$ktorVersion") | ||
|
||
// Dispatchers.IO | ||
api("org.jetbrains.kotlinx:kotlinx-coroutines-core:$coroutinesVersion") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, appreciate the details
* Wrap ktor's ByteReadChannel as our own. This implements the common API of [SdkByteReadChannel]. Only | ||
* platform specific differences in interfaces need be implemented in inheritors. | ||
*/ | ||
internal abstract class KtorReadChannelAdapterBase( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah it was just a nit. If the long form provides more safety, that is better.
Issue #, if available:
closes #130
Description of changes:
Source
->SdkByteReadChannel
ktor-io
implementation of read/write channels.ByteRead/ByteWrite
channels. ktor has lots of extension methods for doing all sorts of things. We can add as needed. Our primary use case is reading/writing to/from sockets/files (usually in larger chunks)File
/Path
on JVM to read/write to/from files as a channel (ktor does the heavy lifting here of course)LocalFileContent
orByteStream.toFile(...)
LocalFileContent
(CRT has implementations of reading a file already and supplying it as HTTP body/signing, etc).SdkBuffer
type that we can use internally. Similar toByteBuffer
but grows as needed.ktor-io
,kotlinx-io
, andokio
implementations. Bothktor-io
andokio
have a buffer abstraction that allows writing arbitrary amount of bytes to it without knowing the size up front. They both pool buffers internally and release them as they are read. (okio
differs in that it'sBuffer
can read/write whereasktor-io
provides separate abstractions (BytePacketBuilder
/ByteReadPacket
)). This is neat and makes total sense but we need the ability to rewind and read content multiple times for signing. It also complicates the lifetimes of those types as you have to be sure to release the contents back to the pool (usually done internally by requiring a call toclose()
).ktor-io
'sBuffer
type but grows on demand and can be instantiated explicitly.By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.