feat(io): implement missing IO runtime primitives #264

aajtodd · 2021-03-31T15:22:43Z

Issue #, if available:
closes #130

Description of changes:

(refactor): rename Source -> SdkByteReadChannel
(feat): add wrappers around ktor-io implementation of read/write channels.
- NOTE: we are marking the creation of these types as internal but the interfaces are public. This allows us to make use of them in the runtime, but customers can only be given an instance of one (we aren't trying to implement a general-purpose IO library here for others to use).
- We are only exposing a very minimal subset of ktor's equivalent ByteRead/ByteWrite channels. ktor has lots of extension methods for doing all sorts of things; we can add them as needed. Our primary use case is reading/writing to/from sockets/files (usually in larger chunks).
(feat): Added extensions for File / Path on JVM to read/write to/from files as a channel (ktor does the heavy lifting here of course)
- Customers will likely only ever interact with supplying a file as LocalFileContent or ByteStream.toFile(...)
- We will also likely not use the channel when interacting with CRT and instead special case LocalFileContent (CRT has implementations of reading a file already and supplying it as HTTP body/signing, etc).
(feat): Added an SdkBuffer type that we can use internally. Similar to ByteBuffer but grows as needed.
- I struggled to figure out what I wanted from this type and ultimately landed on this. I looked closely at the ktor-io, kotlinx-io, and okio implementations. Both ktor-io and okio have a buffer abstraction that allows writing an arbitrary number of bytes to it without knowing the size up front. They both pool buffers internally and release them as they are read. (okio differs in that its Buffer can both read and write, whereas ktor-io provides separate abstractions: BytePacketBuilder/ByteReadPacket.) This is neat and makes total sense, but we need the ability to rewind and read content multiple times for signing. It also complicates the lifetimes of those types, as you have to be sure to release the contents back to the pool (usually done internally by requiring a call to close()).
- In the end I decided to start with something simpler albeit maybe less efficient (depending on number of re-allocations). I expect we'll use this type for implementing the protocol serializers (awsQuery first, later json and xml when we implement our own). This ended up being similar to ktor-io's Buffer type but grows on demand and can be instantiated explicitly.
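A minimal sketch of the "grows on demand, can be rewound" buffer described above, assuming a single `ByteArray` backing store with separate read and write positions and capacity doubling. `GrowableBuffer` and its members are hypothetical names; this is not the PR's `SdkBuffer`, although the `rewind` body mirrors the snippet quoted later in the review.

```kotlin
/**
 * Hypothetical growable, rewindable buffer (not the PR's SdkBuffer). Writes never fail
 * for lack of space because the backing array is reallocated, and consumed bytes can be
 * rewound so the same content can be read twice (e.g. once to sign, once to send).
 */
class GrowableBuffer(initialCapacity: Int = 16) {
    private var data = ByteArray(initialCapacity)

    var readPosition = 0
        private set
    var writePosition = 0
        private set

    /** Bytes written but not yet read. */
    val readRemaining: Int
        get() = writePosition - readPosition

    /** Write [length] bytes of [src] starting at [offset], growing the backing array if needed. */
    fun writeFully(src: ByteArray, offset: Int = 0, length: Int = src.size - offset) {
        require(offset >= 0 && length >= 0 && length <= src.size - offset) { "invalid offset/length" }
        val required = writePosition + length // sketch: ignores Int overflow for huge writes
        if (required > data.size) {
            var newSize = maxOf(data.size, 1)
            while (newSize < required) newSize *= 2 // doubling keeps the number of reallocations low
            data = data.copyOf(newSize)
        }
        src.copyInto(data, destinationOffset = writePosition, startIndex = offset, endIndex = offset + length)
        writePosition += length
    }

    /** Read up to [length] bytes into [dest] at [offset]; returns the count, or -1 if nothing is available. */
    fun readAvailable(dest: ByteArray, offset: Int = 0, length: Int = dest.size - offset): Int {
        if (readRemaining == 0) return -1
        val count = minOf(length, readRemaining)
        data.copyInto(dest, destinationOffset = offset, startIndex = readPosition, endIndex = readPosition + count)
        readPosition += count
        return count
    }

    /** Move the read position back by up to [count] bytes (default: everything read so far). */
    fun rewind(count: Int = readPosition) {
        val size = minOf(count, readPosition)
        if (size <= 0) return
        readPosition -= size
    }
}
```

Doubling trades some unused capacity for fewer copies, which is the re-allocation cost the description alludes to.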

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

kiiadi · 2021-03-31T15:36:55Z

client-runtime/client-rt-core/common/src/software/aws/clientrt/content/ByteStream.kt


 /**
- * Represents an abstract stream of bytes
+ * Represents an abstract read-only stream of bytes


One thing that might be worth stating in the comments or docs: is this a "single read" stream, or can I seek back and forth / read it multiple times?

That actually depends on the variant. If it's just a ByteArray then of course it's "seekable". If it's a Reader (aka SdkByteReadChannel) then it is not seekable.
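To make the distinction concrete, here is a hypothetical model where replayability is a property of the variant. `SketchByteStream`, `Buffer`, `Reader` and `isReplayable` are illustrative names, not the runtime's `ByteStream` API.

```kotlin
/** Hypothetical model of a byte stream whose replayability depends on its variant. */
sealed class SketchByteStream {
    /** Backed by an in-memory array: can be read any number of times. */
    class Buffer(val bytes: ByteArray) : SketchByteStream()

    /** Backed by a one-shot source: once consumed, the bytes are gone. */
    class Reader(private val source: Iterator<Byte>) : SketchByteStream() {
        fun readAll(): ByteArray {
            val out = mutableListOf<Byte>()
            while (source.hasNext()) out.add(source.next())
            return out.toByteArray()
        }
    }
}

/** True if the stream's contents can be read more than once. */
fun SketchByteStream.isReplayable(): Boolean = when (this) {
    is SketchByteStream.Buffer -> true
    is SketchByteStream.Reader -> false
}
```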

kiiadi · 2021-03-31T15:41:00Z

client-runtime/client-rt-core/jvm/src/software/aws/clientrt/content/ByteStreamJVM.kt

+/**
+ * Create a [ByteStream] from a file
+ */
+fun ByteStream.Companion.fromFile(file: File): ByteStream = file.asByteStream()


extension function on the Companion object is that even a thing? Why have this and the File based extension function below? Is this a discoverability thing?

extension function on the Companion object is that even a thing?

is that a real question or were you just surprised?

Why have this and the File based extension function below? Is this a discoverability thing?

Consistency. We provide ByteStream.fromXYZ() functions for a few other types.
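Extensions on a companion object are a regular Kotlin feature: as long as the class declares a `companion object` (even an empty one), `fun Foo.Companion.bar()` can be called as `Foo.bar()`. A minimal sketch with a hypothetical `Stream` class standing in for `ByteStream`:

```kotlin
import java.io.File

/** Hypothetical stand-in for ByteStream; the empty companion is what enables `Stream.fromXyz()`. */
class Stream(val bytes: ByteArray) {
    companion object
}

/** Extension on the source type... */
fun File.asStream(): Stream = Stream(readBytes())

/** ...and on the companion, so callers can also discover it as `Stream.fromFile(file)`. */
fun Stream.Companion.fromFile(file: File): Stream = file.asStream()

fun Stream.Companion.fromString(text: String): Stream = Stream(text.encodeToByteArray())
```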

kiiadi · 2021-03-31T15:46:48Z

client-runtime/client-rt-core/jvm/src/software/aws/clientrt/content/ByteStreamJVM.kt

+/**
+ * Write the contents of this ByteStream to file and close it
+ */
+suspend fun ByteStream.toFile(file: File): Long {


naming nit: might want to name this writeToFile to make it more obvious this is going to consume the stream - rather than just convert it to some File object.

kiiadi · 2021-03-31T15:48:31Z

client-runtime/io/common/src/software/aws/clientrt/io/Allocator.kt

+import io.ktor.utils.io.core.*
+
+@OptIn(ExperimentalIoApi::class)
+internal interface Allocator {


wtf is all this?

so this has to do with ktor-io's Memory type, which abstracts away dealing with various platform "memory". You can't instantiate the type directly in common code; it has to be done per platform.

This followed in ktor-io's footsteps a bit. On JVM and JS there is no free() function but on native there is. I suspect this will go away though since they are changing the memory model.

kiiadi · 2021-03-31T16:01:31Z

client-runtime/io/common/src/software/aws/clientrt/io/SdkByteWriteChannel.kt

+    /**
+     * Writes all [src] bytes and suspends until all bytes written.
+     */
+    suspend fun writeFully(src: ByteArray, offset: Int = 0, length: Int = src.size - offset): Unit


omg - how awesome are default parameters? previously this would have been 3 different methods :)
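For illustration, one declaration with defaults covers all three call shapes, and a default may refer to an earlier parameter (`length` uses `offset`). `RecordingSink` is a hypothetical class. Java callers only see the full three-argument signature unless `@JvmOverloads` is added.

```kotlin
/** Hypothetical sink that records what was written, to show default parameters. */
class RecordingSink {
    val written = mutableListOf<Byte>()

    // One declaration covers writeFully(src), writeFully(src, offset) and writeFully(src, offset, length).
    fun writeFully(src: ByteArray, offset: Int = 0, length: Int = src.size - offset) {
        require(offset >= 0 && length >= 0 && length <= src.size - offset) { "invalid offset/length" }
        for (i in offset until offset + length) written.add(src[i])
    }
}
```

Named arguments also let a caller skip `offset` entirely, e.g. `writeFully(src = bytes, length = 3)`.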

kiiadi · 2021-03-31T16:02:14Z

client-runtime/io/common/test/software/aws/clientrt/io/SdkBufferTest.kt

+    @Test
+    fun testWriteFullyInsufficientSpace() {
+        val buf = SdkBuffer(16)
+        val contents = "is it morning or is it night, the software engineer doesn't know anymore"


kiiadi · 2021-03-31T16:03:38Z

client-runtime/io/jvm/src/software/aws/clientrt/io/CloseableJVM.kt

+
+private val AddSuppressedMethod: Method? by lazy {
+    try {
+        Throwable::class.java.getMethod("addSuppressed", Throwable::class.java)


Is this done by reflection because we don't support JDK8 universally?

this is literally copied from ktor-io which is copied into like 3 other kotlin repositories (including kotlinx-io) and I don't understand why it isn't in the stdlib (there has been a ticket opened for a while that I linked to).

So to answer your question, it's not clear to me why it's done this way, but I chose not to modify it. This all seems related to what happens if you hit a second exception while handling the first. Alternatively, we could just ignore the secondary exception and only propagate the original.
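For context, the behavior in question is the one `kotlin.io.use` follows: if the body throws and `close()` also throws, the close failure is attached to the original exception as suppressed instead of replacing it. `addSuppressed` is a member of `java.lang.Throwable` since JDK 7, so the reflection in ktor-io is presumably there for runtimes that predate it. A reflection-free sketch for JDK 7+ targets, using a hypothetical `useSketch` name:

```kotlin
import java.io.Closeable

/**
 * Hypothetical version of the `use` pattern without reflection: always close the
 * resource, and if both the body and close() fail, keep the body's exception as the
 * primary one and attach the close failure to it as suppressed.
 */
inline fun <T : Closeable, R> T.useSketch(block: (T) -> R): R {
    var primary: Throwable? = null
    try {
        return block(this)
    } catch (t: Throwable) {
        primary = t
        throw t
    } finally {
        val original = primary
        if (original == null) {
            close()
        } else {
            try {
                close()
            } catch (closeFailure: Throwable) {
                // Throwable.addSuppressed: available on JDK 7 and later
                original.addSuppressed(closeFailure)
            }
        }
    }
}
```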

kiiadi · 2021-03-31T16:04:10Z

client-runtime/io/jvm/src/software/aws/clientrt/io/CloseableJVM.kt

+    AddSuppressedMethod?.invoke(this, other)
+}
+
+private val AddSuppressedMethod: Method? by lazy {


nit: why is this capitalized?

rcoh

Didn't see any glaring bugs or anything. Left some thoughts for improvements inline.

rcoh · 2021-03-31T17:13:00Z

client-runtime/client-rt-core/jvm/src/software/aws/clientrt/content/ByteStreamJVM.kt

+/**
+ * Write the contents of this ByteStream to file and close it
+ */
+suspend fun ByteStream.toFile(file: File): Long {


nit: the return value should probably be documented, which I assume is the number of bytes written

rcoh · 2021-03-31T17:22:26Z

client-runtime/io/common/src/software/aws/clientrt/io/SdkBuffer.kt

+     */
+    fun rewind(count: Int = readPosition) {
+        val size = minOf(count, readPosition)
+        if (size <= 0) return


I might consider adding internal APIs to modify the readHead/writeHead that take the beta unsigned int types to ensure that you never accidentally increase when you mean to decrease

I sort of considered this. I really wish they would stabilize unsigned types, it's weird not having them in your arsenal. I'll think about it some more and maybe play with it.
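A rough sketch of the suggestion, assuming unsigned types (stable since Kotlin 1.5): each helper moves one head in exactly one direction, and the amount is a `UInt`, so a negative count cannot be passed. `Heads`, `advanceWrite`, `advanceRead` and `rewindRead` are hypothetical names.

```kotlin
/** Hypothetical read/write heads whose mutators each move in only one direction. */
class Heads(private val capacity: Int) {
    var readHead = 0
        private set
    var writeHead = 0
        private set

    /** Move the write head forward; a UInt count can never be negative. */
    fun advanceWrite(count: UInt) {
        require(count <= (capacity - writeHead).toUInt()) { "write past capacity" }
        writeHead += count.toInt()
    }

    /** Move the read head forward, never past the write head. */
    fun advanceRead(count: UInt) {
        require(count <= (writeHead - readHead).toUInt()) { "read past write head" }
        readHead += count.toInt()
    }

    /** Move the read head backward, clamped at zero. */
    fun rewindRead(count: UInt) {
        readHead -= minOf(count, readHead.toUInt()).toInt()
    }
}
```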

rcoh · 2021-03-31T17:22:57Z

client-runtime/io/common/src/software/aws/clientrt/io/SdkBuffer.kt

+ * If the total bytes available is less than [length] then as many bytes as are available will be read.
+ * The total bytes read is returned or `-1` if no data is available.
+ */
+fun SdkBuffer.readAvailable(dest: ByteArray, offset: Int = 0, length: Int = dest.size - offset): Int {


nit: Other methods that return the number of bytes written return Long

Yeah, so this is sort of deliberate and also something that bugs me about ByteArray. Basically a ByteArray can only hold an Int-sized number of bytes; there are no constructors taking a Long. This is why there is a bit of a difference in the API in various spots. I'm not sure it makes sense to return Long if it's impossible to ever read/write more than Int bytes.

rcoh · 2021-03-31T17:25:37Z

client-runtime/io/common/src/software/aws/clientrt/io/SdkBuffer.kt

+ * Read from this buffer exactly [length] bytes and write to [dest] starting at [offset]
+ * @throws IllegalArgumentException if there are not enough bytes available for read or the offset/length combination is invalid
+ */
+fun SdkBuffer.readFully(dest: ByteArray, offset: Int = 0, length: Int = dest.size - offset) {


nit/docs: even after reading some code, I think I know, but I'm not totally sure if offset pertains to dest or this

I'll try to update the documentation, but it applies to dest: it writes length bytes to dest starting at offset.
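A small usage sketch of those semantics, where `offset` and `length` describe the destination array and the source is always read from its current position. `ArraySource` is a hypothetical in-memory source, not the PR's `SdkBuffer`.

```kotlin
/** Hypothetical source over an in-memory array, used to illustrate readFully's parameters. */
class ArraySource(private val data: ByteArray) {
    private var position = 0

    /**
     * Copy exactly [length] bytes from this source into [dest], starting at index [offset] of [dest].
     * [offset] and [length] describe the destination; the source is read from its current position.
     */
    fun readFully(dest: ByteArray, offset: Int = 0, length: Int = dest.size - offset) {
        require(offset >= 0 && length >= 0 && length <= dest.size - offset) { "invalid offset/length for dest" }
        require(length <= data.size - position) { "not enough bytes available: need $length" }
        data.copyInto(dest, destinationOffset = offset, startIndex = position, endIndex = position + length)
        position += length
    }
}
```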

rcoh · 2021-03-31T17:27:36Z

client-runtime/io/common/src/software/aws/clientrt/io/SdkBuffer.kt

+ * Write [length] bytes of [src] to this buffer starting at [offset]
+ * @throws IllegalArgumentException if there is insufficient space or the offset/length combination is invalid
+ */
+fun SdkBuffer.writeFully(src: ByteArray, offset: Int = 0, length: Int = src.size - offset) {


nit/naming: I personally prefer readToEnd and writeAll

I like writeAll but I'm not sure on readToEnd. readToEnd implies to me that you'll be reading to the end of the buffer but what readFully does is try and populate the destination buffer "fully" (i.e. exactly of size length bytes). Maybe readExact would be better?

Although if we go that route maybe writeExact for symmetry....

rcoh · 2021-03-31T17:28:07Z

client-runtime/io/common/src/software/aws/clientrt/io/SdkBuffer.kt

+/**
+ * Read the available (unread) contents as a UTF-8 string
+ */
+fun SdkBuffer.decodeToString() = bytes().decodeToString(0, readRemaining)


should this accept an encoding argument that defaults to UTF-8 but admits other options?

Probably but we don't have a great way to deal with different charsets in multiplatform. ByteArray.decodeToString() is UTF-8 only. The JVM side of things yes we could probably do that but I think we'll have to punt for now. We can always go back and add it in as a default argument.
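On the JVM, the `String(bytes, offset, length, charset)` constructor already accepts any `java.nio.charset.Charset`, so a JVM-only variant could add the parameter while defaulting to UTF-8 and keep existing call sites unchanged. `decodeRange` is a hypothetical name.

```kotlin
import java.nio.charset.Charset

/** Hypothetical JVM-only decode that accepts a charset, defaulting to UTF-8. */
fun ByteArray.decodeRange(offset: Int = 0, length: Int = size - offset, charset: Charset = Charsets.UTF_8): String =
    String(this, offset, length, charset)
```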

rcoh · 2021-03-31T17:30:07Z

client-runtime/io/common/src/software/aws/clientrt/io/SdkByteChannel.kt

+ * This is a buffered **single-reader single writer channel**.
+ *
+ * Read operations can be invoked concurrently with write operations, but multiple reads or multiple writes
+ * cannot be invoked concurrently with themselves. Exceptions are [close] and [flush] which can be invoked


this seems like maybe a not-so-easy to maintain invariant? To me, this seems much more like a ring buffer than a channel (which to me implies at least MPSC if not MPMC)

These invariants come from ktor's implementation. I'd be happy to just describe it as a SPSC channel though and leave out the rest of this description. I can't really think of a scenario at the moment where I need multiple writers or readers and if I did there are other options that could be built on top of this type to handle that.

If you dive into the implementation, your intuition isn't too far off, though. It's not really a channel per se in the MPSC/MPMC sense. It's a buffer that coordinates suspension. Think of it as a buffer with a condition variable. Read requests suspend only if the requested read operation can't be fulfilled and get notified when it can. Writers suspend if the buffer is full and get notified when more room is available.
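The "buffer with a condition variable" model can be sketched with plain threads: a bounded ring buffer guarded by a `ReentrantLock` with one `Condition` for "not empty" and one for "not full". This blocks threads, whereas the actual channel suspends coroutines, and it copies a byte at a time for clarity rather than speed. `SpscByteBuffer` is a hypothetical name.

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

/**
 * Hypothetical bounded single-producer/single-consumer byte buffer. Readers and writers
 * block on condition variables; the PR's channel suspends coroutines instead.
 */
class SpscByteBuffer(capacity: Int) {
    private val buf = ByteArray(capacity)
    private var head = 0   // index of the next byte to read
    private var size = 0   // number of bytes currently buffered
    private var closed = false
    private val lock = ReentrantLock()
    private val notEmpty = lock.newCondition() // signalled when data arrives or on close
    private val notFull = lock.newCondition()  // signalled when space frees up or on close

    /** Write all of [src], waiting whenever the buffer is full. */
    fun writeFully(src: ByteArray) {
        lock.withLock {
            for (b in src) {
                while (size == buf.size && !closed) notFull.await()
                check(!closed) { "channel closed for write" }
                buf[(head + size) % buf.size] = b
                size++
                notEmpty.signal()
            }
        }
    }

    /** Read up to dest.size bytes, waiting until at least one is available; -1 once closed and drained. */
    fun readAvailable(dest: ByteArray): Int {
        return lock.withLock {
            while (size == 0 && !closed) notEmpty.await()
            if (size == 0) return -1
            val n = minOf(dest.size, size)
            for (i in 0 until n) dest[i] = buf[(head + i) % buf.size]
            head = (head + n) % buf.size
            size -= n
            notFull.signal()
            n
        }
    }

    /** Wake up any waiter; bytes already buffered can still be read. */
    fun close() {
        lock.withLock {
            closed = true
            notEmpty.signalAll()
            notFull.signalAll()
        }
    }
}
```

Because there is exactly one reader and one writer, a single `signal()` per state change is enough to wake the only party that could be waiting.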

Actually I read this documentation comment just now again and the description makes sense to me.

It's just saying that you can have a single reader and a single writer invoking read/write operations at the same time, but that you can't have multiple readers or multiple writers invoking read/write operations at the same time (which is just a long-winded way of saying SPSC).

What would improve it for you?

Is this exposed to customers or is this just internal? I'm mostly just worried about a customer causing a race in their own code because of the thread safety implied by the name "channel"

Only the SdkByteReadChannel type is exposed to customers (through the ByteStream type).

Although I expect most customers to do one of two things (1) write it to file or (2) read it completely into memory (both of which we have provided convenience functions for).

kggilmer

mainly looking to see if we must api coroutines, otherwise some questions and nits.

kggilmer · 2021-04-01T20:37:46Z

client-runtime/client-rt-core/jvm/src/software/aws/clientrt/content/ByteStreamJVM.kt

+ */
+fun Path.asByteStream(): ByteStream {
+    val f = toFile()
+    require(f.isFile) { "cannot create a ByteStream from a directory: $this" }


suggestion

If we're guarding dir/file why not also guard if the file exists? File.exists() I think.

kggilmer · 2021-04-01T20:38:24Z

client-runtime/client-rt-core/jvm/src/software/aws/clientrt/content/ByteStreamJVM.kt

+ * Write the contents of this ByteStream to file and close it
+ */
+suspend fun ByteStream.toFile(file: File): Long {
+    require(file.isFile) { "cannot write contents of ByteStream to a directory: ${file.absolutePath}" }


same as above

in this case the file doesn't need to exist since we may be creating it
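One detail worth noting: `java.io.File.isFile` returns false for a path that does not exist yet, so a write-side `require(file.isFile)` would also reject a file that is about to be created. A sketch of the two guards under that assumption, with hypothetical helper names: the read side requires an existing regular file, and the write side rejects only directories.

```kotlin
import java.io.File

/** Hypothetical guard for reading: the path must exist and be a regular file. */
fun requireReadableFile(file: File) {
    require(file.exists()) { "file does not exist: ${file.absolutePath}" }
    require(file.isFile) { "cannot create a ByteStream from a directory: ${file.absolutePath}" }
}

/**
 * Hypothetical guard for writing: the target may not exist yet (it will be created),
 * so only directories are rejected.
 */
fun requireWritableTarget(file: File) {
    require(!file.isDirectory) { "cannot write contents of ByteStream to a directory: ${file.absolutePath}" }
}
```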

kggilmer · 2021-04-01T20:44:37Z

client-runtime/client-rt-core/jvm/src/software/aws/clientrt/content/ByteStreamJVM.kt

@@ -0,0 +1,57 @@
+/*


question

In my experience InputStream/OutputStream is the most common way of dealing w/ I/O in Java. They are more flexible than working directly with file types, as they can be composed, among other features. Is there a reason you don't provide mappings to those in this file?

I realize we don't use these types in our SDK due to concurrency constraints but seems like at this level those concerns are not valid.

I realize we don't use these types in our SDK due to concurrency constraints but seems like at this level those concerns are not valid.

I'm not sure I follow what you mean, can you clarify?

I'll double check, but I don't recall seeing any utilities for dealing with Input/OutputStream. The reason we can for files is that asynchronous utilities have already been built on top of them.

I mean, this logic is for customers to work with data in or out of the SDK. As such we should consider their general utility. To rephrase, in my experience in Javaland, Java libraries often use Input/Output streams for handling this data. It's likely that customers will want to use JavaLibraryX, which has a function doSomethingWith(InputStream: data). Unless there is a simple way of performing this mapping that I've missed, it may be of limited utility.

It's not something that needs to be addressed as part of this PR, and customer feedback is certainly warranted here.
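For illustration only, exposing a pull-style byte source as a `java.io.InputStream` mostly means implementing `read()` and `read(ByteArray, Int, Int)`. The sketch assumes a hypothetical blocking `ByteSource` interface; the PR's `SdkByteReadChannel` is suspending, so a real adapter would also have to decide how to bridge suspension into blocking calls, which is not shown here.

```kotlin
import java.io.InputStream

/**
 * Hypothetical blocking source: fills up to [length] bytes of [dest] at [offset],
 * blocking until at least one byte is available, and returns the count or -1 at end of stream.
 */
fun interface ByteSource {
    fun read(dest: ByteArray, offset: Int, length: Int): Int
}

/** Expose a [ByteSource] to Java APIs that expect an [InputStream]. */
class ByteSourceInputStream(private val source: ByteSource) : InputStream() {
    override fun read(): Int {
        val one = ByteArray(1)
        val n = source.read(one, 0, 1)
        // InputStream.read() must return an unsigned value in 0..255, or -1 at end of stream
        return if (n == -1) -1 else one[0].toInt() and 0xFF
    }

    override fun read(b: ByteArray, off: Int, len: Int): Int {
        if (off < 0 || len < 0 || len > b.size - off) throw IndexOutOfBoundsException()
        if (len == 0) return 0
        return source.read(b, off, len)
    }
}
```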

kggilmer · 2021-04-01T20:50:08Z

client-runtime/client-rt-core/jvm/src/software/aws/clientrt/content/LocalFileContent.kt

+/**
+ * ByteStream backed by a local [file]
+ */
+public class LocalFileContent(


nit/style

IMO the nature of type java.io.File is sufficient to express the 'locality' of the behaviour of this class, and I think 'Local' could be dropped. Maybe something like FileByteStreamReader or FileReader?

We can drop the Local but to be consistent the rest of the types are all XyzContent so it should probably be FileContent to be consistent

kggilmer · 2021-04-01T20:52:39Z

client-runtime/io/build.gradle.kts

+                implementation("io.ktor:ktor-io:$ktorVersion")
+
+                // Dispatchers.IO
+                api("org.jetbrains.kotlinx:kotlinx-coroutines-core:$coroutinesVersion")


question

As I understand it this provides access to the jetbrains coroutine library from within our SDK for customers to depend on from their programs. It would be safer perhaps to require customers to depend on that directly. Is there a reason why this cannot be implementation?

I would love to make this implementation however I think it has to be api. The comment above indicates why, the file readers expose Dispatchers.IO which is from kotlinx-coroutines-core.

You are correct that it technically allows others to access the jetbrains library from our SDK but that's not the real purpose of api:

The api configuration should be used to declare dependencies which are exported by the library API, whereas the implementation configuration should be used to declare dependencies which are internal to the component.

https://docs.gradle.org/current/userguide/java_library_plugin.html#sec:java_library_separation

So because we use a type in our API from the library it's an api dependency to us.

Thanks, appreciate the details

I was able to make this implementation for now by just hard coding the dispatcher to Dispatchers.IO. We can revisit as needed.
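Roughly, once no public signature mentions a kotlinx-coroutines type (the dispatcher is chosen internally), the dependency can move from `api` to `implementation`. A sketch of the resulting lines, assuming they sit inside the same source set's `dependencies { }` block as the snippet above and reuse its version properties:

```kotlin
dependencies {
    implementation("io.ktor:ktor-io:$ktorVersion")
    // Dispatchers.IO is hard coded internally instead of appearing in a public
    // signature, so the coroutines library no longer has to be exported to consumers
    implementation("org.jetbrains.kotlinx:kotlinx-coroutines-core:$coroutinesVersion")
}
```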

kggilmer · 2021-04-01T21:50:41Z

client-runtime/io/common/src/software/aws/clientrt/io/SdkByteReadChannel.kt

+    if (limit == 0L) return 0L
+
+    // delegate to ktor-io if possible which may have further optimizations based on impl
+    val cnt = if (this is IsKtorReadChannel && dst is IsKtorWriteChannel) {


nit/style

I prefer count to cnt
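The snippet above delegates to ktor when both ends are ktor-backed; the commit list also mentions a fallback copy. A minimal blocking sketch of that shape, with hypothetical `ReadEnd`, `WriteEnd` and `FastCopy` interfaces (the PR's channels are suspending): take the optimized path when available, otherwise loop through a fixed-size buffer until `limit` bytes are copied or the source ends.

```kotlin
/** Hypothetical minimal channel interfaces (blocking here; the PR's are suspending). */
interface ReadEnd {
    /** Bytes read into [dest], or -1 at end of stream. */
    fun readAvailable(dest: ByteArray, offset: Int, length: Int): Int
}

interface WriteEnd {
    fun writeFully(src: ByteArray, offset: Int, length: Int)
}

/** Marker for implementations that can copy more efficiently on their own. */
interface FastCopy {
    fun fastCopyTo(dst: WriteEnd, limit: Long): Long
}

/** Copy up to [limit] bytes; use the optimized path when available, else a buffered loop. */
fun ReadEnd.copyTo(dst: WriteEnd, limit: Long = Long.MAX_VALUE, bufferSize: Int = 4096): Long {
    require(limit >= 0) { "limit must be non-negative" }
    if (limit == 0L) return 0L
    if (this is FastCopy) return fastCopyTo(dst, limit)

    val buffer = ByteArray(bufferSize)
    var count = 0L
    while (count < limit) {
        val toRead = minOf(buffer.size.toLong(), limit - count).toInt()
        val n = readAvailable(buffer, 0, toRead)
        if (n == -1) break
        dst.writeFully(buffer, 0, n)
        count += n
    }
    return count
}
```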

kggilmer · 2021-04-01T21:57:32Z

client-runtime/io/common/test/software/aws/clientrt/io/SdkByteChannelOpsTest.kt

+import software.aws.clientrt.testing.runSuspendTest
+import kotlin.test.*
+
+class SdkByteChannelOpsTest {


question

what's ops?

kggilmer · 2021-04-01T22:00:14Z

client-runtime/io/jvm/src/software/aws/clientrt/io/SdkBufferJVM.kt

+
+package software.aws.clientrt.io
+
+internal fun SdkBuffer.hasArray() = memory.buffer.hasArray() && !memory.buffer.isReadOnly
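For reference on this JVM check: `ByteBuffer.hasArray()` reports whether the buffer is backed by an accessible array, and `array()` throws for read-only buffers, so requiring both `hasArray()` and `!isReadOnly` before touching the backing array is the safe combination (heap buffers from `ByteBuffer.allocate` qualify; read-only views do not). A sketch of the same condition on a plain `ByteBuffer`, with a hypothetical helper name:

```kotlin
import java.nio.ByteBuffer

/** Mirrors the condition above: true only if the backing array can be accessed and written. */
fun ByteBuffer.hasWritableArray(): Boolean = hasArray() && !isReadOnly
```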


kggilmer · 2021-04-01T22:01:21Z

...ain/kotlin/software/amazon/smithy/kotlin/codegen/integration/HttpBindingProtocolGenerator.kt

@@ -609,6 +609,8 @@ abstract class HttpBindingProtocolGenerator : ProtocolGenerator {
                        .map { it.member }

                    if (documentMembers.isNotEmpty()) {
+                        // FIXME - we should not be slurping the entire contents into memory, instead our deserializers
+                        // should work off of an SdkByteReadChannel


kggilmer · 2021-04-01T22:04:59Z

client-runtime/io/common/test/software/aws/clientrt/io/middleware/MiddlewareTest.kt

@@ -42,4 +42,30 @@ class MiddlewareTest {
        }
        assertEquals("Foo", handler.call("foo"))
    }
+
+    @Test
+    fun testMapRequest() = runSuspendTest {


question

It's not clear to me what these tests are exercising. can you briefly explain?

They are covering the utility middleware components MapRequest/MapResponse. I noticed our code coverage missed these and they were easy to add so I added them.
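As a rough illustration of what such tests exercise: a request-mapping middleware transforms the input before delegating to the wrapped handler, and a response-mapping one transforms the output after. `Handler`, `mapRequest` and `mapResponse` here are hypothetical stand-ins, not the runtime's actual middleware API.

```kotlin
/** Hypothetical handler abstraction. */
fun interface Handler<Req, Resp> {
    fun call(request: Req): Resp
}

/** Wrap [inner] so that each request is transformed by [transform] before being handled. */
fun <Req0, Req, Resp> mapRequest(inner: Handler<Req, Resp>, transform: (Req0) -> Req): Handler<Req0, Resp> =
    Handler { request -> inner.call(transform(request)) }

/** Wrap [inner] so that each response is transformed by [transform] before being returned. */
fun <Req, Resp0, Resp> mapResponse(inner: Handler<Req, Resp0>, transform: (Resp0) -> Resp): Handler<Req, Resp> =
    Handler { request -> transform(inner.call(request)) }
```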

aajtodd · 2021-04-01T23:27:55Z

mainly looking to see if we must api coroutines

Yeah, I totally get this concern. I have the same concern / intuition, but I'm not sure we can do better unless we hard code the dispatcher to Dispatchers.IO internally (exposed here). I believe that was the only type exposed, but I'd have to double check again at this point.

kggilmer

minor nits here and there, nothing to block for.


kggilmer · 2021-04-03T00:30:18Z

client-runtime/io/common/src/software/aws/clientrt/io/KtorAdapters.kt

+ * Wrap ktor's ByteReadChannel as our own. This implements the common API of [SdkByteReadChannel]. Only
+ * platform specific differences in interfaces need be implemented in inheritors.
+ */
+internal abstract class KtorReadChannelAdapterBase(


yeah it was just a nit. If the long form provides more safety, that is better.

aajtodd added 16 commits March 24, 2021 09:47

feat: add missing Closeable interface

37cadc4

refactor: rename Source

6264273

fix nullability

6217b1f

feat(io): wrap ktor byte channels

6dd0f09

docs

8b18638

refactor: rename sdk channel types

bd7f020

add JVM extensions for reading and writing bytestream as a file

8a680c1

bootstrap some sanity tests

9ef7b21

add a fallback copy to method

ff0e32b

docs

ebf574a

define buffer size as constant

8fac87c

port over a reasonable buffer abstraction

7bdbc53

implicitly grow buffer as needed

9b477e7

minor cleanup

94f2e13

cleanup potential for overflow

dbc6b51

notes for posterity

cfc2756

aajtodd requested review from kggilmer, kiiadi, rcoh and kneekey23 March 31, 2021 15:23

kiiadi reviewed Mar 31, 2021


rcoh approved these changes Mar 31, 2021


rename toFile() -> writeToFile()

ead19c7

kggilmer reviewed Apr 1, 2021


kggilmer approved these changes Apr 3, 2021


aajtodd added 4 commits April 5, 2021 10:08

remove api dependency requirement by hard coding to Dispatchers.IO

79c308b

guard file exists

a515957

refactor: rename LocalFileContent -> FileContent

920840f

refactor: rename ktor marker interface

8784db7

aajtodd added 2 commits April 5, 2021 10:23

fix: bytes() should only return valid bytes

244d075

add state check

d42c183

aajtodd merged commit b083bd9 into main Apr 6, 2021

aajtodd deleted the feat-io branch April 6, 2021 14:53

