refactor: Barrage stream reading into Chunks #5692
…eading schema at beginning of stream
this.conversion = conversion;
}

public <T> ChunkReader transform(Function<Byte, T> transform) {
We may want to replace these Function<BoxedPrimitive, T> interfaces with some replicated version to avoid unnecessary boxing of values that can never be null.
In the interest of moving this patch along, I'm going to punt on this; the updated version is no more wrong than the previous one was.
Although, we may want the transformers to "transform" null values to give better control to the future feature of custom formatters. If this were the case, then the boxing would be necessary.
I believe that these already have the DH null values as primitives, so a null input is impossible at this time. T could certainly be null though for an output, depending on what kind of chunk is going to be written to (not controlled by this code).
I simply notice that we do not call the transformer if the value is the null value. As in, a custom transformer can't react to any null values. I see your point that we could use the deephaven null value and that using primitives will make it very clear that null is being represented in a non-boxed way. You're probably right that we should avoid boxing here.
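To make the trade-off in this thread concrete, here is a minimal sketch of what a "replicated" primitive interface could look like. Everything in it is illustrative: `ByteFunction`, `ByteTransformSketch`, and both helper methods are hypothetical names, not part of this patch, and the sketch assumes the byte null sentinel is `Byte.MIN_VALUE`. The boxed variant skips the transformer for null values, as described above; the primitive variant hands the sentinel to the transformer so that a custom formatter could react to it without any boxing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical primitive-specialized stand-in for Function<Byte, T>. */
@FunctionalInterface
interface ByteFunction<T> {
    T apply(byte value);
}

final class ByteTransformSketch {
    // Assumption for this sketch: the byte null sentinel is Byte.MIN_VALUE.
    static final byte NULL_BYTE = Byte.MIN_VALUE;

    /** Boxed variant: each non-null element is boxed to Byte; nulls bypass the transformer. */
    static <T> List<T> transformBoxed(byte[] values, Function<Byte, T> fn) {
        List<T> out = new ArrayList<>(values.length);
        for (byte v : values) {
            out.add(v == NULL_BYTE ? null : fn.apply(v));
        }
        return out;
    }

    /** Primitive variant: no boxing, and the transformer sees the sentinel itself. */
    static <T> List<T> transformPrimitive(byte[] values, ByteFunction<T> fn) {
        List<T> out = new ArrayList<>(values.length);
        for (byte v : values) {
            out.add(fn.apply(v));
        }
        return out;
    }
}
```

With the primitive variant, a caller could write `v -> v == ByteTransformSketch.NULL_BYTE ? "(null)" : Integer.toString(v)` to format nulls explicitly, which the boxed variant as sketched here cannot do.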
ByteBuffer original = message.getByteBuffer();
ByteBuffer copy = ByteBuffer.allocate(original.remaining()).put(original).rewind();
Schema schema = new Schema();
Message.getRootAsMessage(copy).header(schema);
We need a detailed comment as to why we need to copy this. I suspect the reason is that the converted arrow schema references the new buffer? We may want to push the copying into BarrageUtil if that's the case because it's super common to assume that the byte buffer is temporarily immutable.
It isn't quite about immutability, but the fact that the ByteBuffer is owned by Python, and if Python frees the underlying buffer we'll be reading garbage when trying to handle a later RecordBatch.
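The ownership concern can be illustrated with a small standalone sketch. It is not the code in this patch: `DefensiveCopySketch` and `copyRemaining` are hypothetical names, and an ordinary heap array stands in for memory owned by another runtime. Unlike the one-liner in the diff, this helper reads through `duplicate()` so the caller's position is left untouched; that is a choice made for the example, not something the patch does.

```java
import java.nio.ByteBuffer;

final class DefensiveCopySketch {
    /**
     * Copies the remaining bytes of {@code original} into a freshly allocated
     * heap buffer that this code owns. The caller's position and limit are not
     * modified because the read goes through a duplicate view.
     */
    static ByteBuffer copyRemaining(ByteBuffer original) {
        ByteBuffer source = original.duplicate(); // shares content, independent position/limit
        ByteBuffer copy = ByteBuffer.allocate(source.remaining());
        copy.put(source);
        copy.flip(); // position = 0, limit = number of bytes copied
        return copy;
    }
}
```

Once the copy exists, anything parsed from it (such as a schema header) no longer depends on the lifetime of the original buffer, which is the property the comment above is asking for.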
ByteBuffer copy = ByteBuffer.allocate(original.remaining()).put(original).rewind();
Schema schema = new Schema();
Message.getRootAsMessage(copy).header(schema);
header.header(schema);
If possible, I would like it to be more obvious why we need to copy here (e.g. a comment about which references get leaked).
I'm not 100% sure that we need to copy in this case - technically it appears not, since line 87 does a ByteBuffer.wrap(). Instead, this is an attempt to be defensive in case a future impl is reading from a slice/etc. of the ByteBuffer that came in over the wire.
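As a rough illustration of the "slice/etc." scenario, the sketch below shows that both `ByteBuffer.wrap()` and `slice()` produce views over the same bytes, so a retained view observes later writes while a copy does not. The class and method names are hypothetical, and overwriting the array is only a stand-in for the wire buffer being reused.

```java
import java.nio.ByteBuffer;

final class SharedViewSketch {
    /**
     * Returns the first byte of the retained buffer after the underlying
     * "wire" array has been overwritten. With copyFirst == false the retained
     * buffer is a slice (a view); with copyFirst == true it is a private copy.
     */
    static byte firstByteAfterReuse(boolean copyFirst) {
        byte[] wire = {10, 20, 30, 40};
        ByteBuffer wrapped = ByteBuffer.wrap(wire); // view over the array, no copy
        wrapped.position(1);
        ByteBuffer view = wrapped.slice(); // covers 20, 30, 40 and still shares the array

        ByteBuffer kept = view;
        if (copyFirst) {
            kept = ByteBuffer.allocate(view.remaining());
            kept.put(view.duplicate());
            kept.flip();
        }

        wire[1] = -1; // simulate the incoming buffer being reused for a later message
        return kept.get(0);
    }
}
```

Here `firstByteAfterReuse(false)` reads the overwritten value, while `firstByteAfterReuse(true)` still reads the byte that was present when the copy was taken.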
Like #5552, this applies some after-the-fact review of the design for reading Barrage/Flight streams, in anticipation of sharing this code with JavaScript clients.
Partial #188