UpdateBy gRPC #2635
👎 as it stands: this introduces a lot (all? not sure) of QST via gRPC in this one new feature, which means a lot of code that isn't obvious from the pull request is now reachable from any client. Easiest example: an Expression.type.raw is an easy way to smuggle arbitrary Java in for execution, but no safeguards are introduced to prevent this (the updateby group_by field may hold those raw expression strings, and engine-table's SelectColumn.ExpressionAdapter will invoke SelectColumnFactory.getExpression(..) on them without a second look).
At the very least, if this feature can only be implemented using QST's Expression types, I would like to see a) the QST protos moved to a new .proto file, and b) a safe-by-default config option preventing the use of the UpdateByGrpcImpl type.
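One way to read the "safe-by-default" request is a server-side flag that rejects any raw expression string unless an operator explicitly opts in. The sketch below assumes the server can collect the raw strings (e.g. from group_by) before they reach SelectColumnFactory; the class RawExpressionGuard and the property name are hypothetical, not existing Deephaven APIs.

```java
import java.util.List;

/**
 * Hypothetical server-side guard (not an existing Deephaven class): rejects raw expression
 * strings unless an operator has explicitly opted in.
 */
final class RawExpressionGuard {
    // Hypothetical property name; an unset property means raw expressions are disabled.
    static final String PROPERTY = "deephaven.updateBy.allowRawExpressions";

    private final boolean allowRaw;

    RawExpressionGuard(boolean allowRaw) {
        this.allowRaw = allowRaw;
    }

    /** Boolean.getBoolean is true only if the system property exists and equals "true". */
    static RawExpressionGuard fromSystemProperties() {
        return new RawExpressionGuard(Boolean.getBoolean(PROPERTY));
    }

    /** Fails the request before any raw string reaches the engine's expression parser. */
    void check(List<String> rawExpressions) {
        if (!allowRaw && !rawExpressions.isEmpty()) {
            throw new IllegalArgumentException(
                    "Raw expressions are disabled (set " + PROPERTY + "=true to allow): " + rawExpressions);
        }
    }
}
```

The sketch only shows the opt-in shape; whether such a check belongs in the gRPC layer or in the engine is a separate question.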
proto/proto-backplane-grpc/src/main/proto/deephaven/proto/table.proto
message UpdateByRequest {
message Options {
IIRC, not all languages qualify these with namespacing, so it would probably be a good idea to give them more descriptive names to avoid collisions.
That's unfortunate :/ but good to know.
Expression expression = 2;
}
message Pair {
Convention elsewhere has been to just encode these as foo=bar or foo strings (ComboAggregateRequest.Aggregate.match_pairs, for example).
I see the point that we can be more descriptive by modeling it this way, but for a=a we have to write the string twice anyway, and a plain string for each of the two fields is not accurate either, since each can only be a Java identifier.
I'll add some documentation, but the idea is that input_column_name can be empty for the a=a case:
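// Adapts the proto Pair to the table API Pair.
// An empty input_column_name means the input name equals the output name (the a=a case).
public static Pair adapt(io.deephaven.proto.backplane.grpc.Pair pair) {
    final ColumnName output = ColumnName.of(pair.getOutputColumnName());
    return pair.getInputColumnName().isEmpty()
            ? output
            : Pair.of(ColumnName.of(pair.getInputColumnName()), output);
}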
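For comparison, the existing string convention can be parsed into the same shape as the structured Pair, including the shorthand where an empty input name means "same as output". A minimal sketch, assuming the left side of "=" is the output column (as in the "NewCol=OldCol" style) and that column names must be Java identifiers; ColumnPair is a hypothetical stand-in, not the proto or table API type.

```java
/** Hypothetical output/input column pair; not the proto or table API Pair. */
record ColumnPair(String output, String input) {

    /** Parses "out=in", or "name" as shorthand for "name=name" (assumes output is on the left). */
    static ColumnPair parse(String spec) {
        final int eq = spec.indexOf('=');
        final String output = (eq < 0 ? spec : spec.substring(0, eq)).trim();
        final String input = (eq < 0 ? spec : spec.substring(eq + 1)).trim();
        requireIdentifier(output, spec);
        requireIdentifier(input, spec);
        return new ColumnPair(output, input);
    }

    /** Mirrors the structured encoding: an empty input name means "same as output". */
    static ColumnPair fromFields(String outputColumnName, String inputColumnName) {
        return new ColumnPair(outputColumnName,
                inputColumnName.isEmpty() ? outputColumnName : inputColumnName);
    }

    // Checks identifier characters only; reserved words such as "class" are not rejected.
    private static void requireIdentifier(String name, String spec) {
        boolean ok = !name.isEmpty() && Character.isJavaIdentifierStart(name.charAt(0));
        for (int i = 1; ok && i < name.length(); i++) {
            ok = Character.isJavaIdentifierPart(name.charAt(i));
        }
        if (!ok) {
            throw new IllegalArgumentException("Not a valid column name in '" + spec + "': '" + name + "'");
        }
    }
}
```

Both encodings carry the same information; the trade-off is whether identifier validation happens once on the server (strings) or is implied by the message structure (separate fields).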
optional int32 chunk_capacity = 2;
optional double maxStaticSparseMemoryOverhead = 3;
Use snake_case for message fields (maximumLoadFactor and targetLoadFactor too).
Good catch.
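For reference, the renamed fields would follow the protobuf style of lower_snake_case field names. A naive converter (hypothetical helper, fine for these simple camelCase names but not for runs of capitals such as acronyms) shows the expected spellings:

```java
/** Hypothetical helper: converts simple lowerCamelCase names to lower_snake_case. */
final class ProtoFieldNames {
    static String toSnakeCase(String camel) {
        final StringBuilder out = new StringBuilder(camel.length() + 8);
        for (int i = 0; i < camel.length(); i++) {
            final char c = camel.charAt(i);
            if (Character.isUpperCase(c)) {
                // Each capital starts a new word; runs of capitals are not treated specially.
                if (i > 0) {
                    out.append('_');
                }
                out.append(Character.toLowerCase(c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

The generated Java accessors stay camelCase either way (e.g. getMaxStaticSparseMemoryOverhead()), so the rename affects only the .proto text and languages that keep field names verbatim.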
Spec spec = 1;
repeated Pair pair = 2;
}
oneof type {
Why is this a oneof?
It's explicitly modeled this way at the table API level because there is room for additional types in the future:
public interface UpdateByClause {
...
interface Visitor<T> {
T visit(ColumnUpdateClause clause);
}
}
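The oneof plays the same role on the wire that the visitor plays in Java: today there is one variant, and adding a variant means adding one visit overload and one oneof field. A simplified sketch of that pattern; Clause, ColumnClause, Describe, and the walk method are hypothetical names, not the actual table API.

```java
import java.util.List;

/** Simplified, hypothetical mirror of UpdateByClause: one variant today, room for more. */
interface Clause {
    <T> T walk(Visitor<T> visitor);

    interface Visitor<T> {
        T visit(ColumnClause clause);
        // A future variant would add another visit(...) overload here,
        // matched by another field inside the proto oneof.
    }
}

/** Stand-in for ColumnUpdateClause: an operation spec applied to some column pairs. */
record ColumnClause(String spec, List<String> pairs) implements Clause {
    @Override
    public <T> T walk(Clause.Visitor<T> visitor) {
        return visitor.visit(this);
    }
}

/** Example visitor that renders a clause as a short description. */
final class Describe implements Clause.Visitor<String> {
    @Override
    public String visit(ColumnClause clause) {
        return clause.spec() + " on " + String.join(",", clause.pairs());
    }
}
```

A server-side adapter would switch on the oneof case and build the corresponding clause, so an unknown case from a newer client fails in one obvious place.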
proto/proto-backplane-grpc/src/main/proto/deephaven/proto/table.proto
enum BadDataBehavior {
// Reset the state for the bucket to {@code null} when invalid data is encountered.
RESET = 0;
Since 0 is the "not set" default, consider making it fail-safe with THROW, so as not to surprise users who fail to set this?
In context, at least right now, this is always prefaced with optional. The idea is that if the client does not provide a value, it inherits the appropriate server default (the defaults vary depending on the specific field):
message UpdateByEmaOptions {
optional BadDataBehavior on_null_value = 1;
optional BadDataBehavior on_nan_value = 2;
...
}
I'll add documentation at this layer to explain that.
That said, I'm not against making THROW = 0, but there may be reasons to keep the order the same as the Java enum it maps to? (We can update the Java enum as well...)
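The two positions can be reconciled: with proto3 optional, presence is tracked (generated Java exposes a has...() accessor), so an unset field can fall back to a per-field server default, while the zero value only matters for fields declared without optional. A minimal sketch; BadDataDefaults and the THROW-first ordering are hypothetical, and zeroValue() assumes the Java enum's declaration order matches the proto numbering.

```java
/** Hypothetical sketch of presence-aware defaults for an optional proto3 enum field. */
final class BadDataDefaults {

    /** Hypothetical ordering with THROW first, so a zero (unset) value fails safe. */
    enum BadDataBehavior {
        THROW, // would be wire value 0 under the reviewer's suggestion
        RESET
        // remaining values omitted in this sketch
    }

    /**
     * hasValue stands for the generated has...() accessor; wireValue is what the getter
     * returns, which is the zero value when the field is unset.
     */
    static BadDataBehavior resolve(boolean hasValue, BadDataBehavior wireValue, BadDataBehavior serverDefault) {
        return hasValue ? wireValue : serverDefault;
    }

    /** What an unset field without optional would decode to, assuming order matches numbering. */
    static BadDataBehavior zeroValue() {
        return BadDataBehavior.values()[0];
    }
}
```

With optional everywhere the zero value is never observed; putting THROW at 0 is a belt-and-braces choice for any field that later loses (or never had) optional.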
oneof type {
string column_name = 1;
int64 long_value = 2;
string raw = 3;
Why only raw and long_value? If various other numbers/etc. can be raw, why can't long?
Also, it seems odd to refer to the "type" as being "column_name" or "raw". Perhaps expression could have a "value" which could be a "column reference", "long literal", "raw X value", etc.?
Just found the Expression type hierarchy in existing code, and I'm only more confused - why is RawString not a Value, but ColumnName is? Shouldn't ColumnName be a "Reference" or something rather than a Value?
Column name, long value, and raw are the only parts plumbed through at the table API layer right now, but I agree we should flesh these out more (#830). I can close the loop a bit here in that regard.
As it is right now:
Expression = Value | RawString
Value = ColumnName | long (literal)
Maybe we should add more hierarchy here, with "Reference" and "Literal" instead of "Value".
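A sketch of what the "Reference" / "Literal" split could look like, using Java 17 sealed interfaces and records. All type names here are hypothetical, not the existing table API; the point is that RawString becomes a sibling variant that is easy to single out, which also ties into the raw-expression concern above.

```java
/** Hypothetical restructuring of Expression = Value | RawString. */
sealed interface Expr permits Reference, Literal, RawString {}

/** References to existing data, e.g. a column. */
sealed interface Reference extends Expr permits ColumnRef {}

record ColumnRef(String name) implements Reference {}

/** Constants; only long is plumbed through today, more types could follow (#830). */
sealed interface Literal extends Expr permits LongLiteral {}

record LongLiteral(long value) implements Literal {}

/** Engine-interpreted text: the variant that can carry arbitrary code. */
record RawString(String text) implements Expr {}

final class Exprs {
    /** In this sketch, only RawString can carry arbitrary code. */
    static boolean isSafe(Expr e) {
        return !(e instanceof RawString);
    }

    static String render(Expr e) {
        if (e instanceof ColumnRef c) {
            return c.name();
        }
        if (e instanceof LongLiteral l) {
            return l.value() + "L";
        }
        if (e instanceof RawString r) {
            return r.text();
        }
        throw new IllegalStateException("unreachable: all Expr subtypes are handled");
    }
}
```

In proto terms this would map to nested oneofs (a reference oneof and a literal oneof) rather than one flat list of cases.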
I've removed this proto.
server/src/main/java/io/deephaven/server/table/ops/UpdateByGrpcImpl.java
At first I was a bit confused by your comment about an "easy way to smuggle arbitrary Java for execution": is there anything preventing the equivalent via a view or select? Looking at the code now, though, I see; I'll adapt these checks in. I'm hoping that we can (eventually) migrate from
Looking deeper, I'm having trouble finding out whether we disable arbitrary code execution via the gRPC table operations API today. AFAICT, we only do "this looks like a valid operation" validation. I would also argue that this should be the responsibility of the engine, not the gRPC layer. I'm trying to get clarification from @rcaudy.
Minor comments. It's a lot of code for not a lot of functionality.
java-client/flight-dagger/src/test/java/io/deephaven/client/DeephavenFlightSessionTest.java
proto/proto-backplane-grpc/src/main/proto/deephaven/proto/table.proto
private static io.deephaven.proto.backplane.grpc.Pair adapt(ColumnName pair) {
    // Only the output name is set; an empty input_column_name means "same as output".
    return io.deephaven.proto.backplane.grpc.Pair.newBuilder()
            .setOutputColumnName(pair.name())
            .build();
}
It surprised me that you were opting to not make input column name explicit.
server/src/main/java/io/deephaven/server/table/ops/GrpcQstTableOperation.java
e6c87d6 to 8c4454a
server/src/main/java/io/deephaven/server/table/ops/GrpcQstTableOperation.java
I don't hate this. I do sort of feel like the layering is getting excessive. I think evidence of this can be inferred from how long it took you (@devinrsmith) and me to trace through some of the table creation logic for parent resolution.
Marker, I've reviewed up to here.
cb8846b to 7fbfe98
I've force-pushed, removing any changes to TableSpec and the gRPC implementation via TableSpec. I haven't yet resolved any concerns regarding the .proto structure.
// public TableSpec apply(TableSpec spec, String[] formulas) {
// return spec.selectDistinct(formulas);
// }
// }
Probably not meant to be added?
I'll remove it, and looks like I need to also fix some merge conflicts.
Fixes #2607