-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Struct Data Type(Part-1): New struct type, DDL statements and Describe #1114
Conversation
…nsupported-Operations-From-The-Grammar
get rid of String encoded values. registered -> boolean replicaInfo -> list<int> instead of string encoding that info partitionCount -> gone since it is encoded in replicaInfo
…rrides (confluentinc#924) This includes one minor fix to the MetricsCollector to make it correctly robust to reuse across tests that are both aware of its initialization/cleanup and those that are not.
* reduce checkstyle suppressions * remove commented out suppression
clusterId metadata was introduced in Kafka 0.10.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a first pass through. Looks good. Left some comments inline.
return "MAP[" + getSchemaFieldType(field.schema().keySchema().fields().get(0)) + "," | ||
+ getSchemaFieldType(field.schema().valueSchema().fields().get(0)) + "]"; | ||
} else if (field.schema().type() == Schema.Type.STRUCT) { | ||
StringBuilder stringBuilder = new StringBuilder("STRUCT <"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stringBuilder.append(
field.schema().fields().stream()
.map(getSchemaFieldType)
.collect(Collectors.joining(", ")));
@@ -263,6 +263,10 @@ primaryExpression | |||
| POSITION '(' valueExpression IN valueExpression ')' #position | |||
| '(' expression (',' expression)+ ')' #rowConstructor | |||
| ROW '(' expression (',' expression)* ')' #rowConstructor | |||
| '(' expression (',' expression)+ ')' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same as rowConstructor, which seems bad. Isn't this ambiguous? Kind of surprising antlr doesn't raise an error. I guess its working fine since we're not using row constructors anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need ROW
. Removed it!
@@ -263,6 +263,10 @@ primaryExpression | |||
| POSITION '(' valueExpression IN valueExpression ')' #position | |||
| '(' expression (',' expression)+ ')' #rowConstructor | |||
| ROW '(' expression (',' expression)* ')' #rowConstructor | |||
| '(' expression (',' expression)+ ')' | |||
#structConstructor | |||
| STRUCT '(' expression (',' expression)* ')' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is here for when we want to create structs on the fly? Is this what the final syntax is going to look like? Maybe leave it out until that's decided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can decide about this later. Removed it.
|
||
public class Map extends Type { | ||
|
||
final Type keyType = new PrimitiveType(KsqlType.STRING); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the key always STRING?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Key for map is always string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah - I didn't know avro keys were restricted to strings too. Makes sense. We should raise an error if a user has a ddl statement with MAP<not-STRING,...> in it. Or change the grammar to just take a value parameter.
@@ -362,4 +372,12 @@ protected R visitInsertInto(InsertInto node, C context) { | |||
process(node.getQuery(), context); | |||
return null; | |||
} | |||
|
|||
protected R visitCreateStreamAsSelect(CreateStreamAsSelect node, C context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious - is this used anywhere? the analyzers seem to just operate on Query nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a visitor that can be used by creating subclasses of it. So even though we may not use it now we may have future uses for it.
+ "\t arraycol ARRAY[DOUBLE], \n" | ||
+ "\t mapcol MAP[VARCHAR(STRING),DOUBLE], \n" | ||
+ "\t address STRUCT < \n" | ||
+ "\t NUMBER BIGINT, \n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice to add an indent here. I'm fine with punting this to another PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for now I'll leave it as is. We may want to provide unified ux for both web based UI and CLI
|
||
@Test | ||
public void shouldGetCorrectKsqlType() throws Exception { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should add a case for INT64
assertThat(schema8.field("COL1").schema().type(), equalTo(Schema.Type.STRING)); | ||
assertThat(schema8.field("COL4").schema().type(), equalTo(Schema.Type.ARRAY)); | ||
assertThat(schema8.field("COL5").schema().type(), equalTo(Schema.Type.MAP)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a nested struct case here.
Statement statement = KSQL_PARSER.buildAst(queryStr, metaStore).get(0); | ||
Assert.assertTrue("testCreateStream failed.", statement instanceof CreateStream); | ||
CreateStream createStream = (CreateStream)statement; | ||
Assert.assertTrue("testCreateStream failed.", createStream.getName().toString().equalsIgnoreCase("ORDERS")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assertThat(createStream.getName().toString().toUpper(), equals("ORDERS"))
ditto below
Assert.assertTrue("testCreateStream failed.", statement instanceof CreateStream); | ||
CreateStream createStream = (CreateStream)statement; | ||
Assert.assertTrue("testCreateStream failed.", createStream.getName().toString().equalsIgnoreCase("ORDERS")); | ||
Assert.assertTrue("testCreateStream failed.", createStream.getElements().size() == 7); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assertThat(.., equals(..))
ditto below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. One small comment inline.
assertThat(((Struct) type7).getItems().get(5).getRight().getKsqlType(), equalTo(Type.KsqlType | ||
.STRUCT)); | ||
|
||
Type type8 = TypeUtil.getKsqlType(Schema.INT64_SCHEMA); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should move this up alongside the other primitive cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hjafarpour, left some comments
@@ -1614,4 +1588,23 @@ public InvalidColumnReferenceException(String message, Throwable cause) { | |||
super(message, cause); | |||
} | |||
} | |||
|
|||
private static PrimitiveType getPrimitiveType(String typeName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe put this as a static method on PrimitiveType
} | ||
|
||
@Override | ||
public int hashCode() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this intentional? i.e., hashcode always 0 and equals always false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, updated the implementations.
|
||
public class Map extends Type { | ||
|
||
final Type keyType = new PrimitiveType(KsqlType.STRING); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private
?
return visitor.visitMap(this, context); | ||
} | ||
|
||
public Type getKeyType() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need keyType
if it is always String?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was more for clarifying it. I removed it.
} | ||
|
||
@Override | ||
public int hashCode() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hashCode and equals again
} | ||
|
||
public void setParent(Node parent) { | ||
if (parent == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.parent = Optional.ofNullable(parent);
public abstract class Type extends Expression { | ||
|
||
|
||
public enum KsqlType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we have an enum for KsqlType
and a class hierarchy extending Type
? IMO it would be better if they enum
was for PrimitiveType
s only and was an internal thing. At the moment we have too many things like Schema, Type, etc that have corresponding XXXXUtil
classes rather than having objects and using polymorphism. We need to address this
import static org.hamcrest.CoreMatchers.instanceOf; | ||
import static org.hamcrest.MatcherAssert.assertThat; | ||
|
||
public class SetParentVisitorTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this test covers everything in SetParentVisitor
public class TypeUtilTest { | ||
|
||
@Test | ||
public void shouldGetCorrectKsqlType() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiple tests. Much better to have this broken down into individual tests than one big test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, doesn't throw exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broke it down to multiple tests.
} | ||
|
||
@Test | ||
public void shouldGetCorrectSchema() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto above
Something that @bluemonk3y mentioned and makes a lot of sense... It would be better for the future if we replaces |
@hjafarpour it would make sense to do this before any further work struct based PRs are done. So if it isn't in this PR you probably want to do it in the next one |
@dguy yes, I can do it in a PR after switching to Connect data type. This PR is only for DDL statement and ksql type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM
The first part of struct (nested) data type support. This PR adds a new type, STRUCT, and adds support in DDL statements along with DESCRIBE statement.
Here is an example of how you can define a stream/table with struct field:
CREATE STREAM orders (ordertime bigint, orderid bigint, itemid varchar, orderunits double, arraycol array<double>, mapcol map<varchar, double>, address STRUCT < number bigint, street varchar, city varchar, state varchar, zipcode bigint>) WITH (kafka_topic='orders_with_struct_topic_json', key='orderid', value_format='json');
This is the first step to resolve several filed issues including #638 and #824.