Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Struct Data Type(Part-1): New struct type, DDL statements and Describe #1114

Merged
merged 63 commits into from
May 15, 2018

Conversation

hjafarpour
Copy link
Contributor

The first part of struct (nested) data type support. This PR adds a new type, STRUCT, and adds support in DDL statements along with DESCRIBE statement.
Here is an example of how you can define a stream/table with struct field:

CREATE STREAM orders (ordertime bigint, orderid bigint, itemid varchar, orderunits double, arraycol array<double>, mapcol map<varchar, double>, address STRUCT < number bigint, street varchar, city varchar, state varchar, zipcode bigint>) WITH (kafka_topic='orders_with_struct_topic_json', key='orderid', value_format='json');

This is the first step to resolve several filed issues including #638 and #824.

hjafarpour and others added 30 commits January 29, 2018 12:45
get rid of String encoded values.
registered -> boolean
replicaInfo -> list<int> instead of string encoding that info
partitionCount -> gone since it is encoded in replicaInfo
…rrides (confluentinc#924)

This includes one minor fix to the MetricsCollector to make it correctly
robust to reuse across tests that are both aware of its initialization/cleanup
and those that are not.
* reduce checkstyle suppressions

* remove commented out suppression
clusterId metadata was introduced in Kafka 0.10.1
@hjafarpour hjafarpour self-assigned this Apr 3, 2018
@hjafarpour hjafarpour requested a review from a team April 3, 2018 22:00
Copy link
Contributor

@rodesai rodesai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a first pass through. Looks good. Left some comments inline.

return "MAP[" + getSchemaFieldType(field.schema().keySchema().fields().get(0)) + ","
+ getSchemaFieldType(field.schema().valueSchema().fields().get(0)) + "]";
} else if (field.schema().type() == Schema.Type.STRUCT) {
StringBuilder stringBuilder = new StringBuilder("STRUCT <");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stringBuilder.append(
field.schema().fields().stream()
.map(getSchemaFieldType)
.collect(Collectors.joining(", ")));

@@ -263,6 +263,10 @@ primaryExpression
| POSITION '(' valueExpression IN valueExpression ')' #position
| '(' expression (',' expression)+ ')' #rowConstructor
| ROW '(' expression (',' expression)* ')' #rowConstructor
| '(' expression (',' expression)+ ')'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same as rowConstructor, which seems bad. Isn't this ambiguous? Kind of surprising antlr doesn't raise an error. I guess its working fine since we're not using row constructors anywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need ROW. Removed it!

@@ -263,6 +263,10 @@ primaryExpression
| POSITION '(' valueExpression IN valueExpression ')' #position
| '(' expression (',' expression)+ ')' #rowConstructor
| ROW '(' expression (',' expression)* ')' #rowConstructor
| '(' expression (',' expression)+ ')'
#structConstructor
| STRUCT '(' expression (',' expression)* ')'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is here for when we want to create structs on the fly? Is this what the final syntax is going to look like? Maybe leave it out until that's decided?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can decide about this later. Removed it.


public class Map extends Type {

final Type keyType = new PrimitiveType(KsqlType.STRING);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the key always STRING?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Key for map is always string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah - I didn't know avro keys were restricted to strings too. Makes sense. We should raise an error if a user has a ddl statement with MAP<not-STRING,...> in it. Or change the grammar to just take a value parameter.

@@ -362,4 +372,12 @@ protected R visitInsertInto(InsertInto node, C context) {
process(node.getQuery(), context);
return null;
}

protected R visitCreateStreamAsSelect(CreateStreamAsSelect node, C context) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious - is this used anywhere? the analyzers seem to just operate on Query nodes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a visitor that can be used by creating subclasses of it. So even though we may not use it now we may have future uses for it.

+ "\t arraycol ARRAY[DOUBLE], \n"
+ "\t mapcol MAP[VARCHAR(STRING),DOUBLE], \n"
+ "\t address STRUCT < \n"
+ "\t NUMBER BIGINT, \n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to add an indent here. I'm fine with punting this to another PR though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for now I'll leave it as is. We may want to provide unified ux for both web based UI and CLI


@Test
public void shouldGetCorrectKsqlType() throws Exception {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should add a case for INT64

assertThat(schema8.field("COL1").schema().type(), equalTo(Schema.Type.STRING));
assertThat(schema8.field("COL4").schema().type(), equalTo(Schema.Type.ARRAY));
assertThat(schema8.field("COL5").schema().type(), equalTo(Schema.Type.MAP));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a nested struct case here.

Statement statement = KSQL_PARSER.buildAst(queryStr, metaStore).get(0);
Assert.assertTrue("testCreateStream failed.", statement instanceof CreateStream);
CreateStream createStream = (CreateStream)statement;
Assert.assertTrue("testCreateStream failed.", createStream.getName().toString().equalsIgnoreCase("ORDERS"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertThat(createStream.getName().toString().toUpper(), equals("ORDERS"))
ditto below

Assert.assertTrue("testCreateStream failed.", statement instanceof CreateStream);
CreateStream createStream = (CreateStream)statement;
Assert.assertTrue("testCreateStream failed.", createStream.getName().toString().equalsIgnoreCase("ORDERS"));
Assert.assertTrue("testCreateStream failed.", createStream.getElements().size() == 7);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertThat(.., equals(..))
ditto below

Copy link
Contributor

@rodesai rodesai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. One small comment inline.

assertThat(((Struct) type7).getItems().get(5).getRight().getKsqlType(), equalTo(Type.KsqlType
.STRUCT));

Type type8 = TypeUtil.getKsqlType(Schema.INT64_SCHEMA);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should move this up alongside the other primitive cases.

Copy link
Contributor

@dguy dguy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hjafarpour, left some comments

@@ -1614,4 +1588,23 @@ public InvalidColumnReferenceException(String message, Throwable cause) {
super(message, cause);
}
}

private static PrimitiveType getPrimitiveType(String typeName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe put this as a static method on PrimitiveType

}

@Override
public int hashCode() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this intentional? i.e., hashcode always 0 and equals always false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, updated the implementations.


public class Map extends Type {

final Type keyType = new PrimitiveType(KsqlType.STRING);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private ?

return visitor.visitMap(this, context);
}

public Type getKeyType() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need keyType if it is always String?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was more for clarifying it. I removed it.

}

@Override
public int hashCode() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hashCode and equals again

}

public void setParent(Node parent) {
if (parent == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this.parent = Optional.ofNullable(parent);

public abstract class Type extends Expression {


public enum KsqlType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we have an enum for KsqlType and a class hierarchy extending Type? IMO it would be better if they enum was for PrimitiveTypes only and was an internal thing. At the moment we have too many things like Schema, Type, etc that have corresponding XXXXUtil classes rather than having objects and using polymorphism. We need to address this

import static org.hamcrest.CoreMatchers.instanceOf;
import static org.hamcrest.MatcherAssert.assertThat;

public class SetParentVisitorTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this test covers everything in SetParentVisitor

public class TypeUtilTest {

@Test
public void shouldGetCorrectKsqlType() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple tests. Much better to have this broken down into individual tests than one big test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, doesn't throw exception

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broke it down to multiple tests.

}

@Test
public void shouldGetCorrectSchema() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto above

@dguy
Copy link
Contributor

dguy commented May 11, 2018

Something that @bluemonk3y mentioned and makes a lot of sense... It would be better for the future if we replaces GenericRow with Struct, i.e., why should GenericRow be special? It is really just Struct all the way down.

@hjafarpour
Copy link
Contributor Author

@dguy yes, that's a good point and we can do it. I created a github issue to track it: #1304
We can implement it in a separate PR.

@dguy
Copy link
Contributor

dguy commented May 14, 2018

@hjafarpour it would make sense to do this before any further work struct based PRs are done. So if it isn't in this PR you probably want to do it in the next one

@hjafarpour
Copy link
Contributor Author

@dguy yes, I can do it in a PR after switching to Connect data type. This PR is only for DDL statement and ksql type.

Copy link
Contributor

@dguy dguy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM

@hjafarpour hjafarpour merged commit e5658dc into confluentinc:master May 15, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants