-
Notifications
You must be signed in to change notification settings - Fork 148
METAMODEL-1225 Add ColumnTypingStrategies and use them in CSV module. #240
base: master
Are you sure you want to change the base?
Conversation
There are two strategies for column typing, all text and manual. The CSV module can use manual typing to filter based on numerical greater than and less than operations.
As mentioned in the JIRA comment, let me know if there is too much duplication between the typing package and naming package. |
@@ -492,7 +490,7 @@ public void testMaterializeTable() throws Exception { | |||
public void testAlternativeDelimitors() throws Exception { | |||
File file = new File("src/test/resources/csv_semicolon_singlequote.csv"); | |||
CsvDataContext dc = new CsvDataContext(file, semicolonConfiguration); | |||
Table table = dc.getSchemas().get(0).getTables().get(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tripped me up for a while, it turns out getSchemas().get(0) returns the schema for information_schema while getDefaultSchema() returns the schema for specified resource. Now that the typing on the schema matters for reads dataSet.next() includes a cast which was failing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work, I'm especially happy to see that you've aimed for consistency with previous work in the project. The "session" concept strikes me as a bit odd TBH, but it's consistent so I don't feel like I want to advocate against it :-) A few remarks here and there.
core/src/main/java/org/apache/metamodel/schema/typing/ColumnTypingContext.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/metamodel/schema/typing/CustomColumnTypingStrategy.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good for the most part. I did mostly have code style related comments.
*/ | ||
package org.apache.metamodel.schema.typing; | ||
|
||
import org.apache.metamodel.schema.ColumnType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The import org.apache.metamodel.schema.ColumnType is never used.
* Defines the context for configuring the type for a single column in a | ||
* {@link ColumnTypingStrategy} session. | ||
*/ | ||
public interface ColumnTypingContext { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have both a ColumnTypingContext
interface and a ColumnTypingContextImpl
. In this case you have exactly one implementation of an interface, for which I don't expect you want to create different implementations in the future, so I would personally drop the interface and rename the implementation to ColumnTypingContext
.
*/ | ||
package org.apache.metamodel.schema.typing; | ||
|
||
import org.apache.metamodel.schema.ColumnType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The import org.apache.metamodel.schema.ColumnType is never used.
* @param table The table that contains the column | ||
* @param columnIndex the index in the table of the column being configured. | ||
*/ | ||
public ColumnTypingContextImpl( Table table, int columnIndex ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly a styling thing, but normally we don't put spaces after and before parentheses in constructor and method definitions and calls, also we try to promote making parameters (and variables) final when possible, so then this constructor definition would look like this:
public ColumnTypingContextImpl(final Table table, final int columnIndex) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will go back and double check the space padding in the rest of the files. My work uses different formatting rules and I'm not used to handling multiple rulesets in my IDE.
I agree with making parameters final when possible. Does this also apply to parameters of lambda functions? for example an instance of the ColumnTypingSession after adding the default implementation to close:
return (final ColumnTypingContext ctx) -> {
if ( iterator.hasNext() ) {
return iterator.next();
}
return null;
};
or
return ctx -> {
if ( iterator.hasNext() ) {
return iterator.next();
}
return null;
};
Personally I treat parameters as final whether they are marked as such or not, and I don't see any final lambda parameters in the project so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You make a valid point, because with the second example you're able to assign a new Object (if you were able to instantiate ColumnTypingContext
) to the ctx
parameter. Even though I wouldn't consider doing that.
It's just that in practice I always see the second example (the one you implemented), so for lambdas let's not start adding finals (and types) to the formal parameters now.
* Ends the column typing session. | ||
*/ | ||
@Override | ||
void close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the close
method is already defined on the Closeable
interface, it's not necessary to define it here again. You could change this to a default implementation of the close
method, i.e.:
@Override
default void close() {
}
In this manner you wouldn't have to implement it on the anonymous inner classes returned by the startColumnTypingSession
on the CustomColumnTypingStrategy
and DefaultColumnTypingStrategy
classes.
* | ||
* @return a strategy that defaults to text | ||
*/ | ||
public static ColumnTypingStrategy defaultStrategy() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java method names are by convention preferably verbs, so then this should be getDefaultStrategy
and the ones below should be getCustomStrategy
.
@Test | ||
public void testTypeConfiguration() throws Exception { | ||
ColumnTypingStrategy strategy = ColumnTypingStrategies.customTypes(ColumnType.STRING, ColumnType.NUMBER); | ||
try ( final ColumnTypingSession session = strategy.startColumnTypingSession() ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a try with resources statement the resource is implicitly final, so here you can remove the final
modifier.
@@ -22,8 +22,11 @@ | |||
import org.apache.metamodel.factory.AbstractDataContextFactory; | |||
import org.apache.metamodel.factory.DataContextProperties; | |||
import org.apache.metamodel.factory.ResourceFactoryRegistry; | |||
import org.apache.metamodel.schema.ColumnType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The import org.apache.metamodel.schema.ColumnType is never used.
@@ -833,8 +831,14 @@ public void testCustomColumnNames() throws Exception { | |||
final String thirdColumnName = "third"; | |||
final String fourthColumnName = "fourth"; | |||
|
|||
final ColumnType firstColumnType = ColumnType.STRING; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it adds real value to put these constants into the xxxColumnType variables, especially since they're only used once. It doesn't make the code more understandable or more elegant.
|
||
public void testCustomTyping() throws Exception { | ||
ColumnType columnType1 = ColumnType.DATE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here as above. In my opinion just using the constant is more clear than adding it to a variable which doesn't add a meaning to it.
@kaspersorensen You had some comments on this pull requests, which mark it with "changes requested". Have these been addressed? If so, can we merge the pull request? |
Sorry guys @LiveSatomi @arjansh , had been buried in work for a bit. Gave this my approval now :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took another look at this pull request, to see why it hadn't been merged yet. I had one more comment. Once that's fixed, I'll merge it. I'd like to apologize that it took me this long to get back to this pull request.
char separatorChar, char quoteChar, char escapeChar, boolean failOnInconsistentRowLength, | ||
boolean multilineValues) { | ||
|
||
public CsvConfiguration(int columnNameLineNumber, ColumnNamingStrategy columnNamingStrategy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By replacing the old public constructor with this constructor, compile problems can occur in other projects using this library. Can you please retain the old constructor and have it invoke this constructor with a null
value for the ColumnTypeStrategy
parameter?
There are two strategies for column typing, all text and manual.
The CSV module can use manual typing to filter based on numerical
greater than and less than operations.