-
Notifications
You must be signed in to change notification settings - Fork 539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reject non-string keys #1004
Reject non-string keys #1004
Conversation
* as keys. | ||
*/ | ||
private void assertKeyType(TypeDescription schema) { | ||
TypeDescription keyType = schema.getChildren().get(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it guarantee that schema has Children? And what are each children schema representing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose we have an ORC schema like map<string,int>
then the first child is a TypeDescription
instance representing string
type. Not sure if schema
is guaranteed to have at least one child. We might want to test with a malformed ORC schema like map<>
. It may or may not reach this point when such schema is given.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also make sure the passed-in 'schema' argument really represents the map schema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also make sure the passed-in 'schema' argument really represents the map schema
It certainly wouldn't hurt to do so, but I feel like this is somewhat redundant as it is already ensured by VectorColumnFilter.createConverter()
. Its very logic doesn't leave any room for other types to fall into MapColumnConverter
.
public static JsonConverter createConverter(TypeDescription schema) {
switch (schema.getCategory()) {
// ...
case MAP:
return new MapColumnConverter(schema);
default:
throw new IllegalArgumentException("Unhandled type " + schema);
}
}
If you still insist doing so, I would suggest to add some common type checking logic in JsonConverter
interface or in VectorColumnFiller
class so that other types could also benefit from it. Any thoughts on this?
private void assertKeyType(TypeDescription schema) { | ||
TypeDescription keyType = schema.getChildren().get(0); | ||
String keyTypeName = keyType.getCategory().getName(); | ||
if (!keyTypeName.equals("string")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be case insensitive comparison?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That’s a good point. Not sure if ORC schema is case sensitive. But it would be a good idea to perform case insensitive comparison here just to be safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested what happens when ORC schema is defined with upper-case letters (e.g., map<INT,int>
). When reached at MapColumnConverter
, the schema
shows the schema information as map<int,int>
. Something in the middle automatically converts it to lower-case letters. However, I'll stick with a case-insensitive comparison, as you suggested, for extra safety.
Thanks for the review. I’ll come up with a better way to handle these concerns mentioned above and revise code accordingly. |
lgtm |
As far as I understand, JSON only allows string keys. However, the current implementation of
MapColumnConverter
allows arbitrary types as map keys which potentially cause issues forJsonConverter
. This PR adds a safeguard that rejects non-string map keys.