-
Notifications
You must be signed in to change notification settings - Fork 913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Initial work for decimal type in Java/JNI [skip ci] #6514
Conversation
Please update the changelog in order to start CI tests. View the gpuCI docs here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is still WIP, but I wanted to call out a few things.
@@ -38,10 +38,12 @@ | |||
|
|||
protected final OffHeapState offHeap; | |||
protected final DType type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we contain DataType
instead of DType
+ scale
in here? And we need implement getDecimal
. I think we can change it in subsequent PR.
@@ -3729,6 +3767,14 @@ public static ColumnVector fromDoubles(double... values) { | |||
return build(DType.FLOAT64, values.length, (b) -> b.appendArray(values)); | |||
} | |||
|
|||
/* public static ColumnVector fromDecimal32(int... values) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
involve this commented codes later till we are fully ready?
Signed-off-by: Niranjan Artal <[email protected]>
I think the code in |
If it is simple to do yes, but if it is not simple we should look at only supporting the newer one that supports nested types and deprecating the older one. No point in maintaing two APIs that do the same thing. |
@revans2 @jlowe It would be great if you could take a look and let me know if this is something in the right direction. As of now I am making changes and making sure we don't break any current tests. The refactor is in different places so would be helpful if I could get early feedback on whether to continue or if we need to pivot. |
@@ -3088,7 +3114,7 @@ public long getColumnViewAddress() { | |||
|
|||
@Override | |||
public ColumnViewAccess getChildColumnViewAccess(int childIndex) { | |||
if (!type.isNestedType()) { | |||
if (!type.typeId.isNestedType()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would parrot @jlowe 's comment here to add the isNestedType
to DataType to reduce code changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -1239,48 +1281,48 @@ public Builder appendUTF8String(byte[] value, int offset, int length) { | |||
|
|||
public Builder appendArray(byte... values) { | |||
assert (values.length + currentIndex) <= rows; | |||
assert type == DType.INT8 || type == DType.UINT8 || type == DType.BOOL8; | |||
data.setBytes(currentIndex * type.sizeInBytes, values, 0, values.length); | |||
assert type.typeId == DType.INT8 || type.typeId == DType.UINT8 || type.typeId == DType.BOOL8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we are making a change this would be a good time to clean this up and have a helper method in DataType like the one we have e.g. 'isTimestamp` etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
currentIndex += values.length; | ||
return this; | ||
} | ||
|
||
public Builder appendArray(short... values) { | ||
assert type == DType.INT16 || type == DType.UINT16; | ||
assert type.typeId == DType.INT16 || type.typeId == DType.UINT16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment here. can we have a helper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
|
||
/** | ||
* Get the value at index. | ||
*/ | ||
public final short getShort(long index) { | ||
assert type == DType.INT16 || type == DType.UINT16 : type + " is not stored as a short."; | ||
assert type.typeId == DType.INT16 || type.typeId == DType.UINT16 : type + " is not stored as a short."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment here. have a helper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -273,28 +286,28 @@ private void assertsForGet(long index) { | |||
* Get the value at index. | |||
*/ | |||
public byte getByte(long index) { | |||
assert type == DType.INT8 || type == DType.UINT8 || type == DType.BOOL8 : type + | |||
assert type.typeId == DType.INT8 || type.typeId == DType.UINT8 || type.typeId == DType.BOOL8 : type + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment here. have a helper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
try { | ||
cudf::jni::auto_set_device(env); | ||
cudf::column_view *column = reinterpret_cast<cudf::column_view *>(handle); | ||
return static_cast<jint>(column->type().scale()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is type checking needed before calling scale()
on type_id? will it blow up in case its not a decimal type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not need the cast and I would prefer to not have it because it shows that we don't have issues mapping from the native scale to the JNI representation.
I had an idea and I really would like some feedback on it. I think we can make a change that is mostly source compatible, but not binary compatible. What if we renamed i.e.
We could also verify that if you try to create an object from just a nativeId and it is a DECIMAL* we can blow up. |
I was already thinking along these lines, but I hadn't considered taking over I'm still worried about all the places code assumes checking for equality on a data type can be performed with |
Hi @revans2 @jlowe , @nartal1 and I are trying to implement the idea public static final DType EMPTY = new DType(DTypeEnum.EMPTY);
public static final DType INT8 = new DType(DTypeEnum.INT8);
public static final DType INT16 = new DType(DTypeEnum.INT16);
public static final DType INT32 = new DType(DTypeEnum.INT32);
public static final DType INT64 = new DType(DTypeEnum.INT64);
public static final DType UINT8 = new DType(DTypeEnum.UINT8);
public static final DType UINT16 = new DType(DTypeEnum.UINT16);
public static final DType UINT32 = new DType(DTypeEnum.UINT32);
public static final DType UINT64 = new DType(DTypeEnum.UINT64);
public static final DType FLOAT32 = new DType(DTypeEnum.FLOAT32);
public static final DType FLOAT64 = new DType(DTypeEnum.FLOAT64);
public static final DType BOOL8 = new DType(DTypeEnum.BOOL8);
public static final DType TIMESTAMP_DAYS = new DType(DTypeEnum.TIMESTAMP_DAYS);
public static final DType TIMESTAMP_SECONDS = new DType(DTypeEnum.TIMESTAMP_SECONDS);
public static final DType TIMESTAMP_MILLISECONDS = new DType(DTypeEnum.TIMESTAMP_MILLISECONDS);
public static final DType TIMESTAMP_MICROSECONDS = new DType(DTypeEnum.TIMESTAMP_MICROSECONDS);
public static final DType TIMESTAMP_NANOSECONDS = new DType(DTypeEnum.TIMESTAMP_NANOSECONDS);
public static final DType DURATION_DAYS = new DType(DTypeEnum.DURATION_DAYS);
public static final DType DURATION_SECONDS = new DType(DTypeEnum.DURATION_SECONDS);
public static final DType DURATION_MILLISECONDS = new DType(DTypeEnum.DURATION_MILLISECONDS);
public static final DType DURATION_MICROSECONDS = new DType(DTypeEnum.DURATION_MICROSECONDS);
public static final DType DURATION_NANOSECONDS = new DType(DTypeEnum.DURATION_NANOSECONDS);
public static final DType STRING = new DType(DTypeEnum.STRING);
public static final DType LIST = new DType(DTypeEnum.LIST);
public static final DType STRUCT = new DType(DTypeEnum.STRUCT);
private static final DType[] staticDTypes = new DType[] {
EMPTY,
INT8,
INT16,
INT32,
INT64,
UINT8,
UINT16,
UINT32,
UINT64,
FLOAT32,
FLOAT64,
BOOL8,
TIMESTAMP_DAYS,
TIMESTAMP_SECONDS,
TIMESTAMP_MILLISECONDS,
TIMESTAMP_NANOSECONDS,
DURATION_DAYS,
DURATION_SECONDS,
DURATION_MILLISECONDS,
DURATION_MICROSECONDS,
DURATION_NANOSECONDS,
null, // DICTIONARY32
STRING,
LIST,
null, // DECIMAL32
null, // DECIMAL64
STRUCT
};
public static DType fromNative(int nativeId, int scale) {
if (nativeId >=0 && nativeId < staticDTypes.length) {
if (staticDTypes[nativeId] != null) {
return staticDTypes[nativeId];
}
if (nativeId == DTypeEnum.DECIMAL32.nativeId) {
// TODO: maybe we can verify scale at here?
return new DType(DTypeEnum.DECIMAL32, scale);
}
if (nativeId == DTypeEnum.DECIMAL64.nativeId) {
// TODO: maybe we can verify scale at here?
return new DType(DTypeEnum.DECIMAL64, scale);
}
}
throw new IllegalArgumentException("Could not translate " + nativeId + " into a DType");
}
public static DType fromNative(int nativeId) {
// TODO: replace Integer.MIN_VALUE with a constant as noop placeholder ?
return fromNative(nativeId, Integer.MIN_VALUE);
} |
Signed-off-by: Niranjan Artal <[email protected]>
Signed-off-by: Niranjan Artal <[email protected]>
Thanks to @sperlingxx for adding scalar tests. I have merged it in this branch. Apologies for bringing up this late. Thought of adding this here to have some tests. |
Signed-off-by: Niranjan Artal <[email protected]>
*/ | ||
#pragma once | ||
|
||
namespace dtype_utils{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This is really minor because all of the functions are inline and the header is private to the JNI code, but I would prefer to have the namespace have something with cudf and java/jni in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nartal1, here is an additional (non-actionable) note on @revans2's namespace
comment:
These functions used to be in the "unnamed" namespace in ColumnVectorJni.cpp
. Moving it to a separate header and out of the unnamed namespace forced external linkage, and led to the multiple definitions error you saw. This forced your hand with making them inline
. You could have alternatively chosen to keep them in the "unnamed" namespace (nested within another cudf::jni
namespace).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mythrocks , I have made the changes as per suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor changes suggested.
*/ | ||
#pragma once | ||
|
||
namespace dtype_utils{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nartal1, here is an additional (non-actionable) note on @revans2's namespace
comment:
These functions used to be in the "unnamed" namespace in ColumnVectorJni.cpp
. Moving it to a separate header and out of the unnamed namespace forced external linkage, and led to the multiple definitions error you saw. This forced your hand with making them inline
. You could have alternatively chosen to keep them in the "unnamed" namespace (nested within another cudf::jni
namespace).
Signed-off-by: Niranjan Artal <[email protected]>
@jlowe @mythrocks I think I have addressed review comments. Please take another look. |
Signed-off-by: Niranjan Artal <[email protected]>
Signed-off-by: Niranjan Artal <[email protected]>
Thanks for the approvals! |
This partially addresses tasks from issue #6515.
The goal of this PR is to add DType class and make the necessary changes in Java/JNI code to support scale in DType.