-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] cuDF.dtype objects #6160
[WIP] cuDF.dtype objects #6160
Changes from 72 commits
1c83eac
33bd96c
baf138c
bdb87fa
4a3fe71
1cf2c3e
dbc4970
ba42bd8
60272e2
c03be40
7f6cb36
a81c368
572c39f
4f6f316
ee6ece5
62c5e17
139465f
ef5b9cb
9320755
dac2940
6eee9eb
1ace460
df6426b
92d1a64
59b3673
297a31a
e5def6e
b4d344f
22fd5d9
c5a0b62
d47de03
fe180a3
cad48d0
6a1785c
8552907
2b59285
b2851a2
9540643
781b42e
13fe291
3c047ef
a139571
40a1699
ea24184
ddf340b
4a14042
55cec7e
c28c7b6
bad1dc2
0938507
e5a489d
62a7d5b
80baff4
38e11af
22b299d
1552c0a
78caafa
a9fe2fb
455af02
e4c0bf1
42828c0
0d3d6a0
63e1387
2005d65
523919c
c730301
c5450c2
7bc0893
a3a4893
6bf121c
165f86c
cec9528
a8b380b
1dc151a
46a9c2f
81e6058
d7930eb
c290a15
e90e325
3d8ca2f
2653384
123784b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,7 +11,8 @@ import json | |
from cython.operator import dereference | ||
import numpy as np | ||
|
||
from cudf.utils.dtypes import np_to_pa_dtype, is_categorical_dtype | ||
from cudf.utils.dtypes import np_to_pa_dtype | ||
from cudf.api.types import is_categorical_dtype | ||
from libc.stdlib cimport free | ||
from libc.stdint cimport uint8_t | ||
from libcpp.memory cimport shared_ptr, unique_ptr, make_unique | ||
|
@@ -102,7 +103,6 @@ cpdef generate_pandas_metadata(Table table, index): | |
) | ||
else: | ||
types.append(np_to_pa_dtype(col.dtype)) | ||
|
||
# Indexes | ||
if index is not False: | ||
for name in table._index.names: | ||
|
@@ -134,16 +134,15 @@ cpdef generate_pandas_metadata(Table table, index): | |
index_descriptors.append(descr) | ||
else: | ||
col_names.append(name) | ||
|
||
metadata_df = table.head(0).to_pandas() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this needed because our duck typing no longer works in this situation? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I felt like this was a pretty bad hack. The pyarrow function here writes the metadata by subbing in the names of the dtype objects attached to the dataframe. So if we pass it our frame it writes a parquet file that other readers can't read because they don't understand the dtypes. I suspect there's a better way of doing this, but ideally that solution wouldn't need to expect that the pyarrow function works in a specific way, e.g. an actual API for doing this. We might be able to contribute that to pyarrow directly. |
||
metadata = pa.pandas_compat.construct_metadata( | ||
table, | ||
metadata_df, | ||
col_names, | ||
index_levels, | ||
index_descriptors, | ||
index, | ||
types, | ||
) | ||
|
||
md = metadata[b'pandas'] | ||
json_str = md.decode("utf-8") | ||
return json_str | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,6 +4,7 @@ from libcpp.memory cimport unique_ptr | |
from libcpp cimport bool | ||
|
||
from cudf._lib.cpp.scalar.scalar cimport scalar | ||
from libc.stdint cimport uintptr_t | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is this needed here? It doesn't look to be used in this header. |
||
|
||
|
||
cdef class Scalar: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't we already error in trying to construct the
cudf_dtype
object here instead of having to check innp_to_cudf_types
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't need to check the dict, that's correct. The dtype should error upon construction. The only question is if this case deserves its own error. To figure that out I think I need to look closely here and figure out what exactly the user could be doing that would cause them to run into this. More to follow here.