[FEA] Support for operations on dataframe with list, dicts, numbers, and strings as datatype #2291

Closed
neileshc opened this issue Jul 15, 2019 · 1 comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API.

Comments

@neileshc

Is your feature request related to a problem? Please describe.
I wish I could have lists, arrays, dicts, ... in my cudf dataframe.

Describe the solution you'd like
I want cuDF to let me store lists, arrays, and strings in a cuDF DataFrame and apply functions (apply_rows, for example) to them. Currently apply_rows only works with numerical values.

Describe alternatives you've considered
I had to create a much bigger cuDF DataFrame that contains a flattened version of the arrays, tuples, and lists. The flattening is done in pandas and can be time-consuming.
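For readers who want to see what this workaround looks like, here is a minimal sketch of flattening a list-valued column in pandas. The column names `values`, `row`, and `pos` and the sample data are hypothetical and not from the original report; it assumes pandas 0.25 or later, where `DataFrame.explode` is available.

```python
import pandas as pd

# Hypothetical data: one list-valued column of numbers.
df = pd.DataFrame({
    "key1": ["A", "B", "D"],
    "values": [[1, 2], [3], [4, 5, 6]],
})

# explode() emits one row per list element and repeats the other columns.
# reset_index() keeps the source row number, renamed here to "row".
flat = df.explode("values").reset_index().rename(columns={"index": "row"})

# Position of each element inside its original list.
flat["pos"] = flat.groupby("row").cumcount()

# explode() leaves the column as object dtype; cast back to a numeric dtype.
flat["values"] = flat["values"].astype("int64")

print(flat)
```

The result has one row per list element, which is why the flattened frame is much larger than the original. A purely numeric frame like this could then be moved to the GPU, for example with `cudf.DataFrame.from_pandas(flat)`.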

Additional context
Code snippet in pandas:

import pandas as pd

# Note: ('B') is just the string 'B'; a one-element tuple would be ('B',).
test = pd.DataFrame({'key1': ['A', 'B', 'B', 'D'],
                     'key2': [['E'], ('B'), {'B': 0}, 0.123]})
test

  key1      key2
0    A       [E]
1    B         B
2    B  {'B': 0}
3    D     0.123

@neileshc neileshc added Needs Triage Need team to review and classify feature request New feature or request labels Jul 15, 2019
@kkraus14 kkraus14 added Python Affects Python cuDF API. libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Jul 16, 2019
@kkraus14
Collaborator

@neileshc Strings and numbers are supported today. We are working on supporting nested columns (think lists) in the future, which depends on #2207. Supporting dicts through something dictionary-like is very far down the line, if it is ever supported.

Pandas allows you to use arbitrary Python objects in your DataFrame, but it does so at the cost of a LOT of performance, and we will not support arbitrary Python objects at any point in the future.
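To illustrate the point about arbitrary Python objects, the sketch below rebuilds the DataFrame from the issue and inspects how pandas stores the mixed column. The `numeric` Series and the `kinds` list are added here only for comparison and are not part of the original thread.

```python
import pandas as pd

# Same data as the snippet in the issue; ('B') evaluates to the string 'B'.
test = pd.DataFrame({'key1': ['A', 'B', 'B', 'D'],
                     'key2': [['E'], ('B'), {'B': 0}, 0.123]})

# A plain numeric column for comparison (hypothetical, not from the issue).
numeric = pd.Series([0.1, 0.2, 0.3, 0.4])

# The mixed column falls back to the generic object dtype: each cell holds
# a reference to a separate Python object rather than a packed value.
print(test['key2'].dtype)
print(numeric.dtype)

# The Python type behind each cell of the mixed column.
kinds = [type(v).__name__ for v in test['key2']]
print(kinds)
```

An object column is effectively an array of pointers to Python objects, so operations on it generally run element by element in the interpreter instead of over a contiguous typed buffer. That layout is a poor fit for the typed columns a GPU works on.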
