[QST] How to use cudf.Series() and get_json_object() #14102

Closed
zlwu92 opened this issue Sep 13, 2023 · 4 comments
Labels
question Further information is requested

Comments

@zlwu92

zlwu92 commented Sep 13, 2023

Does the get_json_object() API described in https://docs.rapids.ai/api/cudf/legacy/api_docs/api/cudf.core.column.string.stringmethods.get_json_object/
support JSON queries?

I installed the RAPIDS cuDF libraries via conda, following the instructions on the RAPIDS site.

Following the example in the above link, I wrote Python code like the one below:

image

But it gives an error like this. What is the problem?
image

@zlwu92 zlwu92 added Needs Triage Need team to review and classify question Further information is requested labels Sep 13, 2023
@davidwendt
Contributor

str.get_json_object() only works on strings columns. The s object in your example is created as a struct type.

>>> s = cudf.Series({"employee": {"name": "sonoo", "salary": 56000, "married": "true"}})
>>> print(s)
employee    {'married': 'true', 'name': 'sonoo', 'salary':...
dtype: struct

You can use the struct.field() function to access the name as follows:

>>> s.struct.field('name')
employee    sonoo
dtype: object

Reference: https://docs.rapids.ai/api/cudf/legacy/api_docs/api/cudf.core.column.struct.structmethods.field/#cudf.core.column.struct.StructMethods.field

Hopefully that helps.
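If the data starts out as a Python dict and the goal is to query it with str.get_json_object(), one option is to serialize the dict to JSON text first, so that the Series is built from strings rather than from a struct. The following is a minimal sketch under that assumption, not an official recipe: the variable names are illustrative, and the cudf part only runs where cudf can be imported (a RAPIDS install with a supported GPU).

```python
import json

# The same record as in the example above, still a plain Python dict.
record = {"employee": {"name": "sonoo", "salary": 56000, "married": "true"}}

# Serialize to JSON text so the column holds strings, not a struct.
json_text = json.dumps(record)

try:
    import cudf  # needs a RAPIDS install and a supported GPU
except Exception:
    cudf = None  # fall back to only showing the serialization step

if cudf is not None:
    # A list of Python str values is expected to give a strings column.
    s = cudf.Series([json_text])
    print(s.str.get_json_object("$.employee.name"))
```

The JSONPath here starts from the root of the serialized text, so the outer "employee" key is part of the path.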

@zlwu92 zlwu92 closed this as completed Sep 15, 2023
@zlwu92
Author

zlwu92 commented Sep 15, 2023

OK, thank you. Can you share a simple example of a Series created as a strings column?
I actually got an error with the example provided in https://docs.rapids.ai/api/cudf/legacy/api_docs/api/cudf.core.column.string.stringmethods.get_json_object/

@davidwendt
Contributor

The doc is a bit confusing because the formatter has trouble with the triple double-quotes """.
Here is the example, perhaps in a better format:

>>> s = cudf.Series([ """ {
... "store":{ "book":[
...   { "category":"reference",
...     "author":"Nigel Rees",
...     "title":"Sayings of the Century",
...     "price":8.95 },
...   { "category":"fiction",
...     "author":"Evelyn Waugh",
...     "title":"Sword of Honour",
...     "price":12.99 }
... ] } } """ ])
>>> s
0     {\n"store":{ "book":[\n  { "category":"refere...
dtype: object
>>> s.str.get_json_object("$.store.book")
0    [\n  { "category":"reference",\n    "author":"...
dtype: object
>>> s.str.get_json_object("$.store.book[0].author")
0    Nigel Rees
dtype: object
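To sanity-check what a simple path such as $.store.book[0].author should select, the same document can be walked on the CPU with Python's standard json module. This is only a rough cross-check and not how cudf implements get_json_object(): the lookup helper below is hypothetical and handles only .name and [index] steps (no wildcards or filters).

```python
import json
import re

def lookup(json_text, path):
    """Resolve a simple path like $.store.book[0].author using the stdlib."""
    node = json.loads(json_text)
    # Each step is either .name or [index]; anything else is ignored.
    for name, index in re.findall(r"\.(\w+)|\[(\d+)\]", path):
        node = node[name] if name else node[int(index)]
    return node

# Same JSON document as in the cudf example above.
doc = """ {
"store":{ "book":[
  { "category":"reference",
    "author":"Nigel Rees",
    "title":"Sayings of the Century",
    "price":8.95 },
  { "category":"fiction",
    "author":"Evelyn Waugh",
    "title":"Sword of Honour",
    "price":12.99 }
] } } """

print(lookup(doc, "$.store.book[0].author"))
print(lookup(doc, "$.store.book[1].price"))
```

Note that this helper returns parsed Python values (str, float, list), whereas get_json_object() returns a strings column.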

@bdice bdice removed the Needs Triage Need team to review and classify label Mar 4, 2024