-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
getJsonObject number normalization #1897
Merged
thirtiseven
merged 2 commits into
NVIDIA:get-json-object-feature
from
thirtiseven:get-json-object-number-normalization-inplace
Mar 26, 2024
Merged
getJsonObject number normalization #1897
thirtiseven
merged 2 commits into
NVIDIA:get-json-object-feature
from
thirtiseven:get-json-object-number-normalization-inplace
Mar 26, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Haoyang Li <[email protected]>
res-life
reviewed
Mar 26, 2024
src/main/cpp/src/get_json_object.hpp
Outdated
: output(nullptr), output_len(0), hide_outer_array_tokens(_hide_outer_array_tokens) | ||
{ | ||
} | ||
CUDF_HOST_DEVICE CUDF_HOST_DEVICE json_generator<>& operator=(const json_generator<>& other) | ||
__device__ __device__ json_generator<>& operator=(const json_generator<>& other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two device
Add several JNI cases will be good. |
Yes, -0000000 is not a valid JSON number. |
This was referenced Mar 26, 2024
Signed-off-by: Haoyang Li <[email protected]>
thirtiseven
added a commit
that referenced
this pull request
Mar 27, 2024
* get-json-object: Add JSON parser and parser utility (#1836) * Add Json Parser; Add Json Parser utility; Define internal interfaces; Copy get-json-obj CUDA code from cuDF; Signed-off-by: Chong Gao <[email protected]> * Code format --------- Signed-off-by: Chong Gao <[email protected]> Co-authored-by: Chong Gao <[email protected]> * get-json-object: match current field name (#1857) Signed-off-by: Chong Gao <[email protected]> Co-authored-by: Chong Gao <[email protected]> * get-json-object: add utility write_escaped_text for JSON generator (#1863) Signed-off-by: Chong Gao <[email protected]> Co-authored-by: Chong Gao <[email protected]> * Add JNI for GetJsonObject (#1862) * Add JNI for GetJsonObject Signed-off-by: Haoyang Li <[email protected]> * clean up Signed-off-by: Haoyang Li <[email protected]> * Parse json path in plugin Signed-off-by: Haoyang Li <[email protected]> * Apply suggestions from code review Co-authored-by: Nghia Truong <[email protected]> * Use table_view Signed-off-by: Haoyang Li <[email protected]> * Update java Signed-off-by: Haoyang Li <[email protected]> * Apply suggestions from code review Co-authored-by: Nghia Truong <[email protected]> * clean up Signed-off-by: Haoyang Li <[email protected]> * use matched enum for type Signed-off-by: Haoyang Li <[email protected]> * clean up Signed-off-by: Haoyang Li <[email protected]> * upmerge Signed-off-by: Haoyang Li <[email protected]> * format Signed-off-by: Haoyang Li <[email protected]> --------- Signed-off-by: Haoyang Li <[email protected]> Co-authored-by: Nghia Truong <[email protected]> * get-json-object: main flow (#1868) Signed-off-by: Chong Gao <[email protected]> Co-authored-by: Chong Gao <[email protected]> * Optimize memory usage in match_current_field_name (#1889) * Optimize match_current_field_name using less memory Signed-off-by: Chong Gao <[email protected]> * Convert a function to device code * Add a JNI test case * Add JNI test case * Change nesting depth to 4 * Change nesting depth to 8 to fix test Signed-off-by: Haoyang Li <[email protected]> * remove clang format change Signed-off-by: Haoyang Li <[email protected]> --------- Signed-off-by: Chong Gao <[email protected]> Signed-off-by: Haoyang Li <[email protected]> Co-authored-by: Chong Gao <[email protected]> * get-json-object: Recursive to iterative (#1890) * Change recursive to iterative Signed-off-by: Chong Gao <[email protected]> --------- Signed-off-by: Chong Gao <[email protected]> Co-authored-by: Chong Gao <[email protected]> * Fix bug * Format * Use uppercase for path_instruction_type Signed-off-by: Haoyang Li <[email protected]> * Add test cases from Baidu * Fix escape char error; add test case * getJsonObject number normalization (#1897) * Support number normalization Signed-off-by: Haoyang Li <[email protected]> * delete cpp test and add a java test case Signed-off-by: Haoyang Li <[email protected]> --------- Signed-off-by: Haoyang Li <[email protected]> * Add test case * Fix a escape/unescape size bug Signed-off-by: Haoyang Li <[email protected]> * Fix bug: handle leading zeros for number; Refactor * Apply suggestions from code review Co-authored-by: Nghia Truong <[email protected]> * Address comments Signed-off-by: Haoyang Li <[email protected]> * fix java test Signed-off-by: Haoyang Li <[email protected]> * Add test cases; Fix a bug * follow up escape/unescape bug fix Signed-off-by: Haoyang Li <[email protected]> * Minor refactor * Add a case; Fix bug --------- Signed-off-by: Chong Gao <[email protected]> Signed-off-by: Haoyang Li <[email protected]> Co-authored-by: Chong Gao <[email protected]> Co-authored-by: Haoyang Li <[email protected]> Co-authored-by: Nghia Truong <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Closes #1831
This PR supports number normalization in getJsonObject.
In getJsonObject in Spark, a float number is converted to a double when parsing and then converted to a string when returning. This PR uses
stod
in cudf andftos_converter
in jni to simulate this behavior. There may be some compatibility issues inftos_converter
because it is based on ryu, but it is acceptable because the difference is very minor. We use this solution to support double to string in spark-rapids as well.For int number, the only special case I know is "-0" is normalized to "0". Note that "-0000000" is invalid as data.
We need to merge it into the feature branch after removing the cpu tests, because this pr calls device-only functions.