Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getJsonObject: improve performance for new version of getJsonObject; fix a bug #1930

Merged
merged 13 commits into from
Apr 11, 2024

Conversation

res-life
Copy link
Collaborator

@res-life res-life commented Apr 7, 2024

closes #1907

Improve performance for new version of getJsonObject.
Fix a redefined variable

perf test

Tested a customer data.
The first line is cuDF impl.

After commit time(1st,2nd,3rd) time / original cuDF impl (less is better)
original cuDF impl, branch 24.02 2183 ms, 778 ms, 756 ms base 1.0
new version of get-json-obj in branch 24.04 21230 ms, 19887 ms, 20150 ms 16.482916330374
Refactor: Move device code in get_json_object to cu or cuh files 21278 ms, 19917 ms, 20366 ms 16.5620123755717
Refactor/simplify json generator 6113 ms, 4734 ms, 4690 ms 4.17998385794996
Remove purge non-empty nulls step 6064 ms, 4747 ms, 4753 ms 4.18724778046812
Refactor josn parser: remove useless variables 6013 ms, 4687 ms, 4752 ms 4.15711595372612
Use 64 bits to store JSON nest depth context to save memory 6086 ms, 4695 ms, 4709 ms 4.16733925208502
Refine push/pop/peek logic 2850 ms, 1570 ms, 1595 ms 1.6182405165456
Refactor: use less functions; change max path length from 32 to 8 2716, 1584, 1515 1.56443368307775

Signed-off-by: Chong Gao [email protected]

@res-life
Copy link
Collaborator Author

res-life commented Apr 7, 2024

Note: currently meets a error in QA's env:

ai.rapids.cudf.CudaFatalException: parallel_for: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered

@res-life res-life force-pushed the fix-get-json-object-24.04 branch from 1bdbf57 to 5e315f9 Compare April 8, 2024 13:54
@res-life res-life requested a review from ttnghia April 8, 2024 13:55
@res-life
Copy link
Collaborator Author

res-life commented Apr 9, 2024

Please refer to the last commit, it fix nvbug: Fix nvbug: get-json-obj get incorrect result on some GPUs(H100, V100).
Seems it's related to thrust::tie, on some GPU(H100 V100) CUDA(12.x), tie generate whong result(empty output). And on T4, does not occur. It's weird issue.
After changed to tie a structure instead of cudf::size_type, then generate correct answer.
Currently do not know why this fix works.

@ttnghia
Copy link
Collaborator

ttnghia commented Apr 9, 2024

Please try reverting get_json_object.cu in the latest commit and apply this:

@@ -1165,41 +1165,28 @@ __device__ thrust::pair<bool, size_t> get_json_object_single(
   char* out_buf,
   size_t out_buf_size)
 {
-  char* actual_output;
-  if (nullptr == out_buf) {
-    // First step: preprocess sizes
-    actual_output = out_buf;
-  } else {
-    // Second step: writes output
-    // if output buf size is zero, pass in nullptr to avoid generator writing trash output
-    actual_output = (0 == out_buf_size) ? nullptr : out_buf;
-  }
-
   json_parser j_parser(input, input_len);
-  json_generator generator(actual_output);
-
   j_parser.next_token();
   // JSON validation check
   if (json_token::ERROR == j_parser.get_current_token()) { return {false, 0}; }

-  if (nullptr == out_buf) {
-    // First step: preprocess sizes
-    bool success = evaluate_path(
-      j_parser, generator, write_style::raw_style, path_commands_ptr, path_commands_size);
+  // First pass: preprocess sizes.
+  // Second pass: writes output.
+  // The generator automatically determines which pass based on `out_buf`.
+  // If `out_buf_size` is zero, pass in `nullptr` to avoid generator writing trash output.
+  json_generator generator((out_buf == nullptr || out_buf_size == 0) ? nullptr : out_buf);
 
-    if (!success) {
-      // generator may contain trash output, e.g.: generator writes some output,
-      // then JSON format is invalid, the previous output becomes trash.
-      // set output as zero to tell second step
-      generator.set_output_len_zero();
-    }
-    return {success, generator.get_output_len()};
-  } else {
-    // Second step: writes output
-    bool success = evaluate_path(
-      j_parser, generator, write_style::raw_style, path_commands_ptr, path_commands_size);
-    return {success, generator.get_output_len()};
+  bool const success = evaluate_path(
+    j_parser, generator, write_style::raw_style, path_commands_ptr, path_commands_size);
+
+  if (nullptr == out_buf && !success) {
+    // generator may contain trash output, e.g.: generator writes some output,
+    // then JSON format is invalid, the previous output becomes trash.
+    // set output as zero to tell second step
+    generator.set_output_len_zero();
   }
+
+  return {success, generator.get_output_len()};
 }


@@ -1238,21 +1227,20 @@ __launch_bounds__(block_size) CUDF_KERNEL
   while (tid < col.size()) {
     bool is_valid               = false;
     cudf::string_view const str = col.element<cudf::string_view>(tid);
-    cudf::size_type output_size = 0;
     if (str.size_bytes() > 0) {
       char* dst = out_buf != nullptr ? out_buf + output_offsets[tid] : nullptr;
       size_t const dst_size =
         out_buf != nullptr ? output_offsets[tid + 1] - output_offsets[tid] : 0;
 
       // process one single row
-      auto [result, output_size] = get_json_object_single(
+      auto const [result, output_size] = get_json_object_single(
         str.data(), str.size_bytes(), path_commands_ptr, path_commands_size, dst, dst_size);
       if (result) { is_valid = true; }
-    }
 
-    // filled in only during the precompute step. during the compute step, the
-    // offsets are fed back in so we do -not- want to write them out
-    if (out_buf == nullptr) { d_sizes[tid] = output_size; }
+      // filled in only during the precompute step. during the compute step, the
+      // offsets are fed back in so we do -not- want to write them out
+      if (out_buf == nullptr) { d_sizes[tid] = static_cast<cudf::size_type>(output_size); }
+    }

@@ -13,7 +13,6 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be reverted.

@res-life
Copy link
Collaborator Author

build

@res-life
Copy link
Collaborator Author

Please try reverting get_json_object.cu in the latest commit and apply this:

Done, with a little change:

    } else {
      // valid JSON length is always greater than 0
      // if `str` size len is zero, output len is 0 and `is_valid` is false
      if (out_buf == nullptr) { d_sizes[tid] = 0; }
    }

@res-life res-life marked this pull request as ready for review April 10, 2024 05:44
@res-life
Copy link
Collaborator Author

@ttnghia thanks for your help, please help review again.

@ttnghia
Copy link
Collaborator

ttnghia commented Apr 11, 2024

Performance change after this PR, compared using #1952:

|  size_bytes  |  max_depth  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |           Diff |   %Diff |  Status  |
|--------------|-------------|------------|-------------|------------|-------------|----------------|---------|----------|
|   1000000    |      2      |   6.654 ms |      10.40% |   5.576 ms |      11.35% |   -1078.003 us | -16.20% |   FAIL   |
|   10000000   |      2      |   7.373 ms |       8.92% |   6.230 ms |      11.22% |   -1143.858 us | -15.51% |   FAIL   |
|  100000000   |      2      | 145.607 ms |       1.81% | 131.515 ms |       1.96% |  -14092.123 us |  -9.68% |   FAIL   |
|  1000000000  |      2      |    1.745 s |       0.67% |    1.558 s |       1.39% | -186955.541 us | -10.72% |   FAIL   |
|   1000000    |      4      |   7.300 ms |       8.87% |   5.836 ms |      12.70% |   -1463.903 us | -20.05% |   FAIL   |
|   10000000   |      4      |   8.247 ms |       7.03% |   6.414 ms |       8.63% |   -1833.158 us | -22.23% |   FAIL   |
|  100000000   |      4      | 159.210 ms |       0.98% | 133.165 ms |       1.58% |  -26044.341 us | -16.36% |   FAIL   |
|  1000000000  |      4      |    1.872 s |       0.92% |    1.518 s |       1.00% | -354327.340 us | -18.93% |   FAIL   |
|   1000000    |      6      |   7.639 ms |       9.03% |   5.905 ms |      10.32% |   -1734.084 us | -22.70% |   FAIL   |
|   10000000   |      6      |   8.646 ms |       8.02% |   6.346 ms |       7.87% |   -2300.449 us | -26.61% |   FAIL   |
|  100000000   |      6      | 165.838 ms |       1.92% | 131.152 ms |       1.16% |  -34686.127 us | -20.92% |   FAIL   |
|  1000000000  |      6      |    1.925 s |       1.07% |    1.581 s |       0.68% | -344136.035 us | -17.87% |   FAIL   |
|   1000000    |      8      |   8.109 ms |       6.55% |   6.090 ms |      10.00% |   -2019.178 us | -24.90% |   FAIL   |
|   10000000   |      8      |   9.414 ms |       5.90% |   6.598 ms |       8.01% |   -2815.621 us | -29.91% |   FAIL   |
|  100000000   |      8      | 177.391 ms |       1.20% | 134.754 ms |       1.20% |  -42636.822 us | -24.04% |   FAIL   |
|  1000000000  |      8      |    2.034 s |       0.54% |    1.576 s |       0.71% | -457780.972 us | -22.51% |   FAIL   |

We see that the PR improves run time in about 20%.

@res-life res-life self-assigned this Apr 11, 2024
@res-life res-life merged commit e5cae76 into NVIDIA:branch-24.04 Apr 11, 2024
3 checks passed
@res-life res-life deleted the fix-get-json-object-24.04 branch April 11, 2024 01:57
@res-life res-life changed the title getJsonObject: improve performance for new version of getJsonObject getJsonObject: improve performance for new version of getJsonObject; fix a bug Apr 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] [follow-up] getJsonObject: improve performance for new version of getJsonObject
3 participants