JNI: Rewrite growBuffersAndRows to accelerate the HostColumnBuilder #10025

sperlingxx · 2022-01-12T10:15:18Z

Following NVIDIA/spark-rapids#4393, this PR takes several measures to speed up buffer growth while building a HostColumnVector:

- Introduce rowCapacity to cache the maximum number of rows/bytes
- Introduce the pure Java method byteSizeOfNullMask to get the size of the validity buffer
- Reorganize the code structure to reduce the number of method calls
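
To illustrate the first measure, here is a minimal sketch of caching the row capacity so that the common append path is a single comparison against a field. It is a toy built on plain Java arrays, not the actual HostColumnVector.ColumnBuilder: the class name `CachedCapacityBuilder`, its fields, and the doubling growth policy are all assumptions made for illustration.

```java
import java.util.Arrays;

/** Toy growable column of longs (hypothetical; not the cuDF ColumnBuilder). */
public class CachedCapacityBuilder {
  private long[] data;       // row values
  private long[] valid;      // validity bitmask, one bit per row (1 = valid)
  private int rows = 0;      // rows appended so far
  private int rowCapacity;   // cached: max rows that fit without reallocating

  public CachedCapacityBuilder(int estimatedRows) {
    rowCapacity = Math.max(1, estimatedRows);
    data = new long[rowCapacity];
    valid = new long[(rowCapacity + 63) / 64];
  }

  public void append(long value) {
    growIfNeeded();
    data[rows] = value;
    valid[rows >>> 6] |= 1L << (rows & 63);  // mark the row as valid
    rows++;
  }

  public void appendNull() {
    growIfNeeded();
    rows++;  // the validity bit stays 0, which marks the row as null
  }

  private void growIfNeeded() {
    // Fast path: one comparison against the cached capacity, no size queries.
    if (rows < rowCapacity) {
      return;
    }
    // Slow path (assumed policy): double the capacity and recompute sizes once.
    int newCapacity = (int) Math.min(Integer.MAX_VALUE - 8L, rowCapacity * 2L);
    if (newCapacity <= rows) {
      throw new IllegalStateException("column is full");
    }
    data = Arrays.copyOf(data, newCapacity);
    valid = Arrays.copyOf(valid, (newCapacity + 63) / 64);
    rowCapacity = newCapacity;
  }

  public int getRowCount() { return rows; }
  public int getRowCapacity() { return rowCapacity; }
  public boolean isNull(int row) { return (valid[row >>> 6] & (1L << (row & 63))) == 0; }
  public long get(int row) { return data[row]; }
}
```

The point of the cached field is that buffer sizes are only recomputed on the slow path, when a reallocation actually happens.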

I have tested this PR with the spark-rapids tests locally.
BTW, shall we clean up the HostColumnVector.Builder and replace all the usages of Builder with ColumnBuilder?

Signed-off-by: sperlingxx <[email protected]>

codecov · 2022-01-12T12:13:11Z

Codecov Report

Merging #10025 (1f67210) into branch-22.04 (e24fa8f) will increase coverage by 0.04%.
The diff coverage is 0.00%.

@@               Coverage Diff                @@
##           branch-22.04   #10025      +/-   ##
================================================
+ Coverage         10.37%   10.42%   +0.04%     
================================================
  Files               119      119              
  Lines             20149    20607     +458     
================================================
+ Hits               2091     2148      +57     
- Misses            18058    18459     +401

Impacted Files	Coverage Δ
python/cudf/cudf/__init__.py	`0.00% <ø> (ø)`
python/cudf/cudf/_fuzz_testing/io.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/_fuzz_testing/orc.py	`0.00% <ø> (ø)`
python/cudf/cudf/_fuzz_testing/parquet.py	`0.00% <ø> (ø)`
python/cudf/cudf/_fuzz_testing/utils.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/api/types.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/_base_index.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/column/__init__.py	`0.00% <ø> (ø)`
python/cudf/cudf/core/column/column.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/column/decimal.py	`0.00% <0.00%> (ø)`
... and 83 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fd968f3...1f67210.

revans2

Testing this for correctness is great, but this is a performance change, and I would really like to see some benchmark results too so we can tell whether it is actually getting faster. Many of my comments are things that I think will make the new code faster, but I do not know for sure. If you benchmark it and find that I am wrong or that it makes no difference, then we should keep the more readable code rather than worry about performance.

revans2 · 2022-01-12T18:26:51Z

java/src/main/java/ai/rapids/cudf/HostColumnVector.java

+     * The Java substitution of native method `ColumnView.getNativeValidPointerSize`.
+     * Ideally, this method can speed up growValidBuffer by eliminating the JNI call.
+     */
+    private static long byteSizeOfNullMask(int numRows) {


Might be nice to look at replacing the JNI call entirely with this. I don't see a lot of reason to have this hidden here when we could make it common.

java/src/main/java/ai/rapids/cudf/HostColumnVector.java

sperlingxx · 2022-01-14T10:19:27Z

Hi @revans2, I ran a local benchmark to compare performance before and after this change. I ran it with maven-surefire-plugin, with all assertions disabled via the enableAssertions setting. I ran each case 5 times and computed the average cost.

Here is the result:

| Case | Original Implementation | Current Implementation |
| --- | --- | --- |
| Long without Null | 2266.2 ms | 1292.6 ms |
| Long | 4794.0 ms | 1647.4 ms |
| String without Null | 18498.6 ms | 14863.4 ms |
| String | 16647.8 ms | 15922.6 ms |
| List[Int] | 12566.4 ms | 8268.0 ms |
| Struct[(Int, Long, String)] | 37077.0 ms | 23791.0 ms |
| List[Struct[(Int, Long, String)]] | 50902.8 ms | 32077.0 ms |

Here is the code snippet of the benchmark:

  @Test
  public void testBenchmarkHostColumnBuilder() {
    int runs = 5;
    int estimatedRows = 1024;
    int totalRows = Integer.MAX_VALUE / 10;
    long duration;
    HostColumnVector.DataType type = new HostColumnVector.BasicType(true, DType.INT64);

    System.out.println("append Long without NULL:");
    duration = 0L;
    for (int x = 0; x < runs; x++) {
      long start = System.currentTimeMillis();
      try (HostColumnVector.ColumnBuilder cb = new HostColumnVector.ColumnBuilder(
          type, estimatedRows)) {
        for (int i = 0; i < totalRows; i++) {
          cb.append((long) i);
        }
      }
      duration += System.currentTimeMillis() - start;
    }
    System.out.println("average cost of " + runs + " runs: " + (float) duration / runs);

    System.out.println("append Long:");
    duration = 0L;
    for (int x = 0; x < runs; x++) {
      long start = System.currentTimeMillis();
      try (HostColumnVector.ColumnBuilder cb = new HostColumnVector.ColumnBuilder(
          type, estimatedRows)) {
        for (int i = 0; i < totalRows; i++) {
          if (i % 10 == 0) {
            cb.appendNull();
          } else {
            cb.append((long) i);
          }
        }
      }
      duration += System.currentTimeMillis() - start;
    }
    System.out.println("average cost of " + runs + " runs: " + (float) duration / runs);

    // Note: despite the label, this loop appends a NULL every 10th row.
    System.out.println("append String without NULL:");
    duration = 0L;
    type = new HostColumnVector.BasicType(true, DType.STRING);
    for (int x = 0; x < runs; x++) {
      long start = System.currentTimeMillis();
      try (HostColumnVector.ColumnBuilder cb = new HostColumnVector.ColumnBuilder(
          type, estimatedRows)) {
        for (int i = 0; i < totalRows; i++) {
          if (i % 10 == 0) {
            cb.appendNull();
          } else {
            cb.append(String.valueOf(i));
          }
        }
      }
      duration += System.currentTimeMillis() - start;
    }
    System.out.println("average cost of " + runs + " runs: " + (float) duration / runs);

    // Note: despite the label, this loop appends no NULLs.
    System.out.println("append String:");
    duration = 0L;
    type = new HostColumnVector.BasicType(true, DType.STRING);
    for (int x = 0; x < runs; x++) {
      long start = System.currentTimeMillis();
      try (HostColumnVector.ColumnBuilder cb = new HostColumnVector.ColumnBuilder(
          type, estimatedRows)) {
        for (int i = 0; i < totalRows; i++) {
          cb.append(String.valueOf(i));
        }
      }
      duration += System.currentTimeMillis() - start;
    }
    System.out.println("average cost of " + runs + " runs: " + (float) duration / runs);

    System.out.println("append List[Int]:");
    duration = 0L;
    type = new HostColumnVector.ListType(true,
        new HostColumnVector.BasicType(true, DType.INT32));
    for (int x = 0; x < runs; x++) {
      long start = System.currentTimeMillis();
      try (HostColumnVector.ColumnBuilder cb = new HostColumnVector.ColumnBuilder(
          type, estimatedRows)) {
        for (int i = 0; i < totalRows; i++) {
          if (i % 10 == 0) {
            cb.appendNull();
            continue;
          }
          cb.appendLists(Lists.newArrayList(i - 1, i));
        }
      }
      duration += System.currentTimeMillis() - start;
    }
    System.out.println("average cost of " + runs + " runs: " + (float) duration / runs);

    System.out.println("append Struct[(Int, Long, String)]:");
    duration = 0L;
    type = new HostColumnVector.StructType(true,
        new HostColumnVector.BasicType(true, DType.INT32),
        new HostColumnVector.BasicType(true, DType.INT64),
        new HostColumnVector.BasicType(true, DType.STRING));
    for (int x = 0; x < runs; x++) {
      long start = System.currentTimeMillis();
      try (HostColumnVector.ColumnBuilder cb = new HostColumnVector.ColumnBuilder(
          type, estimatedRows)) {
        for (int i = 0; i < totalRows; i++) {
          if (i % 10 == 0) {
            cb.appendNull();
            continue;
          }
          cb.appendStructValues(
              new HostColumnVector.StructData(i, (long) i, String.valueOf(i)));
        }
      }
      duration += System.currentTimeMillis() - start;
    }
    System.out.println("average cost of " + runs + " runs: " + (float) duration / runs);

    System.out.println("append List[Struct[(Int, Long, String)]]:");
    duration = 0L;
    type = new HostColumnVector.ListType(true,
        new HostColumnVector.StructType(true,
            new HostColumnVector.BasicType(true, DType.INT32),
            new HostColumnVector.BasicType(true, DType.INT64),
            new HostColumnVector.BasicType(true, DType.STRING)));
    for (int x = 0; x < runs; x++) {
      long start = System.currentTimeMillis();
      try (HostColumnVector.ColumnBuilder cb = new HostColumnVector.ColumnBuilder(
          type, estimatedRows)) {
        for (int i = 0; i < totalRows; i++) {
          if (i % 20 == 0) {
            cb.appendNull();
          } else if (i % 10 == 0) {
            cb.appendLists(Lists.newArrayList((HostColumnVector.StructData) null));
          } else {
            cb.appendLists(Lists.newArrayList(
                new HostColumnVector.StructData(i, (long) i, String.valueOf(i))));
          }
        }
      }
      duration += System.currentTimeMillis() - start;
    }
    System.out.println("average cost of " + runs + " runs: " + (float) duration / runs);
  }

revans2 · 2022-01-14T15:03:55Z

For fixed-width columns we can typically estimate the total number of rows accurately, so starting at 1024 rows and growing repeatedly is not the expected case.

For Strings we guess 100 bytes per String, so with your example we are likely to overestimate there too. For lists we assume one item per list, and in your example that will likely match the number of rows.

So to be more complete, I would like to see one case where the estimatedRows we provide is correct, and then use this one for the other case where we are likely wrong. That way we can see the overhead of growing a buffer as separately from the rest of the computation as possible.
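
A rough sketch of the two-case comparison requested here: run the same append loop once with an exact row estimate (no growth) and once with a deliberately low estimate (repeated growth), so the difference approximates the growth overhead. It uses a hypothetical stand-in buffer, `LongBuffer`, rather than HostColumnVector.ColumnBuilder, so the timings it prints say nothing about cuDF itself; only the structure of the comparison is the point.

```java
import java.util.Arrays;

public class GrowthOverheadBench {
  /** Minimal growable long buffer standing in for a column builder (hypothetical). */
  static final class LongBuffer {
    long[] data;
    int rows;
    int growCount;  // how many times the buffer was reallocated

    LongBuffer(int estimatedRows) {
      data = new long[Math.max(1, estimatedRows)];
    }

    void append(long v) {
      if (rows == data.length) {
        int newLen = (int) Math.min(Integer.MAX_VALUE - 8L, data.length * 2L);
        if (newLen <= rows) {
          throw new IllegalStateException("buffer is full");
        }
        data = Arrays.copyOf(data, newLen);
        growCount++;
      }
      data[rows++] = v;
    }
  }

  /** Returns {average milliseconds per run, grow count of the last run}. */
  static double[] run(int estimatedRows, int totalRows, int runs) {
    long nanos = 0;
    int grows = 0;
    for (int r = 0; r < runs; r++) {
      long start = System.nanoTime();
      LongBuffer b = new LongBuffer(estimatedRows);
      for (int i = 0; i < totalRows; i++) {
        b.append(i);
      }
      nanos += System.nanoTime() - start;
      grows = b.growCount;
    }
    return new double[] {nanos / 1e6 / runs, grows};
  }

  public static void main(String[] args) {
    int totalRows = 1 << 22;
    double[] exact = run(totalRows, totalRows, 5);  // estimate is correct: no growth
    double[] low = run(1024, totalRows, 5);         // estimate is too low: repeated growth
    System.out.printf("exact estimate: %.1f ms, %d grows%n", exact[0], (int) exact[1]);
    System.out.printf("low estimate:   %.1f ms, %d grows%n", low[0], (int) low[1]);
  }
}
```

For a real measurement, a harness such as JMH with warm-up iterations would give more trustworthy numbers than wall-clock timing inside a unit test.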

kuhushukla · 2022-01-14T15:08:57Z

This is looking pretty good to me.

revans2 · 2022-01-14T16:07:56Z

| Case | Original Implementation | Current Implementation | Orig Speed (estimated) | Current Speed (estimated) |
| --- | --- | --- | --- | --- |
| Long without Null | 2266.2 ms | 1292.6 ms | 0.71 GiB/s | 1.24 GiB/s |
| Long | 4794.0 ms | 1647.4 ms | 0.33 GiB/s | 0.97 GiB/s |
| String without Null | 18498.6 ms | 14863.4 ms | 0.09 GiB/s | 0.11 GiB/s |
| String | 16647.8 ms | 15922.6 ms | 0.10 GiB/s | 0.11 GiB/s |
| List[Int] | 12566.4 ms | 8268.0 ms | 0.13 GiB/s | 0.19 GiB/s |
| Struct[(Int, Long, String)] | 37077.0 ms | 23791.0 ms | 0.11 GiB/s | 0.17 GiB/s |
| List[Struct[(Int, Long, String)]] | 50902.8 ms | 32077.0 ms | 0.08 GiB/s | 0.13 GiB/s |

The numbers are definitely better, but I am not thrilled by them. That said, you must have a much better CPU than me, because when I run these they are not nearly as good.
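
The comment does not say how the estimated speeds were derived. As a worked check, the Long rows are consistent with counting 8 data bytes per row over the benchmark's `Integer.MAX_VALUE / 10` rows and dividing by the measured time. The sketch below makes that assumption explicit; the helper `gibPerSecond` is hypothetical, and the byte counts behind the String, List, and Struct rows are not reconstructed here.

```java
import java.util.Locale;

public class ThroughputEstimate {
  /** Converts rows * bytesPerRow processed in the given milliseconds to GiB/s. */
  static double gibPerSecond(long rows, int bytesPerRow, double millis) {
    double gib = (double) rows * bytesPerRow / (1L << 30);
    return gib / (millis / 1000.0);
  }

  public static void main(String[] args) {
    long rows = Integer.MAX_VALUE / 10;  // totalRows in the benchmark: 214,748,364
    // Assumption: only the 8-byte data buffer is counted, not the validity buffer.
    System.out.println(String.format(Locale.ROOT, "%.2f GiB/s",
        gibPerSecond(rows, Long.BYTES, 2266.2)));  // prints "0.71 GiB/s"
    System.out.println(String.format(Locale.ROOT, "%.2f GiB/s",
        gibPerSecond(rows, Long.BYTES, 1292.6)));  // prints "1.24 GiB/s"
  }
}
```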

revans2

Overall I am happy with the change. I am a little disappointed that we couldn't make it go even faster, but this is a really good step forward. Just some nits and a bit of cleanup for the API.

revans2 · 2022-01-14T16:08:37Z

java/src/main/java/ai/rapids/cudf/ColumnView.java

+   * Get the number of bytes needed to allocate a validity buffer for the given number of rows.
+   * According to cudf::bitmask_allocation_size_bytes, the padding boundary for null mask is 64 bytes.
+   */
+  public static long getValidityBufferSize(int numRows) {


nit: not public. The old API was package private
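
For reference, a minimal sketch of what a pure-Java validity-buffer size computation could look like, based only on the doc comment in the diff above (one bit per row, padded to a 64-byte boundary). The method body is an assumption, not the code merged in this PR, and the wrapper class `ValidityBufferSize` is hypothetical. The method is left package-private, in line with the nit.

```java
public class ValidityBufferSize {
  // Padding boundary in bytes, as stated in the doc comment quoted in the diff.
  private static final long PADDING_BYTES = 64;

  /** Bytes needed for a validity buffer covering numRows rows (sketch). */
  static long getValidityBufferSize(int numRows) {
    // One bit per row, rounded up to whole bytes; widen to long to avoid overflow.
    long actualBytes = ((long) numRows + 7) / 8;
    // Round up to the next multiple of the padding boundary.
    return ((actualBytes + PADDING_BYTES - 1) / PADDING_BYTES) * PADDING_BYTES;
  }

  public static void main(String[] args) {
    System.out.println(getValidityBufferSize(1));    // prints 64
    System.out.println(getValidityBufferSize(513));  // prints 128
  }
}
```

Because this is plain integer arithmetic, it avoids crossing the JNI boundary each time the validity buffer grows.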

sperlingxx · 2022-01-21T06:02:50Z

rerun tests

sperlingxx · 2022-01-25T02:04:39Z

I think this PR is ready. @revans2

revans2 · 2022-01-31T15:06:52Z

@gpucibot merge

sperlingxx added 2 commits January 12, 2022 13:14

draft

77119a9

rewrite the growBuffersAndRows of HostColumnVector.ColumnBuilder

5e57c4b

Signed-off-by: sperlingxx <[email protected]>

sperlingxx requested review from revans2 and kuhushukla January 12, 2022 10:15

sperlingxx requested a review from a team as a code owner January 12, 2022 10:15

github-actions bot added the Java Affects Java cuDF API. label Jan 12, 2022

sperlingxx added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 12, 2022

revans2 reviewed Jan 12, 2022

sperlingxx added 3 commits January 13, 2022 17:00

update

1777542

small fix

73a3a62

update

767b237

sperlingxx requested a review from revans2 January 14, 2022 10:26

revans2 reviewed Jan 14, 2022

sperlingxx added 3 commits January 17, 2022 16:25

update

c6e9d86

small fix

74f798c

add more tests for ColumnBuilder

5908685

Signed-off-by: sperlingxx <[email protected]>

sperlingxx mentioned this pull request Jan 19, 2022

[FEA] Appending host columnar data into ColumnBuilder by batch NVIDIA/spark-rapids#4565

Closed

sperlingxx changed the base branch from branch-22.02 to branch-22.04 January 21, 2022 02:05

Merge remote-tracking branch 'origin/branch-22.04' into opt_col_builder

41d8e98

sperlingxx added 2 commits January 24, 2022 18:54

Merge remote-tracking branch 'origin/branch-22.04' into opt_col_builder

aa0d151

merge master

1f67210

sperlingxx requested a review from revans2 January 25, 2022 02:04

revans2 approved these changes Jan 31, 2022

rapids-bot bot merged commit b217d7e into rapidsai:branch-22.04 Jan 31, 2022

sperlingxx deleted the opt_col_builder branch February 14, 2022 03:25
