
Rewriting row/column conversions for Spark <-> cudf data conversions #8444

Merged

Conversation

hyperbolic2346
Contributor

Row-to-column and column-to-row conversions are being rewritten to support large numbers of columns and variable-width data.

So far this covers the column-to-row work; the variable-width support is not complete yet.

This code is currently copied over to the cudf side for benchmarking, but it will not remain there.

@hyperbolic2346 hyperbolic2346 added 2 - In Progress Currently a work in progress 5 - DO NOT MERGE Hold off on merging; see PR for details labels Jun 7, 2021
@hyperbolic2346 hyperbolic2346 requested review from a team as code owners June 7, 2021 08:19
@hyperbolic2346 hyperbolic2346 requested a review from mythrocks June 7, 2021 08:19
@github-actions github-actions bot added CMake CMake build issue libcudf Affects libcudf (C++/CUDA) code. labels Jun 7, 2021
@hyperbolic2346
Contributor Author

Note that I copied a file from the java side to add my changes so I could benchmark it. This will not live here at the end.

This makes the entire file look new, but in reality the only new things are related to the kernel `copy_from_columns` and the function `convert_to_rows2`.

@harrism
Member

harrism commented Jun 15, 2021

@hyperbolic2346 can you give this a more specific title?

@harrism
Member

harrism commented Jun 15, 2021

Also please use draft PRs rather than "[WIP]" to reduce reviewer notification noise.

@hyperbolic2346 hyperbolic2346 marked this pull request as draft June 15, 2021 18:26
@hyperbolic2346 hyperbolic2346 changed the title [WIP] Row/column conversion changes Rewriting row/column conversions for Spark <-> cudf data conversions Jun 15, 2021
@hyperbolic2346
Contributor Author

Updated to fix the corner cases found. Current benchmarks are not great. According to Nsight, the kernel performance is actually on par with or better than the existing code until the sizes get large, at which point performance falls off a cliff. Unsure yet what is going on there, but it could be a worst-case memory access pattern. Investigation ongoing.

There is a pile of work done before the kernel launch to produce some fairly large data arrays with things like row sizes. The original row conversion didn't require this work since it didn't support variable-width data. Each row can be a different size now, so that information must be created and passed to the kernel. For very small tables this overpowers the real work. A potential optimization is to check for variable-width data and only calculate and send the row sizes if such data actually exists; see the sketch below.
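For illustration, a minimal sketch of that check, assuming libcudf's `cudf::is_fixed_width()` and `cudf::size_of()` traits; the function names here are made up:

```
#include <cudf/table/table_view.hpp>
#include <cudf/utilities/traits.hpp>
#include <cudf/utilities/type_dispatcher.hpp>

#include <algorithm>
#include <cstddef>

// If every column is fixed-width, each row has the same byte size, so the
// per-row size arrays (and the pre-launch work to build them) can be skipped.
bool only_fixed_width(cudf::table_view const& tbl)
{
  return std::all_of(tbl.begin(), tbl.end(), [](cudf::column_view const& col) {
    return cudf::is_fixed_width(col.type());
  });
}

// Constant row size for the fixed-width case: the summed column widths
// (validity and alignment handling omitted for brevity).
std::size_t fixed_row_size(cudf::table_view const& tbl)
{
  std::size_t size = 0;
  for (auto const& col : tbl) { size += cudf::size_of(col.type()); }
  return size;
}
```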

Benchmark                                                        Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------
RowConversion/old_to_row_conversion/64/manual_time           0.037 ms        0.054 ms        19211 bytes_per_second=2.36861G/s
RowConversion/old_to_row_conversion/512/manual_time          0.038 ms        0.054 ms        13712 bytes_per_second=18.3985G/s
RowConversion/old_to_row_conversion/4096/manual_time         0.053 ms        0.067 ms        13043 bytes_per_second=105.713G/s
RowConversion/old_to_row_conversion/32768/manual_time        0.194 ms        0.211 ms         3552 bytes_per_second=230.037G/s
RowConversion/old_to_row_conversion/262144/manual_time        1.33 ms         1.35 ms          514 bytes_per_second=268.457G/s
RowConversion/old_to_row_conversion/1048576/manual_time       5.33 ms         5.35 ms          124 bytes_per_second=268.378G/s
RowConversion/new_to_row_conversion/64/manual_time            1.89 ms         1.91 ms          361 bytes_per_second=47.1822M/s
RowConversion/new_to_row_conversion/512/manual_time          0.558 ms        0.575 ms         1145 bytes_per_second=1.25006G/s
RowConversion/new_to_row_conversion/4096/manual_time         0.305 ms        0.320 ms         2250 bytes_per_second=18.3105G/s
RowConversion/new_to_row_conversion/32768/manual_time         1.52 ms         1.53 ms          455 bytes_per_second=29.4834G/s
RowConversion/new_to_row_conversion/262144/manual_time        38.8 ms         38.8 ms           18 bytes_per_second=9.21313G/s
RowConversion/new_to_row_conversion/1048576/manual_time        156 ms          156 ms            4 bytes_per_second=9.14925G/s

More work is needed for validity. An interesting idea came up from Bobby, who pointed out that the validity data is just another table to copy. The window sizes may need to be limited so the validity bits line up with byte boundaries, but this should be pursued for sure.

Speaking of window sizes, the window is currently arbitrarily sized at 1024 rows by as many columns as will fit into shared memory. Thinking about this more, I believe it would be best to have a "square" window. I put that in quotes because it isn't the same number of rows and columns, but rather the same number of bytes in each direction. This is another potential optimization on the horizon; a sketch of the idea follows.
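One way to read the "square in bytes" idea (a hypothetical sketch, not code from this PR): aim for roughly sqrt(budget) bytes of columns across each row, which leaves roughly sqrt(budget) rows inside the shared-memory budget:

```
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct window_shape {
  int rows;
  int cols;
};

// Pack columns until a row of the window is ~sqrt(budget) bytes wide, then
// take as many rows as still fit; both directions then cover a similar
// number of bytes. All names here are illustrative.
window_shape square_window(std::vector<std::size_t> const& col_widths,
                           std::size_t shmem_budget)  // e.g. 48 * 1024 bytes
{
  auto const edge =
    static_cast<std::size_t>(std::sqrt(static_cast<double>(shmem_budget)));

  int cols              = 0;
  std::size_t row_bytes = 0;
  while (cols < static_cast<int>(col_widths.size()) &&
         row_bytes + col_widths[cols] <= edge) {
    row_bytes += col_widths[cols++];
  }
  cols = std::max(cols, 1);

  auto const rows = shmem_budget / std::max<std::size_t>(row_bytes, 1);
  return {static_cast<int>(rows), cols};
}
```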

@hyperbolic2346
Contributor Author

Squared up the incoming windows for the GPU kernel operations. This made large improvements in throughput, since there is more data to write out per row. Found an issue with shared-memory writes striding in such a way that they produce bank conflicts. Need to think about how to get around that while still maintaining 8-byte writes out of shared memory.
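For context, the classic form of this problem and its usual fix (a generic padding sketch, not this PR's kernel): pad the shared-memory tile by one element per row so that a warp reading or writing a column of 8-byte values no longer lands on the same banks.

```
constexpr int TILE = 32;  // 32x32 tile of 8-byte elements; launch with dim3(32, 32) blocks

// Transpose through shared memory, assuming n is a multiple of TILE.
__global__ void transpose_tile(double const* in, double* out, int n)
{
  // The +1 column of padding skews each row by 8 bytes (two 4-byte banks).
  // Without it, the column-wise access below hits the same banks in every lane.
  __shared__ double tile[TILE][TILE + 1];

  int x = blockIdx.x * TILE + threadIdx.x;
  int y = blockIdx.y * TILE + threadIdx.y;
  tile[threadIdx.y][threadIdx.x] = in[y * n + x];  // coalesced global read
  __syncthreads();

  x = blockIdx.y * TILE + threadIdx.x;
  y = blockIdx.x * TILE + threadIdx.y;
  out[y * n + x] = tile[threadIdx.x][threadIdx.y];  // column read from shared
}
```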

Benchmark                                                        Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------
RowConversion/old_to_row_conversion/64/manual_time           0.037 ms        0.054 ms        18742 bytes_per_second=2.35329G/s
RowConversion/old_to_row_conversion/512/manual_time          0.039 ms        0.055 ms        16916 bytes_per_second=17.9473G/s
RowConversion/old_to_row_conversion/4096/manual_time         0.053 ms        0.067 ms        10089 bytes_per_second=105.421G/s
RowConversion/old_to_row_conversion/32768/manual_time        0.195 ms        0.213 ms         3462 bytes_per_second=228.885G/s
RowConversion/old_to_row_conversion/262144/manual_time        1.33 ms         1.34 ms          521 bytes_per_second=269.189G/s
RowConversion/old_to_row_conversion/1048576/manual_time       5.33 ms         5.35 ms          125 bytes_per_second=268.217G/s
RowConversion/new_to_row_conversion/64/manual_time            1.87 ms         1.89 ms          371 bytes_per_second=47.663M/s
RowConversion/new_to_row_conversion/512/manual_time          0.298 ms        0.315 ms         2284 bytes_per_second=2.33964G/s
RowConversion/new_to_row_conversion/4096/manual_time         0.177 ms        0.192 ms         3865 bytes_per_second=31.4794G/s
RowConversion/new_to_row_conversion/32768/manual_time        0.973 ms        0.991 ms          654 bytes_per_second=45.9143G/s
RowConversion/new_to_row_conversion/262144/manual_time        7.08 ms         7.10 ms           87 bytes_per_second=50.4549G/s
RowConversion/new_to_row_conversion/1048576/manual_time       28.4 ms         28.4 ms           25 bytes_per_second=50.3041G/s

Comment on lines 82 to 86
// Because shared memory is limited we copy a subset of the rows at a time.
// For simplicity we will refer to this as a row_group
Contributor

This would be a perfect algorithm for `memcpy_async`: set up a 2-stage pipeline that copies data into shared memory asynchronously while the other stage is being written out. See https://developer.nvidia.com/blog/controlling-data-movement-to-boost-performance-on-ampere-architecture/
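For illustration, a minimal double-buffered form of that suggestion using the cooperative-groups `memcpy_async` API from the linked post; the tile size, names, and single-block byte-wise copy are placeholders, not this PR's kernel:

```
#include <cooperative_groups.h>
#include <cooperative_groups/memcpy_async.h>

namespace cg = cooperative_groups;

constexpr size_t TILE_BYTES = 8192;  // placeholder stage size

// Single-block sketch: launch as two_stage_copy<<<1, 256>>>(in, out, total).
__global__ void two_stage_copy(char const* in, char* out, size_t total)
{
  __shared__ char stage[2][TILE_BYTES];
  auto block = cg::this_thread_block();
  size_t const num_tiles = (total + TILE_BYTES - 1) / TILE_BYTES;

  // Prime the pipeline: start fetching tile 0 before entering the loop.
  cg::memcpy_async(block, stage[0], in, total < TILE_BYTES ? total : TILE_BYTES);

  for (size_t t = 0; t < num_tiles; ++t) {
    if (t + 1 < num_tiles) {
      // Kick off the copy of tile t+1 into the other buffer...
      size_t const off  = (t + 1) * TILE_BYTES;
      size_t const next = total - off < TILE_BYTES ? total - off : TILE_BYTES;
      cg::memcpy_async(block, stage[(t + 1) & 1], in + off, next);
      // ...then wait for everything except that newest copy, i.e. for tile t.
      cg::wait_prior<1>(block);
    } else {
      cg::wait(block);  // last tile: drain all outstanding copies
    }

    // Consume tile t from shared memory while tile t+1 is still in flight.
    size_t const off = t * TILE_BYTES;
    size_t const len = total - off < TILE_BYTES ? total - off : TILE_BYTES;
    for (size_t i = block.thread_rank(); i < len; i += block.size()) {
      out[off + i] = stage[t & 1][i];
    }
    block.sync();  // everyone done reading before this buffer is reused
  }
}
```

The `wait_prior<1>` is what lets consuming tile `t` overlap with the in-flight fetch of tile `t+1`.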

// In practice we have found writing more than 4 columns of data per thread
// results in performance loss. As such we are using a 2 dimensional
// kernel in terms of threads, but not in terms of blocks. Columns are
// controlled by the y dimension (there is no y dimension in blocks). Rows
// are controlled by the x dimension (there are multiple blocks in the x
// dimension).
Contributor

You can just say you're using a 1D grid of 2D blocks.
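Concretely, something like this launch shape (sizes illustrative):

```
__global__ void copy_from_columns(/* column and row buffers */) { /* ... */ }

void launch(int num_rows)
{
  // 2D block: x walks rows within a tile, y walks the few columns per thread.
  dim3 const block(64, 4);
  // 1D grid: only the x dimension tiles the row count; there is no y dimension.
  dim3 const grid((num_rows + block.x - 1) / block.x);
  copy_from_columns<<<grid, block>>>();
}
```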

@hyperbolic2346
Contributor Author

--------------------------------------------------------------------------------------------------------------------
Benchmark                                                          Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------
RowConversion/old_to_row_conversion/64/manual_time             0.038 ms        0.055 ms        18801 bytes_per_second=2.3232G/s
RowConversion/old_to_row_conversion/512/manual_time            0.038 ms        0.054 ms        18703 bytes_per_second=18.4552G/s
RowConversion/old_to_row_conversion/4096/manual_time           0.052 ms        0.065 ms        10013 bytes_per_second=107.946G/s
RowConversion/old_to_row_conversion/32768/manual_time          0.195 ms        0.212 ms         3514 bytes_per_second=228.956G/s
RowConversion/old_to_row_conversion/262144/manual_time          1.30 ms         1.32 ms          522 bytes_per_second=274.572G/s
RowConversion/old_to_row_conversion/1048576/manual_time         5.31 ms         5.32 ms          125 bytes_per_second=269.388G/s
RowConversion/new_to_row_conversion/64/manual_time             0.128 ms        0.145 ms         5403 bytes_per_second=698.176M/s
RowConversion/new_to_row_conversion/512/manual_time            0.066 ms        0.083 ms        10124 bytes_per_second=10.5311G/s
RowConversion/new_to_row_conversion/4096/manual_time           0.128 ms        0.143 ms         5444 bytes_per_second=43.5364G/s
RowConversion/new_to_row_conversion/32768/manual_time          0.883 ms        0.900 ms          692 bytes_per_second=50.6077G/s
RowConversion/new_to_row_conversion/262144/manual_time          6.74 ms         6.76 ms          100 bytes_per_second=53.0137G/s
RowConversion/new_to_row_conversion/1048576/manual_time         26.8 ms         26.9 ms           25 bytes_per_second=53.2619G/s
RowConversion/old_from_row_conversion/64/manual_time           0.179 ms        0.196 ms         3647 bytes_per_second=601.202M/s
RowConversion/old_from_row_conversion/512/manual_time          0.181 ms        0.196 ms         3936 bytes_per_second=4.65231G/s
RowConversion/old_from_row_conversion/4096/manual_time         0.183 ms        0.196 ms         3880 bytes_per_second=36.8696G/s
RowConversion/old_from_row_conversion/32768/manual_time        0.201 ms        0.217 ms         3472 bytes_per_second=267.588G/s
RowConversion/new_from_row_conversion/64/manual_time           0.179 ms        0.195 ms         3876 bytes_per_second=603.682M/s
RowConversion/new_from_row_conversion/512/manual_time          0.183 ms        0.198 ms         3270 bytes_per_second=4.60054G/s
RowConversion/new_from_row_conversion/4096/manual_time          4.74 ms         4.75 ms          148 bytes_per_second=1.42085G/s
RowConversion/new_from_row_conversion/32768/manual_time          304 ms          304 ms            2 bytes_per_second=181.646M/s

@hyperbolic2346
Contributor Author

Performance issues with `to_column` are almost entirely in the validity calculations. The current method grabs a bit from each row and builds up a 32-bit word of validity data for the column, but there is a lot of duplicated fetching of row offsets and validity data, since we only use a single bit out of each fetch. Cache alone isn't enough to save us here. Will change this to read a byte of validity and work on 8 columns per thread to coalesce the reads better; a sketch follows.
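Roughly the access pattern being described, as a sketch with assumed names and layouts (`row_validity`, `row_pitch`, `col_validity`, and `byte_idx` are hypothetical, and `blockDim.x` is assumed to be a multiple of 32): each thread reads one byte of row-major validity, covering 8 columns of its row, and a warp then ballots each bit across 32 rows to build one 32-bit word per column.

```
#include <cstdint>

__global__ void validity_bytes_to_columns(uint8_t const* row_validity,   // packed row validity
                                          size_t row_pitch,              // bytes per row
                                          uint32_t* const* col_validity, // per-column bitmasks
                                          int num_rows,
                                          int byte_idx)                  // which validity byte
{
  int const row     = blockIdx.x * blockDim.x + threadIdx.x;
  int const lane    = threadIdx.x % 32;
  bool const active = row < num_rows;

  // One coalesced byte read per thread covers 8 columns' bits for this row.
  uint8_t const vbyte = active ? row_validity[row * row_pitch + byte_idx] : 0;

  for (int c = 0; c < 8; ++c) {
    // Ballot gathers the same column's bit from 32 consecutive rows,
    // producing a finished word of that column's bitmask.
    uint32_t const word = __ballot_sync(0xffffffff, (vbyte >> c) & 1);
    if (active && lane == 0) { col_validity[byte_idx * 8 + c][row / 32] = word; }
  }
}
```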

@hyperbolic2346
Contributor Author

Benchmark                                                          Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------
RowConversion/old_to_row_conversion/64/manual_time             0.037 ms        0.054 ms        19092 bytes_per_second=2.37984G/s
RowConversion/old_to_row_conversion/512/manual_time            0.040 ms        0.056 ms        13312 bytes_per_second=17.3898G/s
RowConversion/old_to_row_conversion/4096/manual_time           0.052 ms        0.066 ms        10309 bytes_per_second=106.437G/s
RowConversion/old_to_row_conversion/32768/manual_time          0.195 ms        0.213 ms         3492 bytes_per_second=228.534G/s
RowConversion/old_to_row_conversion/262144/manual_time          1.30 ms         1.32 ms          527 bytes_per_second=273.955G/s
RowConversion/old_to_row_conversion/1048576/manual_time         5.31 ms         5.32 ms          127 bytes_per_second=269.39G/s
RowConversion/new_to_row_conversion/64/manual_time             0.130 ms        0.147 ms         5316 bytes_per_second=689.682M/s
RowConversion/new_to_row_conversion/512/manual_time            0.068 ms        0.084 ms         9935 bytes_per_second=10.2622G/s
RowConversion/new_to_row_conversion/4096/manual_time           0.136 ms        0.150 ms         4503 bytes_per_second=41.1081G/s
RowConversion/new_to_row_conversion/32768/manual_time          0.948 ms        0.966 ms          672 bytes_per_second=47.1212G/s
RowConversion/new_to_row_conversion/262144/manual_time          7.35 ms         7.37 ms           82 bytes_per_second=48.6385G/s
RowConversion/new_to_row_conversion/1048576/manual_time         28.7 ms         28.8 ms           25 bytes_per_second=49.7624G/s
RowConversion/old_from_row_conversion/64/manual_time           0.190 ms        0.207 ms         3828 bytes_per_second=568.013M/s
RowConversion/old_from_row_conversion/512/manual_time          0.184 ms        0.199 ms         3787 bytes_per_second=4.58102G/s
RowConversion/old_from_row_conversion/4096/manual_time         0.188 ms        0.201 ms         3421 bytes_per_second=35.8498G/s
RowConversion/old_from_row_conversion/32768/manual_time        0.216 ms        0.232 ms         3054 bytes_per_second=249.749G/s
RowConversion/old_from_row_conversion/262144/manual_time        1.33 ms         1.35 ms          522 bytes_per_second=323.235G/s
RowConversion/old_from_row_conversion/1048576/manual_time       5.08 ms         5.14 ms          137 bytes_per_second=339.467G/s
RowConversion/new_from_row_conversion/64/manual_time           0.187 ms        0.204 ms         3130 bytes_per_second=576.807M/s
RowConversion/new_from_row_conversion/512/manual_time          0.198 ms        0.213 ms         3649 bytes_per_second=4.25398G/s
RowConversion/new_from_row_conversion/4096/manual_time         0.201 ms        0.214 ms         3561 bytes_per_second=33.4906G/s
RowConversion/new_from_row_conversion/32768/manual_time        0.332 ms        0.346 ms         2126 bytes_per_second=162.438G/s
RowConversion/new_from_row_conversion/262144/manual_time        2.38 ms         2.40 ms          295 bytes_per_second=181.301G/s
RowConversion/new_from_row_conversion/1048576/manual_time       4.18 ms         4.23 ms          166 bytes_per_second=412.599G/s

Performance has improved on the row-to-column front. The new code is faster once the table size gets over 1 million rows in the benchmark data set. Working on `cuda::memcpy_async` now, hoping to see more gains.

@harrism
Member

harrism commented Jul 21, 2021

Moving to 21.10

@harrism harrism changed the base branch from branch-21.08 to branch-21.10 July 21, 2021 23:30
@github-actions github-actions bot added conda Java Affects Java cuDF API. labels Aug 26, 2021
@hyperbolic2346
Contributor Author

Sorry for the large diff here, but lots of things shuffled around and were renamed when I removed the block nomenclature, which was a great idea.

@hyperbolic2346
Contributor Author

Benchmark                                                          Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------
RowConversion/old_to_row_conversion/64/manual_time             0.041 ms        0.058 ms        17483 bytes_per_second=2.14732G/s
RowConversion/old_to_row_conversion/512/manual_time            0.040 ms        0.055 ms        17466 bytes_per_second=17.6611G/s
RowConversion/old_to_row_conversion/4096/manual_time           0.053 ms        0.067 ms         9545 bytes_per_second=104.782G/s
RowConversion/old_to_row_conversion/32768/manual_time          0.195 ms        0.212 ms         3543 bytes_per_second=229.09G/s
RowConversion/old_to_row_conversion/262144/manual_time          1.33 ms         1.34 ms          521 bytes_per_second=269.09G/s
RowConversion/old_to_row_conversion/1048576/manual_time         5.28 ms         5.30 ms          127 bytes_per_second=270.805G/s
RowConversion/new_to_row_conversion/64/manual_time             0.263 ms        0.280 ms         2714 bytes_per_second=340.026M/s
RowConversion/new_to_row_conversion/512/manual_time            0.583 ms        0.598 ms         1093 bytes_per_second=1.1968G/s
RowConversion/new_to_row_conversion/4096/manual_time           0.932 ms        0.946 ms          711 bytes_per_second=5.99446G/s
RowConversion/new_to_row_conversion/32768/manual_time           1.34 ms         1.36 ms          517 bytes_per_second=33.3962G/s
RowConversion/new_to_row_conversion/262144/manual_time          4.18 ms         4.20 ms          166 bytes_per_second=85.4348G/s
RowConversion/new_to_row_conversion/1048576/manual_time         15.2 ms         15.2 ms           45 bytes_per_second=94.2443G/s
RowConversion/old_from_row_conversion/64/manual_time           0.195 ms        0.212 ms         3824 bytes_per_second=551.947M/s
RowConversion/old_from_row_conversion/512/manual_time          0.182 ms        0.198 ms         3811 bytes_per_second=4.62471G/s
RowConversion/old_from_row_conversion/4096/manual_time         0.185 ms        0.198 ms         3569 bytes_per_second=36.4532G/s
RowConversion/old_from_row_conversion/32768/manual_time        0.203 ms        0.218 ms         3432 bytes_per_second=266.06G/s
RowConversion/old_from_row_conversion/262144/manual_time        1.24 ms         1.25 ms          550 bytes_per_second=348.962G/s
RowConversion/old_from_row_conversion/1048576/manual_time       4.79 ms         4.84 ms          146 bytes_per_second=360.062G/s
RowConversion/new_from_row_conversion/64/manual_time           0.382 ms        0.398 ms         1818 bytes_per_second=282.453M/s
RowConversion/new_from_row_conversion/512/manual_time          0.662 ms        0.677 ms         1030 bytes_per_second=1.27215G/s
RowConversion/new_from_row_conversion/4096/manual_time          1.10 ms         1.12 ms          627 bytes_per_second=6.11477G/s
RowConversion/new_from_row_conversion/32768/manual_time         2.20 ms         2.21 ms          318 bytes_per_second=24.5208G/s
RowConversion/new_from_row_conversion/262144/manual_time        9.33 ms         9.36 ms           75 bytes_per_second=46.2188G/s
RowConversion/new_from_row_conversion/1048576/manual_time       37.3 ms         37.7 ms           19 bytes_per_second=46.1899G/s

column_sizes.reserve(num_columns);
column_starts.reserve(num_columns + 1); // we add a final offset for validity data start

auto schema_column_iter =
Contributor

This could alternatively have been a `zip_iterator`, avoiding a counting iterator.

Contributor Author

I like this idea, but there is the need to reach into the table column with the call to `tbl.column(i).type()`. Is there a way to make this work that I don't understand yet? The only way I can think of is to make a counting transform iterator for the type, which somewhat defeats the goal of fewer iterators.
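For reference, the counting transform iterator being described would look roughly like this (a sketch; `tbl` is the table in question):

```
#include <cudf/table/table_view.hpp>

#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>

// A counting iterator transformed to pull each column's type out of the
// table, so no separate container of types has to be materialized.
auto make_type_iterator(cudf::table_view const& tbl)
{
  return thrust::make_transform_iterator(
    thrust::make_counting_iterator(cudf::size_type{0}),
    [tbl](cudf::size_type i) { return tbl.column(i).type(); });
}
```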

Contributor

@mythrocks mythrocks left a comment

I'm 👍. Thank you for taking the time to go over this with me, @hyperbolic2346.

There is a minor sticking point regarding the constant_iterator comment. We might explore this shortly, but we needn't hold up the PR over it.

Contributor

@nvdbaranec nvdbaranec left a comment

Just a couple of tiny things.

@hyperbolic2346
Contributor Author

Final performance results of the initial PR:

Benchmark                                                          Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------
RowConversion/old_to_row_conversion/64/manual_time             0.038 ms        0.055 ms        18268 bytes_per_second=2.29941G/s
RowConversion/old_to_row_conversion/512/manual_time            0.039 ms        0.055 ms        16679 bytes_per_second=17.7435G/s
RowConversion/old_to_row_conversion/4096/manual_time           0.054 ms        0.067 ms        11740 bytes_per_second=104.368G/s
RowConversion/old_to_row_conversion/32768/manual_time          0.195 ms        0.212 ms         3531 bytes_per_second=229.531G/s
RowConversion/old_to_row_conversion/262144/manual_time          1.32 ms         1.34 ms          517 bytes_per_second=270.185G/s
RowConversion/old_to_row_conversion/1048576/manual_time         5.28 ms         5.30 ms          125 bytes_per_second=270.541G/s
RowConversion/new_to_row_conversion/64/manual_time             0.254 ms        0.270 ms         2599 bytes_per_second=352.454M/s
RowConversion/new_to_row_conversion/512/manual_time            0.583 ms        0.598 ms         1094 bytes_per_second=1.19763G/s
RowConversion/new_to_row_conversion/4096/manual_time           0.932 ms        0.946 ms          719 bytes_per_second=5.99434G/s
RowConversion/new_to_row_conversion/32768/manual_time           1.32 ms         1.34 ms          516 bytes_per_second=33.8862G/s
RowConversion/new_to_row_conversion/262144/manual_time          4.20 ms         4.22 ms          168 bytes_per_second=85.0243G/s
RowConversion/new_to_row_conversion/1048576/manual_time         15.2 ms         15.2 ms           46 bytes_per_second=94.0081G/s
RowConversion/old_from_row_conversion/64/manual_time           0.174 ms        0.191 ms         3632 bytes_per_second=617.714M/s
RowConversion/old_from_row_conversion/512/manual_time          0.176 ms        0.192 ms         3920 bytes_per_second=4.78463G/s
RowConversion/old_from_row_conversion/4096/manual_time         0.182 ms        0.196 ms         3870 bytes_per_second=36.9156G/s
RowConversion/old_from_row_conversion/32768/manual_time        0.202 ms        0.218 ms         3440 bytes_per_second=266.348G/s
RowConversion/old_from_row_conversion/262144/manual_time        1.24 ms         1.26 ms          551 bytes_per_second=348.74G/s
RowConversion/old_from_row_conversion/1048576/manual_time       4.78 ms         4.84 ms          146 bytes_per_second=360.752G/s
RowConversion/new_from_row_conversion/64/manual_time           0.381 ms        0.398 ms         1862 bytes_per_second=282.912M/s
RowConversion/new_from_row_conversion/512/manual_time          0.639 ms        0.654 ms         1023 bytes_per_second=1.31774G/s
RowConversion/new_from_row_conversion/4096/manual_time          1.09 ms         1.11 ms          551 bytes_per_second=6.15463G/s
RowConversion/new_from_row_conversion/32768/manual_time         2.20 ms         2.21 ms          318 bytes_per_second=24.5236G/s
RowConversion/new_from_row_conversion/262144/manual_time        9.32 ms         9.31 ms           75 bytes_per_second=46.2711G/s
RowConversion/new_from_row_conversion/1048576/manual_time       37.1 ms         37.4 ms           19 bytes_per_second=46.5101G/s

@hyperbolic2346
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit dd390a2 into rapidsai:branch-22.02 Jan 10, 2022
mythrocks added a commit to mythrocks/cudf that referenced this pull request Feb 18, 2022
rapidsai#8444 modified JCUDF transcoding logic (in Java/JNI) to use `cudaMemcpyAsync()`
and `cuda::barrier` to allow for asynchronous memcpy on GPUs that support it.
While this works for `__CUDA_ARCH__ >= 700`, for older GPUs (e.g. Pascal),
JCUDF conversions cause CUDA errors and failures. E.g.
```
ai.rapids.cudf.CudfException: after reduction step 2: cudaErrorInvalidDeviceFunction:
invalid device function
```
For older GPUs, rather than fail spectacularly, it would be good to provide
a more stable (if less efficient) fallback implementation, via `memcpy()`.

This commit adds code to conditionally use `cudaMemcpyAsync()` or `memcpy()`,
depending on the GPU in play.
rapids-bot bot pushed a commit that referenced this pull request Feb 21, 2022
#8444 modified JCUDF transcoding logic (in Java/JNI) to use `cudaMemcpyAsync()`
and `cuda::barrier` to allow for asynchronous memcpy on GPUs that support it.
While this works for `__CUDA_ARCH__ >= 700`, for older GPUs (e.g. Pascal),
JCUDF conversions cause CUDA errors and failures. E.g.
```
ai.rapids.cudf.CudfException: after reduction step 2: cudaErrorInvalidDeviceFunction:
invalid device function
```
`cudaMemcpyAsync()` is not supported on Pascal GPUs or prior. (They lack the hardware,
apparently.)
For older GPUs, rather than fail spectacularly, it would be good to provide
a more stable (if less efficient) fallback implementation, via `memcpy()`.

This commit adds code to conditionally use `cudaMemcpyAsync()` or `memcpy()`,
depending on the GPU in play.

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - Robert (Bobby) Evans (https://github.com/revans2)
  - Nghia Truong (https://github.com/ttnghia)

URL: #10329
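For reference, the general shape of such a guard (a hypothetical sketch, not the commit's actual code):

```
#include <cstring>
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
#include <cuda/barrier>
#endif

// On Volta and newer, copy asynchronously and let a block-scoped barrier
// track completion; on Pascal and older, degrade to a plain synchronous memcpy.
__device__ void copy_chunk(void* dst, void const* src, size_t bytes
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
                           , cuda::barrier<cuda::thread_scope_block>& bar
#endif
) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
  cuda::memcpy_async(dst, src, bytes, bar);  // overlapped; caller waits on `bar`
#else
  memcpy(dst, src, bytes);                   // slower, but works on any GPU
#endif
}
```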
@hyperbolic2346 hyperbolic2346 deleted the mwilson/row_conversion branch May 17, 2022 21:01
Labels
3 - Ready for Review (Ready for review by team)
4 - Needs Review (Waiting for reviewer to review or respond)
breaking (Breaking change)
feature request (New feature or request)
Java (Affects Java cuDF API.)
libcudf (Affects libcudf (C++/CUDA) code.)