
GH-40062: [C++][Python] Conversion of Table to Arrow Tensor #41870

Draft · wants to merge 17 commits into main from gh-40062-table-to-tensor
Conversation

@AlenkaF (Member) commented May 29, 2024

Rationale for this change

There is currently no method to convert an Arrow Table to an Arrow Tensor (that is, from the columnar format to a single contiguous block of memory). This work is a continuation of the `RecordBatch::ToTensor` work, see #40058.
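To picture the conversion, here is a minimal stdlib-Python sketch (illustrative only, not the PR's C++ implementation): a table stores each column as its own buffer, while a tensor is one contiguous allocation, so the conversion copies column values into a single flat buffer.

```python
# Illustrative sketch: a table's columns are separate buffers,
# while a tensor is one contiguous block of memory.
def columns_to_column_major(columns):
    """Flatten equal-length columns into one contiguous buffer in
    column-major (Fortran) order: all of col0 first, then col1, ..."""
    n_rows = len(columns[0])
    assert all(len(c) == n_rows for c in columns)
    flat = []
    for col in columns:  # one contiguous run per column
        flat.extend(col)
    return flat

table_columns = [[1, 2, 3], [10, 20, 30]]  # two columns, three rows
print(columns_to_column_major(table_columns))  # [1, 2, 3, 10, 20, 30]
```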

What changes are included in this PR?

This PR:

  • implements Table::ToTensor conversion
  • adds bindings to Python
  • adds benchmarks in C++
  • removes the code in RecordBatch::ToTensor and reuses the Table implementation instead (the RecordBatch::ToTensor benchmarks were checked)

Are these changes tested?

Yes, in C++ and Python.

Are there any user-facing changes?

No, it is a new feature.

@AlenkaF (Member, Author) commented May 29, 2024

Benchmarks for RecordBatch::ToTensor after changing the implementation to use Table::ToTensor:

(pyarrow-dev) alenkafrim@alenka-mac arrow % archery --quiet benchmark diff --benchmark-filter=BatchToTensorSimple
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (7)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                  benchmark      baseline     contender  change %                                                                                                                                                                                                  counters
  BatchToTensorSimple<Int64Type>/size:4194304/num_columns:3 8.540 GiB/sec 8.826 GiB/sec     3.351   {'family_index': 3, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int64Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1545}
  BatchToTensorSimple<Int16Type>/size:4194304/num_columns:3 4.515 GiB/sec 4.583 GiB/sec     1.516    {'family_index': 1, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int16Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 787}
 BatchToTensorSimple<Int64Type>/size:4194304/num_columns:30 5.355 GiB/sec 5.426 GiB/sec     1.320   {'family_index': 3, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int64Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 971}
 BatchToTensorSimple<Int16Type>/size:4194304/num_columns:30 2.113 GiB/sec 2.120 GiB/sec     0.331   {'family_index': 1, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int16Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 380}
BatchToTensorSimple<Int16Type>/size:4194304/num_columns:300 2.009 GiB/sec 1.976 GiB/sec    -1.620  {'family_index': 1, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int16Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 363}
    BatchToTensorSimple<Int16Type>/size:65536/num_columns:3 5.391 GiB/sec 5.141 GiB/sec    -4.645    {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int16Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 61484}
BatchToTensorSimple<Int64Type>/size:4194304/num_columns:300 7.797 GiB/sec 7.429 GiB/sec    -4.716 {'family_index': 3, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int64Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1374}

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regressions: (17)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                  benchmark        baseline       contender  change %                                                                                                                                                                                                 counters
 BatchToTensorSimple<Int8Type>/size:4194304/num_columns:300 698.025 MiB/sec 642.690 MiB/sec    -7.927  {'family_index': 0, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int8Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 117}
  BatchToTensorSimple<Int8Type>/size:4194304/num_columns:30 936.761 MiB/sec 849.504 MiB/sec    -9.315   {'family_index': 0, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int8Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 164}
 BatchToTensorSimple<Int32Type>/size:4194304/num_columns:30   2.943 GiB/sec   2.664 GiB/sec    -9.484  {'family_index': 2, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int32Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 530}
   BatchToTensorSimple<Int8Type>/size:4194304/num_columns:3   1.220 GiB/sec   1.103 GiB/sec    -9.540    {'family_index': 0, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int8Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 226}
BatchToTensorSimple<Int32Type>/size:4194304/num_columns:300   3.350 GiB/sec   3.004 GiB/sec   -10.308 {'family_index': 2, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int32Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 603}
     BatchToTensorSimple<Int8Type>/size:65536/num_columns:3   1.343 GiB/sec   1.193 GiB/sec   -11.189    {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int8Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 15407}
  BatchToTensorSimple<Int32Type>/size:4194304/num_columns:3   6.492 GiB/sec   5.679 GiB/sec   -12.518  {'family_index': 2, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int32Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1170}
    BatchToTensorSimple<Int32Type>/size:65536/num_columns:3   8.703 GiB/sec   7.530 GiB/sec   -13.478   {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int32Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 99016}
    BatchToTensorSimple<Int64Type>/size:65536/num_columns:3  17.419 GiB/sec  14.934 GiB/sec   -14.269  {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int64Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 198847}
    BatchToTensorSimple<Int8Type>/size:65536/num_columns:30   1.246 GiB/sec   1.013 GiB/sec   -18.692   {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int8Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 14331}
   BatchToTensorSimple<Int16Type>/size:65536/num_columns:30   3.813 GiB/sec   3.045 GiB/sec   -20.148  {'family_index': 1, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int16Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 43240}
   BatchToTensorSimple<Int32Type>/size:65536/num_columns:30   5.497 GiB/sec   3.822 GiB/sec   -30.460  {'family_index': 2, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int32Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 63621}
   BatchToTensorSimple<Int8Type>/size:65536/num_columns:300 665.489 MiB/sec 452.284 MiB/sec   -32.037   {'family_index': 0, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int8Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 7122}
   BatchToTensorSimple<Int64Type>/size:65536/num_columns:30   7.306 GiB/sec   4.883 GiB/sec   -33.166  {'family_index': 3, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int64Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 83661}
  BatchToTensorSimple<Int16Type>/size:65536/num_columns:300   1.024 GiB/sec 646.927 MiB/sec   -38.317 {'family_index': 1, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int16Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 11642}
  BatchToTensorSimple<Int64Type>/size:65536/num_columns:300   1.208 GiB/sec 711.915 MiB/sec   -42.439 {'family_index': 3, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int64Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 13994}
  BatchToTensorSimple<Int32Type>/size:65536/num_columns:300   1.158 GiB/sec 678.147 MiB/sec   -42.812 {'family_index': 2, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int32Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 13406}
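For reference, archery's change % column is the relative throughput change between baseline and contender; it can be recomputed from the rows above (approximately, since the table prints rounded throughputs while archery computes from unrounded values):

```python
def change_pct(baseline, contender):
    """Relative throughput change, as archery's 'change %' column reports it."""
    return (contender - baseline) / baseline * 100

# First non-regression row above: 8.540 -> 8.826 GiB/sec
print(round(change_pct(8.540, 8.826), 3))  # 3.349 — table shows 3.351, from unrounded inputs
```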

AlenkaF added a commit that referenced this pull request Jun 5, 2024
…to tensor.cc (#41932)

### Rationale for this change

This is a precursor PR to #41870, intended to make the review of #41870 easier (the diff of the code is currently not visible because the code was moved to table.cc; it should also live in tensor.cc).

### What changes are included in this PR?

The code from `RecordBatch::ToTensor` in record_batch.cc is moved to `RecordBatchToTensor` in tensor.cc.

### Are these changes tested?

Existing tests should pass.

### Are there any user-facing changes?

No.

**This PR does not close the linked issue yet, it is just a precursor!**
* GitHub Issue: #40062

Authored-by: AlenkaF <[email protected]>
Signed-off-by: AlenkaF <[email protected]>
@AlenkaF force-pushed the gh-40062-table-to-tensor branch from 13c49a7 to 15574f8 (June 5, 2024 12:51)
vibhatha pushed a commit to vibhatha/arrow that referenced this pull request Jun 5, 2024
…ch.cc to tensor.cc (apache#41932)

(commit message identical to the one above)
@jorisvandenbossche (Member) left a comment

Generally looks good! Some minor comments, and wondering if we can reduce the duplication in testing a bit

Review comments (outdated, resolved) on: cpp/src/arrow/table.h, cpp/src/arrow/tensor.cc (two comments), python/pyarrow/table.pxi
```
@@ -1219,6 +1219,295 @@ def test_recordbatch_to_tensor_unsupported():
        batch.to_tensor()


@pytest.mark.parametrize('typ', [
```
Member commented:

We are adding a lot of code to this test file, while in practice it's not really testing anything new. I am wondering if we could either parametrize some existing RecordBatch tests to run with both RecordBatch and Table (we have other tests in this file that are parametrized that way, see e.g. test_table_basics), or otherwise only test some unique aspects of Table.to_tensor (the fact that the method exists and works, and a case with multiple chunks), relying on the record batch tests for the other aspects (type promotion, null handling, etc.).

@AlenkaF (Member, Author) replied Jun 11, 2024:

Added parametrization over RecordBatch and Table for the mixed-type test; the others I left to rely on RecordBatch only. I also kept one test for a table with multiple chunks (test_table_to_tensor_uniform_type): feab6e1

@AlenkaF force-pushed the gh-40062-table-to-tensor branch from 15574f8 to 4decf7f (June 10, 2024 15:56)
@AlenkaF (Member, Author) commented Jun 10, 2024

I have researched the benchmark regression a bit and found that:

  • running the benchmarks for RecordBatch::ToTensor shows regressions of up to 40% in time
  • removing the Table creation but keeping the code as-is (hardcoding the for loop over the chunks to a single iteration) makes the regression fall to a maximum of 20%
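The per-chunk loop being measured can be pictured like this (a stdlib-Python sketch, not the PR's C++ code): even when a column holds a single chunk, the generic Table path still walks a chunk list and tracks a write offset, which is the overhead isolated by hardcoding the loop to one iteration.

```python
def copy_chunked_column(chunks, dest):
    """Generic path: iterate over a column's chunks, tracking a write offset."""
    offset = 0
    for chunk in chunks:  # a single iteration when the column has one chunk
        dest[offset:offset + len(chunk)] = chunk
        offset += len(chunk)
    return dest

# A RecordBatch column is effectively one chunk:
single_chunk = [[1, 2, 3, 4]]
print(copy_chunked_column(single_chunk, [0] * 4))  # [1, 2, 3, 4]
```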
benchmark diff output
(pyarrow-dev) alenkafrim@alenka-mac build % archery --quiet benchmark diff --benchmark-filter=ToTensorSimple
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (7)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                benchmark       baseline      contender  change %                                                                                                                                                                                                 counters
 BatchToTensorSimple<Int64Type>/size:65536/num_columns:30  7.321 GiB/sec  7.341 GiB/sec     0.275  {'family_index': 3, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int64Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 84665}
  BatchToTensorSimple<Int64Type>/size:65536/num_columns:3 17.341 GiB/sec 17.385 GiB/sec     0.256  {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int64Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 197830}
BatchToTensorSimple<Int32Type>/size:65536/num_columns:300  1.153 GiB/sec  1.136 GiB/sec    -1.413 {'family_index': 2, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int32Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 13151}
BatchToTensorSimple<Int64Type>/size:65536/num_columns:300  1.221 GiB/sec  1.198 GiB/sec    -1.838 {'family_index': 3, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int64Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 13997}
BatchToTensorSimple<Int16Type>/size:65536/num_columns:300  1.027 GiB/sec  1.005 GiB/sec    -2.092 {'family_index': 1, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int16Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 11502}
 BatchToTensorSimple<Int16Type>/size:65536/num_columns:30  3.824 GiB/sec  3.728 GiB/sec    -2.521  {'family_index': 1, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int16Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 43449}
BatchToTensorSimple<Int16Type>/size:4194304/num_columns:3  4.435 GiB/sec  4.322 GiB/sec    -2.550   {'family_index': 1, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int16Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 792}

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regressions: (17)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                  benchmark        baseline        contender  change %                                                                                                                                                                                                  counters
 BatchToTensorSimple<Int64Type>/size:4194304/num_columns:30   5.354 GiB/sec    5.078 GiB/sec    -5.159   {'family_index': 3, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int64Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 959}
    BatchToTensorSimple<Int32Type>/size:65536/num_columns:3   8.656 GiB/sec    8.107 GiB/sec    -6.348    {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int32Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 96401}
BatchToTensorSimple<Int64Type>/size:4194304/num_columns:300   7.884 GiB/sec    7.371 GiB/sec    -6.506 {'family_index': 3, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int64Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1140}
 BatchToTensorSimple<Int16Type>/size:4194304/num_columns:30   2.109 GiB/sec    1.969 GiB/sec    -6.655   {'family_index': 1, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int16Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 378}
BatchToTensorSimple<Int16Type>/size:4194304/num_columns:300   2.007 GiB/sec    1.869 GiB/sec    -6.878  {'family_index': 1, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int16Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 360}
   BatchToTensorSimple<Int32Type>/size:65536/num_columns:30   5.514 GiB/sec    5.116 GiB/sec    -7.218   {'family_index': 2, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int32Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 62798}
BatchToTensorSimple<Int32Type>/size:4194304/num_columns:300   3.346 GiB/sec    3.066 GiB/sec    -8.379  {'family_index': 2, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int32Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 601}
   BatchToTensorSimple<Int8Type>/size:65536/num_columns:300 669.230 MiB/sec  598.420 MiB/sec   -10.581    {'family_index': 0, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int8Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 7493}
    BatchToTensorSimple<Int16Type>/size:65536/num_columns:3   5.393 GiB/sec    4.745 GiB/sec   -12.015    {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int16Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 61699}
 BatchToTensorSimple<Int8Type>/size:4194304/num_columns:300 700.642 MiB/sec  611.987 MiB/sec   -12.653   {'family_index': 0, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int8Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 123}
    BatchToTensorSimple<Int8Type>/size:65536/num_columns:30   1.247 GiB/sec    1.075 GiB/sec   -13.836    {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int8Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 14200}
  BatchToTensorSimple<Int32Type>/size:4194304/num_columns:3   6.465 GiB/sec    5.567 GiB/sec   -13.879   {'family_index': 2, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int32Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1156}
  BatchToTensorSimple<Int8Type>/size:4194304/num_columns:30 938.704 MiB/sec  792.766 MiB/sec   -15.547    {'family_index': 0, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int8Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 164}
 BatchToTensorSimple<Int32Type>/size:4194304/num_columns:30   2.944 GiB/sec    2.453 GiB/sec   -16.660   {'family_index': 2, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int32Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 529}
  BatchToTensorSimple<Int64Type>/size:4194304/num_columns:3   8.618 GiB/sec    7.157 GiB/sec   -16.959   {'family_index': 3, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int64Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1521}
   BatchToTensorSimple<Int8Type>/size:4194304/num_columns:3   1.197 GiB/sec 1008.475 MiB/sec   -17.748     {'family_index': 0, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int8Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 227}
     BatchToTensorSimple<Int8Type>/size:65536/num_columns:3   1.314 GiB/sec    1.057 GiB/sec   -19.601     {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int8Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 15032}

I plan to also try profiling in Python (py-spy doesn't work on macOS out of the box; any other suggestions?). Update: installed py-spy with brew and it works; looking at the .svg at the moment.

Further review comments (outdated, resolved) on: cpp/src/arrow/table.h, python/pyarrow/table.pxi
@jorisvandenbossche (Member) commented:

> I have researched the benchmark regression a bit and found that:

Do you see those regressions of up to 40% for both row-major and column-major conversions? And both for uniform vs mixed type with casting?

@AlenkaF (Member, Author) commented Jun 11, 2024

> Do you see those regressions of up to 40% for both row-major and column-major conversions? And both for uniform vs mixed type with casting?

The benchmarks for RecordBatch only test row-major conversion; the newly added Table benchmarks test both. I think that is because we were adding features to `RecordBatch::ToTensor` step by step and needed one simple benchmark to check while adding them; row-major conversion was the last feature to be added.
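For clarity on the distinction, the same 2×3 values laid out both ways (illustrative stdlib Python):

```python
rows = [[1, 2, 3],
        [4, 5, 6]]

# Row-major (C order): rows are contiguous in memory.
row_major = [v for row in rows for v in row]
# Column-major (Fortran order): columns are contiguous — the natural
# result of copying a columnar table one column at a time.
col_major = [rows[r][c] for c in range(3) for r in range(2)]

print(row_major)  # [1, 2, 3, 4, 5, 6]
print(col_major)  # [1, 4, 2, 5, 3, 6]
```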

As for the types, we only test uniform types in C++ benchmarks at the moment.

P.S. I haven't been able to extract any information with either py-spy or cProfile.
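For anyone retrying the profiling, a minimal cProfile run of the kind attempted here; the workload function below is a placeholder standing in for the real pyarrow conversion call:

```python
import cProfile
import io
import pstats

def conversion_under_test():
    # Placeholder for the real workload (e.g. the to_tensor conversion).
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
conversion_under_test()
profiler.disable()

# Print the five most expensive calls by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```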
