
GH-40062: [C++][Python] Conversion of Table to Arrow Tensor #41870

Draft · wants to merge 17 commits into main from gh-40062-table-to-tensor
Conversation

@AlenkaF (Member) commented May 29, 2024

Rationale for this change

There is currently no method to convert an Arrow Table to an Arrow Tensor (that is, from the columnar format to a single contiguous block of memory). This work is a continuation of the `RecordBatch::ToTensor` work, see #40058.
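To picture the conversion, here is a minimal stdlib-Python sketch (illustrative only, not the PR's C++ implementation): a table stores each column as its own buffer, while a tensor is one contiguous allocation, so the conversion copies column values into a single flat buffer.

```python
# Illustrative sketch: a table's columns are separate buffers,
# while a tensor is one contiguous block of memory.
def columns_to_column_major(columns):
    """Flatten equal-length columns into one contiguous buffer in
    column-major (Fortran) order: all of col0 first, then col1, ..."""
    n_rows = len(columns[0])
    assert all(len(c) == n_rows for c in columns)
    flat = []
    for col in columns:  # one contiguous run per column
        flat.extend(col)
    return flat

table_columns = [[1, 2, 3], [10, 20, 30]]  # two columns, three rows
print(columns_to_column_major(table_columns))  # [1, 2, 3, 10, 20, 30]
```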

What changes are included in this PR?

This PR:

  • implements Table::ToTensor conversion
  • adds bindings to Python
  • adds benchmarks in C++
  • removes the code in RecordBatch::ToTensor and reuses the Table implementation instead (the RecordBatch::ToTensor benchmarks were checked)

Are these changes tested?

Yes, in C++ and Python.

Are there any user-facing changes?

No, it is a new feature.

@AlenkaF (Member, Author) commented May 29, 2024

Benchmarks for RecordBatch::ToTensor after changing the implementation to use Table::ToTensor:

(pyarrow-dev) alenkafrim@alenka-mac arrow % archery --quiet benchmark diff --benchmark-filter=BatchToTensorSimple
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (7)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                  benchmark      baseline     contender  change %                                                                                                                                                                                                  counters
  BatchToTensorSimple<Int64Type>/size:4194304/num_columns:3 8.540 GiB/sec 8.826 GiB/sec     3.351   {'family_index': 3, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int64Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1545}
  BatchToTensorSimple<Int16Type>/size:4194304/num_columns:3 4.515 GiB/sec 4.583 GiB/sec     1.516    {'family_index': 1, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int16Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 787}
 BatchToTensorSimple<Int64Type>/size:4194304/num_columns:30 5.355 GiB/sec 5.426 GiB/sec     1.320   {'family_index': 3, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int64Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 971}
 BatchToTensorSimple<Int16Type>/size:4194304/num_columns:30 2.113 GiB/sec 2.120 GiB/sec     0.331   {'family_index': 1, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int16Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 380}
BatchToTensorSimple<Int16Type>/size:4194304/num_columns:300 2.009 GiB/sec 1.976 GiB/sec    -1.620  {'family_index': 1, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int16Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 363}
    BatchToTensorSimple<Int16Type>/size:65536/num_columns:3 5.391 GiB/sec 5.141 GiB/sec    -4.645    {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int16Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 61484}
BatchToTensorSimple<Int64Type>/size:4194304/num_columns:300 7.797 GiB/sec 7.429 GiB/sec    -4.716 {'family_index': 3, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int64Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1374}

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regressions: (17)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                  benchmark        baseline       contender  change %                                                                                                                                                                                                 counters
 BatchToTensorSimple<Int8Type>/size:4194304/num_columns:300 698.025 MiB/sec 642.690 MiB/sec    -7.927  {'family_index': 0, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int8Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 117}
  BatchToTensorSimple<Int8Type>/size:4194304/num_columns:30 936.761 MiB/sec 849.504 MiB/sec    -9.315   {'family_index': 0, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int8Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 164}
 BatchToTensorSimple<Int32Type>/size:4194304/num_columns:30   2.943 GiB/sec   2.664 GiB/sec    -9.484  {'family_index': 2, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int32Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 530}
   BatchToTensorSimple<Int8Type>/size:4194304/num_columns:3   1.220 GiB/sec   1.103 GiB/sec    -9.540    {'family_index': 0, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int8Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 226}
BatchToTensorSimple<Int32Type>/size:4194304/num_columns:300   3.350 GiB/sec   3.004 GiB/sec   -10.308 {'family_index': 2, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int32Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 603}
     BatchToTensorSimple<Int8Type>/size:65536/num_columns:3   1.343 GiB/sec   1.193 GiB/sec   -11.189    {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int8Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 15407}
  BatchToTensorSimple<Int32Type>/size:4194304/num_columns:3   6.492 GiB/sec   5.679 GiB/sec   -12.518  {'family_index': 2, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int32Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1170}
    BatchToTensorSimple<Int32Type>/size:65536/num_columns:3   8.703 GiB/sec   7.530 GiB/sec   -13.478   {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int32Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 99016}
    BatchToTensorSimple<Int64Type>/size:65536/num_columns:3  17.419 GiB/sec  14.934 GiB/sec   -14.269  {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int64Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 198847}
    BatchToTensorSimple<Int8Type>/size:65536/num_columns:30   1.246 GiB/sec   1.013 GiB/sec   -18.692   {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int8Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 14331}
   BatchToTensorSimple<Int16Type>/size:65536/num_columns:30   3.813 GiB/sec   3.045 GiB/sec   -20.148  {'family_index': 1, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int16Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 43240}
   BatchToTensorSimple<Int32Type>/size:65536/num_columns:30   5.497 GiB/sec   3.822 GiB/sec   -30.460  {'family_index': 2, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int32Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 63621}
   BatchToTensorSimple<Int8Type>/size:65536/num_columns:300 665.489 MiB/sec 452.284 MiB/sec   -32.037   {'family_index': 0, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int8Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 7122}
   BatchToTensorSimple<Int64Type>/size:65536/num_columns:30   7.306 GiB/sec   4.883 GiB/sec   -33.166  {'family_index': 3, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int64Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 83661}
  BatchToTensorSimple<Int16Type>/size:65536/num_columns:300   1.024 GiB/sec 646.927 MiB/sec   -38.317 {'family_index': 1, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int16Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 11642}
  BatchToTensorSimple<Int64Type>/size:65536/num_columns:300   1.208 GiB/sec 711.915 MiB/sec   -42.439 {'family_index': 3, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int64Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 13994}
  BatchToTensorSimple<Int32Type>/size:65536/num_columns:300   1.158 GiB/sec 678.147 MiB/sec   -42.812 {'family_index': 2, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int32Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 13406}
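For reference, archery's change % column is the relative throughput change between baseline and contender; it can be recomputed from the rows above (approximately, since the table prints rounded throughputs while archery computes from unrounded values):

```python
def change_pct(baseline, contender):
    """Relative throughput change, as archery's 'change %' column reports it."""
    return (contender - baseline) / baseline * 100

# First non-regression row above: 8.540 -> 8.826 GiB/sec
print(round(change_pct(8.540, 8.826), 3))  # 3.349 — table shows 3.351, from unrounded inputs
```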

AlenkaF added a commit that referenced this pull request Jun 5, 2024
…to tensor.cc (#41932)

### Rationale for this change

This is a precursor PR to #41870, intended to make the review of #41870 easier (the diff of the code is currently not visible because the code was moved to table.cc; it should also live in tensor.cc).

### What changes are included in this PR?

The code from `RecordBatch::ToTensor` in record_batch.cc is moved to `RecordBatchToTensor` in tensor.cc.

### Are these changes tested?

Existing tests should pass.

### Are there any user-facing changes?

No.

**This PR does not close the linked issue yet, it is just a precursor!**
* GitHub Issue: #40062

Authored-by: AlenkaF <[email protected]>
Signed-off-by: AlenkaF <[email protected]>
@AlenkaF force-pushed the gh-40062-table-to-tensor branch from 13c49a7 to 15574f8 (June 5, 2024 12:51)
vibhatha pushed a commit to vibhatha/arrow that referenced this pull request Jun 5, 2024
…ch.cc to tensor.cc (apache#41932)

(commit message identical to the one above)
@jorisvandenbossche (Member) left a comment

Generally looks good! Some minor comments, and wondering if we can reduce the duplication in testing a bit

Review comments (outdated, resolved) on: cpp/src/arrow/table.h, cpp/src/arrow/tensor.cc (two comments), python/pyarrow/table.pxi
```
@@ -1219,6 +1219,295 @@ def test_recordbatch_to_tensor_unsupported():
        batch.to_tensor()


@pytest.mark.parametrize('typ', [
```
Member commented:

We are adding a lot of code to this test file, while in practice it's not really testing anything new. I am wondering if we could either parametrize some existing RecordBatch tests to run with both RecordBatch and Table (we have other tests in this file that are parametrized that way, see e.g. test_table_basics), or otherwise only test some unique aspects of Table.to_tensor (the fact that the method exists and works, and a case with multiple chunks), relying on the record batch tests for the other aspects (type promotion, null handling, etc.).

@AlenkaF (Member, Author) replied Jun 11, 2024:

Added parametrization over RecordBatch and Table for the mixed-type test; the others I left to rely on RecordBatch only. I also kept one test for a table with multiple chunks (test_table_to_tensor_uniform_type): feab6e1

@AlenkaF force-pushed the gh-40062-table-to-tensor branch from 15574f8 to 4decf7f (June 10, 2024 15:56)
@AlenkaF (Member, Author) commented Jun 10, 2024

I have researched the benchmark regression a bit and found that:

  • running the benchmarks for RecordBatch::ToTensor shows regressions of up to 40% in time
  • removing the Table creation but keeping the code as-is (hardcoding the for loop over the chunks to a single iteration) makes the regression fall to a maximum of 20%
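The per-chunk loop being measured can be pictured like this (a stdlib-Python sketch, not the PR's C++ code): even when a column holds a single chunk, the generic Table path still walks a chunk list and tracks a write offset, which is the overhead isolated by hardcoding the loop to one iteration.

```python
def copy_chunked_column(chunks, dest):
    """Generic path: iterate over a column's chunks, tracking a write offset."""
    offset = 0
    for chunk in chunks:  # a single iteration when the column has one chunk
        dest[offset:offset + len(chunk)] = chunk
        offset += len(chunk)
    return dest

# A RecordBatch column is effectively one chunk:
single_chunk = [[1, 2, 3, 4]]
print(copy_chunked_column(single_chunk, [0] * 4))  # [1, 2, 3, 4]
```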
benchmark diff output
(pyarrow-dev) alenkafrim@alenka-mac build % archery --quiet benchmark diff --benchmark-filter=ToTensorSimple
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (7)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                benchmark       baseline      contender  change %                                                                                                                                                                                                 counters
 BatchToTensorSimple<Int64Type>/size:65536/num_columns:30  7.321 GiB/sec  7.341 GiB/sec     0.275  {'family_index': 3, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int64Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 84665}
  BatchToTensorSimple<Int64Type>/size:65536/num_columns:3 17.341 GiB/sec 17.385 GiB/sec     0.256  {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int64Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 197830}
BatchToTensorSimple<Int32Type>/size:65536/num_columns:300  1.153 GiB/sec  1.136 GiB/sec    -1.413 {'family_index': 2, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int32Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 13151}
BatchToTensorSimple<Int64Type>/size:65536/num_columns:300  1.221 GiB/sec  1.198 GiB/sec    -1.838 {'family_index': 3, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int64Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 13997}
BatchToTensorSimple<Int16Type>/size:65536/num_columns:300  1.027 GiB/sec  1.005 GiB/sec    -2.092 {'family_index': 1, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int16Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 11502}
 BatchToTensorSimple<Int16Type>/size:65536/num_columns:30  3.824 GiB/sec  3.728 GiB/sec    -2.521  {'family_index': 1, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int16Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 43449}
BatchToTensorSimple<Int16Type>/size:4194304/num_columns:3  4.435 GiB/sec  4.322 GiB/sec    -2.550   {'family_index': 1, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int16Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 792}

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regressions: (17)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                  benchmark        baseline        contender  change %                                                                                                                                                                                                  counters
 BatchToTensorSimple<Int64Type>/size:4194304/num_columns:30   5.354 GiB/sec    5.078 GiB/sec    -5.159   {'family_index': 3, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int64Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 959}
    BatchToTensorSimple<Int32Type>/size:65536/num_columns:3   8.656 GiB/sec    8.107 GiB/sec    -6.348    {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int32Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 96401}
BatchToTensorSimple<Int64Type>/size:4194304/num_columns:300   7.884 GiB/sec    7.371 GiB/sec    -6.506 {'family_index': 3, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int64Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1140}
 BatchToTensorSimple<Int16Type>/size:4194304/num_columns:30   2.109 GiB/sec    1.969 GiB/sec    -6.655   {'family_index': 1, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int16Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 378}
BatchToTensorSimple<Int16Type>/size:4194304/num_columns:300   2.007 GiB/sec    1.869 GiB/sec    -6.878  {'family_index': 1, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int16Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 360}
   BatchToTensorSimple<Int32Type>/size:65536/num_columns:30   5.514 GiB/sec    5.116 GiB/sec    -7.218   {'family_index': 2, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int32Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 62798}
BatchToTensorSimple<Int32Type>/size:4194304/num_columns:300   3.346 GiB/sec    3.066 GiB/sec    -8.379  {'family_index': 2, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int32Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 601}
   BatchToTensorSimple<Int8Type>/size:65536/num_columns:300 669.230 MiB/sec  598.420 MiB/sec   -10.581    {'family_index': 0, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int8Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 7493}
    BatchToTensorSimple<Int16Type>/size:65536/num_columns:3   5.393 GiB/sec    4.745 GiB/sec   -12.015    {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int16Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 61699}
 BatchToTensorSimple<Int8Type>/size:4194304/num_columns:300 700.642 MiB/sec  611.987 MiB/sec   -12.653   {'family_index': 0, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int8Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 123}
    BatchToTensorSimple<Int8Type>/size:65536/num_columns:30   1.247 GiB/sec    1.075 GiB/sec   -13.836    {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int8Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 14200}
  BatchToTensorSimple<Int32Type>/size:4194304/num_columns:3   6.465 GiB/sec    5.567 GiB/sec   -13.879   {'family_index': 2, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int32Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1156}
  BatchToTensorSimple<Int8Type>/size:4194304/num_columns:30 938.704 MiB/sec  792.766 MiB/sec   -15.547    {'family_index': 0, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int8Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 164}
 BatchToTensorSimple<Int32Type>/size:4194304/num_columns:30   2.944 GiB/sec    2.453 GiB/sec   -16.660   {'family_index': 2, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int32Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 529}
  BatchToTensorSimple<Int64Type>/size:4194304/num_columns:3   8.618 GiB/sec    7.157 GiB/sec   -16.959   {'family_index': 3, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int64Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1521}
   BatchToTensorSimple<Int8Type>/size:4194304/num_columns:3   1.197 GiB/sec 1008.475 MiB/sec   -17.748     {'family_index': 0, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int8Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 227}
     BatchToTensorSimple<Int8Type>/size:65536/num_columns:3   1.314 GiB/sec    1.057 GiB/sec   -19.601     {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int8Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 15032}

I plan to also try profiling in Python (py-spy doesn't work on macOS out of the box; any other suggestions?). Update: installed py-spy with brew and it works; looking at the .svg at the moment.

Further review comments (outdated, resolved) on: cpp/src/arrow/table.h, python/pyarrow/table.pxi
@jorisvandenbossche (Member) commented:

> I have researched the benchmark regression a bit and found that:

Do you see those regressions of up to 40% for both row-major and column-major conversions? And both for uniform vs mixed type with casting?

@AlenkaF (Member, Author) commented Jun 11, 2024

> Do you see those regressions of up to 40% for both row-major and column-major conversions? And both for uniform vs mixed type with casting?

The benchmarks for RecordBatch only test row-major conversion; the newly added Table benchmarks test both. I think that is because we were adding features to `RecordBatch::ToTensor` step by step and needed one simple benchmark to check while adding them; row-major conversion was the last feature to be added.
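For clarity on the distinction, the same 2×3 values laid out both ways (illustrative stdlib Python):

```python
rows = [[1, 2, 3],
        [4, 5, 6]]

# Row-major (C order): rows are contiguous in memory.
row_major = [v for row in rows for v in row]
# Column-major (Fortran order): columns are contiguous — the natural
# result of copying a columnar table one column at a time.
col_major = [rows[r][c] for c in range(3) for r in range(2)]

print(row_major)  # [1, 2, 3, 4, 5, 6]
print(col_major)  # [1, 4, 2, 5, 3, 6]
```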

As for the types, we only test uniform types in C++ benchmarks at the moment.

P.S. I haven't been able to extract any information with either py-spy or cProfile.
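For anyone retrying the profiling, a minimal cProfile run of the kind attempted here; the workload function below is a placeholder standing in for the real pyarrow conversion call:

```python
import cProfile
import io
import pstats

def conversion_under_test():
    # Placeholder for the real workload (e.g. the to_tensor conversion).
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
conversion_under_test()
profiler.disable()

# Print the five most expensive calls by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```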
