feat: Add decimal support to integration tester #361

Merged
merged 19 commits into from
Jan 25, 2024

Conversation

paleolimbot
Member

@paleolimbot paleolimbot commented Jan 12, 2024

This PR adds decimal support to the integration test utility. Because the integration test JSON format represents decimal buffers as strings containing the integer representation of each decimal, nanoarrow needed a way to convert arbitrarily large integers to and from strings. I adapted this from Arrow C++ (links in comments next to the implementation), with a few differences to avoid porting the complete int128 implementation and the C++ standard library.

  • Parse strings containing arbitrarily large integers into decimal words
  • Convert decimal words into arbitrarily large integer strings
  • Wire the converters into the integration tester
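
The first two conversions above can be sketched roughly as follows. This is an illustration only, not the nanoarrow or Arrow C++ code: the function names are hypothetical, the words are assumed to be little-endian 64-bit values in two's complement, and the 18-digit chunk size is an assumption chosen so that each chunk fits in one word. Python integers stand in for the 128-bit intermediate a C implementation would need.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1
CHUNK_DIGITS = 18            # assumption: 10**18 fits in one 64-bit word
CHUNK = 10 ** CHUNK_DIGITS


def _negate(words):
    """Two's complement negation across little-endian words."""
    out, carry = [], 1
    for w in words:
        total = (~w & WORD_MASK) + carry
        out.append(total & WORD_MASK)
        carry = total >> WORD_BITS
    return out


def parse_decimal_words(text, n_words):
    """Parse a base-10 integer string into n_words little-endian words."""
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a base-10 integer: {text!r}")

    words = [0] * n_words
    for start in range(0, len(digits), CHUNK_DIGITS):
        chunk = digits[start:start + CHUNK_DIGITS]
        # words = words * 10**len(chunk) + chunk, one word at a time
        multiplier, carry = 10 ** len(chunk), int(chunk)
        for i in range(n_words):
            total = words[i] * multiplier + carry
            words[i] = total & WORD_MASK
            carry = total >> WORD_BITS
        if carry:
            raise ValueError(f"{text!r} does not fit in {n_words} words")

    # The magnitude may only use the sign bit for the most negative value.
    if words[-1] >> (WORD_BITS - 1):
        is_min = (negative and words[-1] == 1 << (WORD_BITS - 1)
                  and not any(words[:-1]))
        if not is_min:
            raise ValueError(f"{text!r} does not fit in {n_words} words")
    return _negate(words) if negative else words


def format_decimal_words(words):
    """Format little-endian two's complement words as a base-10 string."""
    negative = bool(words[-1] >> (WORD_BITS - 1))
    magnitude = _negate(words) if negative else list(words)

    segments = []  # base-10**18 digits, least significant first
    while any(magnitude):
        remainder = 0
        # Long division by 10**18, most significant word first
        for i in reversed(range(len(magnitude))):
            current = (remainder << WORD_BITS) | magnitude[i]
            magnitude[i], remainder = divmod(current, CHUNK)
        segments.append(remainder)

    if not segments:
        return "0"
    text = str(segments[-1]) + "".join(
        f"{seg:0{CHUNK_DIGITS}d}" for seg in reversed(segments[:-1]))
    return "-" + text if negative else text
```

Under these assumptions, a 128-bit decimal would use `n_words=2` and a 256-bit decimal `n_words=4`, so `format_decimal_words(parse_decimal_words(s, 2))` round-trips any canonical integer string `s` that fits in 128 bits.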

The gaps in test coverage are from the big-endian parts, which are tested as part of weekly verification and which I tested locally with:

export NANOARROW_ARCH=s390x
docker compose run --rm verify

With archery integration --with-cpp=true --with-nanoarrow=true --run-c-data, the decimal tests now pass:

##########################################################
C Data Interface: C++ exporting, C++ importing
##########################################################
======================================================================
Testing C ArrowSchema from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_zerolength'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null_trivial'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal256'
======================================================================
======================================================================
Testing C ArrowSchema from file 'datetime'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duration'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval_mdn'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map_non_canonical'
-- Skipping test because producer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'recursive_nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'union'
======================================================================
======================================================================
Testing C ArrowSchema from file 'custom_metadata'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duplicate_fieldnames'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary_unsigned'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'run_end_encoded'
======================================================================
======================================================================
Testing C ArrowSchema from file 'binary_view'
======================================================================
======================================================================
Testing C ArrowSchema from file 'extension'
-- Skipping test because producer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_zerolength'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null_trivial'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal256'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'datetime'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'duration'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval_mdn'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map_non_canonical'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'recursive_nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'union'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'custom_metadata'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'duplicate_fieldnames'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary_unsigned'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'run_end_encoded'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'binary_view'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'extension'
-- Skipping test because producer C++ does not support C ArrowArray
======================================================================
##########################################################
C Data Interface: C++ exporting, nanoarrow importing
##########################################################
======================================================================
Testing C ArrowSchema from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_zerolength'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null_trivial'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal256'
======================================================================
======================================================================
Testing C ArrowSchema from file 'datetime'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duration'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval_mdn'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map_non_canonical'
-- Skipping test because producer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'recursive_nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'union'
======================================================================
======================================================================
Testing C ArrowSchema from file 'custom_metadata'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duplicate_fieldnames'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary_unsigned'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'run_end_encoded'
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 471, in _run_c_schema_test_case
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 454, in do_run
    importer.import_schema_and_compare_to_json(json_path, c_schema_ptr)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 138, in import_schema_and_compare_to_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowSchema from file 'binary_view'
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 471, in _run_c_schema_test_case
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 454, in do_run
    importer.import_schema_and_compare_to_json(json_path, c_schema_ptr)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 138, in import_schema_and_compare_to_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowSchema from file 'extension'
-- Skipping test because producer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_zerolength'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null_trivial'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal256'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'datetime'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'duration'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval_mdn'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map_non_canonical'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'recursive_nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'union'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'custom_metadata'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'duplicate_fieldnames'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary_unsigned'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'run_end_encoded'
... with record batch #0
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 521, in _run_c_array_test_cases
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 501, in do_run
    importer.import_batch_and_compare_to_json(json_path,
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 144, in import_batch_and_compare_to_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowArray from file 'binary_view'
... with record batch #0
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 521, in _run_c_array_test_cases
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 501, in do_run
    importer.import_batch_and_compare_to_json(json_path,
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 144, in import_batch_and_compare_to_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowArray from file 'extension'
-- Skipping test because producer C++ does not support C ArrowArray
======================================================================
##########################################################
C Data Interface: nanoarrow exporting, C++ importing
##########################################################
======================================================================
Testing C ArrowSchema from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_zerolength'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null_trivial'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal256'
======================================================================
======================================================================
Testing C ArrowSchema from file 'datetime'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duration'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval_mdn'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map_non_canonical'
-- Skipping test because consumer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'recursive_nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'union'
======================================================================
======================================================================
Testing C ArrowSchema from file 'custom_metadata'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duplicate_fieldnames'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary_unsigned'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'run_end_encoded'
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 471, in _run_c_schema_test_case
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 453, in do_run
    exporter.export_schema_from_json(json_path, c_schema_ptr)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 117, in export_schema_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowSchema from file 'binary_view'
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 471, in _run_c_schema_test_case
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 453, in do_run
    exporter.export_schema_from_json(json_path, c_schema_ptr)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 117, in export_schema_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowSchema from file 'extension'
-- Skipping test because consumer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_zerolength'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null_trivial'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal256'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'datetime'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'duration'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval_mdn'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map_non_canonical'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'recursive_nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'union'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'custom_metadata'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'duplicate_fieldnames'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary_unsigned'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'run_end_encoded'
... with record batch #0
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 521, in _run_c_array_test_cases
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 498, in do_run
    exporter.export_batch_from_json(json_path,
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 123, in export_batch_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowArray from file 'binary_view'
... with record batch #0
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 521, in _run_c_array_test_cases
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 498, in do_run
    exporter.export_batch_from_json(json_path,
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 123, in export_batch_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowArray from file 'extension'
-- Skipping test because consumer C++ does not support C ArrowArray
======================================================================
##########################################################
C Data Interface: nanoarrow exporting, nanoarrow importing
##########################################################
======================================================================
Testing C ArrowSchema from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_zerolength'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null_trivial'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal256'
======================================================================
======================================================================
Testing C ArrowSchema from file 'datetime'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duration'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval_mdn'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map_non_canonical'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'recursive_nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'union'
======================================================================
======================================================================
Testing C ArrowSchema from file 'custom_metadata'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duplicate_fieldnames'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary_unsigned'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'run_end_encoded'
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 471, in _run_c_schema_test_case
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 453, in do_run
    exporter.export_schema_from_json(json_path, c_schema_ptr)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 117, in export_schema_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowSchema from file 'binary_view'
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 471, in _run_c_schema_test_case
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 453, in do_run
    exporter.export_schema_from_json(json_path, c_schema_ptr)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 117, in export_schema_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowSchema from file 'extension'
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_zerolength'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null_trivial'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal256'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'datetime'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'duration'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval_mdn'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map_non_canonical'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'recursive_nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'union'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'custom_metadata'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'duplicate_fieldnames'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary_unsigned'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'run_end_encoded'
... with record batch #0
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 521, in _run_c_array_test_cases
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 498, in do_run
    exporter.export_batch_from_json(json_path,
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 123, in export_batch_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowArray from file 'binary_view'
... with record batch #0
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 521, in _run_c_array_test_cases
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 498, in do_run
    exporter.export_batch_from_json(json_path,
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 123, in export_batch_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowArray from file 'extension'
... with record batch #0
... with record batch #1
======================================================================


################# FAILURES #################
FAILED TEST: run_end_encoded C++ producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'

FAILED TEST: binary_view C++ producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'

FAILED TEST: run_end_encoded C++ producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'

FAILED TEST: binary_view C++ producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'

FAILED TEST: run_end_encoded nanoarrow producing,  C++ consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'

FAILED TEST: binary_view nanoarrow producing,  C++ consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'

FAILED TEST: run_end_encoded nanoarrow producing,  C++ consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'

FAILED TEST: binary_view nanoarrow producing,  C++ consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'

FAILED TEST: run_end_encoded nanoarrow producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'

FAILED TEST: binary_view nanoarrow producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'

FAILED TEST: run_end_encoded nanoarrow producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'

FAILED TEST: binary_view nanoarrow producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'

12 failures, 9 skips

@codecov-commenter

codecov-commenter commented Jan 12, 2024

Codecov Report

Attention: 11 lines in your changes are missing coverage. Please review.

Comparison is base (931eafc) 88.19% compared to head (31db7bc) 87.67%.
Report is 6 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| src/nanoarrow/utils.c | 93.75% | 6 Missing ⚠️ |
| src/nanoarrow/nanoarrow_types.h | 64.28% | 5 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #361      +/-   ##
==========================================
- Coverage   88.19%   87.67%   -0.52%     
==========================================
  Files          74       75       +1     
  Lines       12212    12940     +728     
==========================================
+ Hits        10770    11345     +575     
- Misses       1442     1595     +153     

@WillAyd
Contributor

WillAyd commented Jan 15, 2024

Very cool - this would be very helpful for arrow-adbc. Is this expected to work for Decimal256 as well?

@paleolimbot
Member Author

> Is this expected to work for Decimal256 as well?

Yes!

It would probably be more useful for ADBC if it handled the decimal point and a trailing exponent, but that should be fairly straightforward to add after the integer version.

@WillAyd
Contributor

WillAyd commented Jan 16, 2024

> It would probably be more useful for ADBC if it handled the decimal point and a trailing exponent

The PostgreSQL implementation in ADBC currently sends the digit string and the decimal placement separately, so this would already be helpful there without going all the way.

@paleolimbot paleolimbot marked this pull request as ready for review January 16, 2024 20:54
100000ULL, 1000000ULL, 10000000ULL, 100000000ULL, 1000000000ULL};

// Adapted from Arrow C++ to use 32-bit words for better C portability
// https://github.com/apache/arrow/blob/main/cpp/src/arrow/util/decimal.cc#L524-L544
Member

You can add a git hash in place of main if you'd like. (Press 'y' after opening the above link)

It also might be nice to copy the comment over as well if applicable.

Comment on lines 287 to 294
/// \brief Sets the integer value of an ArrowDecimal from a string
ArrowErrorCode ArrowDecimalSetIntString(struct ArrowDecimal* decimal,
                                        struct ArrowStringView value);

/// \brief Get the integer value of an ArrowDecimal as string
ArrowErrorCode ArrowDecimalAppendIntStringToBuffer(const struct ArrowDecimal* decimal,
                                                   struct ArrowBuffer* buffer);

Member

IMO the function names are a bit confusing. Maybe something like ArrowDecimalSetFromString? I think the IntString terminology is the part that is hard for me to parse.

Member Author

I agree that it's not great! I didn't pick ArrowDecimalSetFromString() because, to me, that name implies parsing the decimal point, and this function won't do that. That said, it will error very quickly if anybody tries to put a decimal point there, so I'm game to change it.

Member Author

It floated in last night that Digits is probably a better term. I pushed that change but feel free to suggest something else!

Member

Even better, I love this change!
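
To illustrate the behavior discussed in this thread (digits only, with a decimal point rejected as an error), here is a minimal sketch of parsing a digit string into little-endian 32-bit words. It is not the code in this PR: digits_to_words32 is a hypothetical name, and the overflow check only covers the unsigned magnitude, not the sign bit.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical sketch: parse an optional '-' followed by ASCII digits into
// n little-endian 32-bit words (two's complement). Anything else, including
// a decimal point, is rejected with EINVAL.
static int digits_to_words32(const char* s, size_t len, uint32_t* words, int n) {
  memset(words, 0, (size_t)n * sizeof(uint32_t));
  int negative = len > 0 && s[0] == '-';
  size_t i = negative ? 1 : 0;
  if (i == len) return EINVAL;  // no digits at all

  for (; i < len; i++) {
    if (s[i] < '0' || s[i] > '9') return EINVAL;
    // words = words * 10 + digit, propagating the carry upward
    uint64_t carry = (uint64_t)(s[i] - '0');
    for (int j = 0; j < n; j++) {
      uint64_t cur = (uint64_t)words[j] * 10 + carry;
      words[j] = (uint32_t)cur;
      carry = cur >> 32;
    }
    if (carry != 0) return ERANGE;  // magnitude does not fit in n words
  }

  if (negative) {
    // Two's complement negation: invert every word, then add one
    uint64_t carry = 1;
    for (int j = 0; j < n; j++) {
      uint64_t cur = (uint64_t)(uint32_t)~words[j] + carry;
      words[j] = (uint32_t)cur;
      carry = cur >> 32;
    }
  }
  return 0;
}
```

With this sketch, a caller that receives the digits and the scale separately would pass only the digits here and keep the scale alongside the resulting words.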

Comment on lines +277 to +279
if (c < '0' || c > '9') {
  return EINVAL;
}
Member

This might be an unhelpful guess at micro-optimization, but if you are trying to avoid branching, you can accumulate a flag here and check for failure outside of the loop, e.g. is_invalid |= c < '0' || c > '9';. Maybe the compiler would optimize this for us? I wouldn't know.
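
A sketch of the two styles under discussion may help. Both helper names are hypothetical and not part of nanoarrow. Note that the flag variant has to OR failures into a zero-initialized flag (|=), since AND-ing would never record one. Whether it is actually faster is something only the compiler output or a benchmark can answer.

```c
#include <stddef.h>

// Hypothetical helper: returns 1 if every character is an ASCII digit,
// returning as soon as an invalid character is seen (the style in the diff).
static int digits_valid_early_exit(const char* data, size_t n) {
  for (size_t i = 0; i < n; i++) {
    char c = data[i];
    if (c < '0' || c > '9') {
      return 0;
    }
  }
  return 1;
}

// Hypothetical helper: same result, but accumulates a flag with |= and
// checks it once after the loop instead of returning from inside it.
static int digits_valid_flag(const char* data, size_t n) {
  int is_invalid = 0;
  for (size_t i = 0; i < n; i++) {
    char c = data[i];
    is_invalid |= (c < '0') | (c > '9');
  }
  return !is_invalid;
}
```

The early-exit version stops at the first bad character, while the flag version always scans the whole input.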

Member Author

I'm not sure I follow...I wasn't intending to optimize anything, just to make sure we had a valid integer string before proceeding. The other option is to pass end_ptr instead of NULL into strtoll() and check that it parsed the entire chunk of characters. Maybe that would be less confusing?
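
The alternative mentioned here, passing an end pointer to strtoll() and checking that the whole chunk was consumed, might look like the sketch below. parse_chunk is a hypothetical name, and the chunk is assumed to be null-terminated and short enough to fit in a long long.

```c
#include <errno.h>
#include <stdlib.h>

// Hypothetical helper: parse a null-terminated chunk of digits and fail
// unless strtoll() consumed every character.
static int parse_chunk(const char* chunk, long long* out) {
  char* end_ptr = NULL;
  errno = 0;
  long long value = strtoll(chunk, &end_ptr, 10);
  if (errno != 0 || end_ptr == chunk || *end_ptr != '\0') {
    return EINVAL;
  }
  *out = value;
  return 0;
}
```

One caveat: strtoll() also accepts leading whitespace and a sign, so on its own this check is more permissive than the explicit digit test.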

Member

@danepitkin danepitkin Jan 17, 2024

ignore me then! This code is already very readable IMO. Other parts of the diff look like they are specifically written to avoid branching in loops (e.g. if statements), so I wasn't sure if that was an active decision.
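
For reference, the two alternatives raised in this thread could look roughly like the sketch below. Both function names are hypothetical and neither is taken from the PR. The first accumulates a failure flag with a bitwise OR so there is no early exit inside the loop (whether that is actually faster depends on the compiler); the second passes an end pointer to `strtoll()` and checks that the whole chunk was consumed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical helper: OR each per-character failure into a flag so that a
// single bad character is remembered, then check the flag once after the loop.
static int check_digits_accumulate(const char* chunk, size_t n) {
  int is_invalid = 0;
  for (size_t i = 0; i < n; i++) {
    is_invalid |= (chunk[i] < '0') | (chunk[i] > '9');
  }
  return is_invalid ? EINVAL : 0;
}

// Hypothetical helper: let strtoll() report where it stopped parsing.
// Note that strtoll() also accepts leading whitespace and a sign, so this is
// more permissive than the character check above.
static int parse_chunk_strtoll(const char* chunk, long long* out) {
  char* end_ptr = NULL;
  errno = 0;
  *out = strtoll(chunk, &end_ptr, 10);
  if (end_ptr == chunk || *end_ptr != '\0' || errno == ERANGE) {
    return EINVAL;
  }
  return 0;
}
```

The explicit per-character check in the diff is stricter than the `strtoll()` variant, which is one reason to keep it even if the end pointer is also checked.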


// Use 32-bit words for portability
uint32_t words32[8];
int n_words32 = decimal->n_words * 2;
Member

Assert that this is <= 8?
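
A minimal sketch of the requested assertion, assuming the fixed-size array holds at most eight 32-bit words (enough for a 256-bit decimal). The helper `split_words32` and its `capacity` parameter are hypothetical names for illustration, not code from the PR.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: copy n_words little-endian 64-bit words into
// little-endian 32-bit words, asserting that the destination is large enough.
// Returns the number of 32-bit words written.
static int split_words32(const uint64_t* words, int n_words, uint32_t* words32,
                         int capacity) {
  int n_words32 = n_words * 2;
  assert(n_words32 <= capacity);

  for (int i = 0; i < n_words; i++) {
    words32[2 * i] = (uint32_t)(words[i] & 0xFFFFFFFFu);  // low half first
    words32[2 * i + 1] = (uint32_t)(words[i] >> 32);      // then high half
  }

  return n_words32;
}
```

With `uint32_t words32[8]`, a 128-bit decimal uses four entries and a 256-bit decimal uses all eight; anything wider trips the assertion instead of writing past the array.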

memset(segments, 0, sizeof(segments));
uint64_t* most_significant_elem = words_little_endian + most_significant_elem_idx;

do {
Member

You should comment on what this is doing.

Member Author

Done!
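
For readers following along, the general idea behind a loop like this is repeated division by 10^9: each pass divides the whole multi-word number by 10^9 and records the remainder as one nine-digit segment, until the number reaches zero. The sketch below shows that technique under the assumption of an unsigned little-endian input; `div_1e9` and `to_segments` are hypothetical names, and the code in the PR (adapted from Arrow C++) differs in its details.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: divide the unsigned little-endian number stored in
// words[0..n_words) by 10^9 in place and return the remainder. Works on
// 32-bit halves so that every intermediate dividend fits in a uint64_t.
static uint32_t div_1e9(uint64_t* words, int n_words) {
  const uint64_t k1e9 = 1000000000ULL;
  uint64_t remainder = 0;

  for (int i = n_words - 1; i >= 0; i--) {
    uint64_t hi = (remainder << 32) | (words[i] >> 32);
    uint64_t hi_quotient = hi / k1e9;
    remainder = hi % k1e9;

    uint64_t lo = (remainder << 32) | (words[i] & 0xFFFFFFFFu);
    uint64_t lo_quotient = lo / k1e9;
    remainder = lo % k1e9;

    words[i] = (hi_quotient << 32) | lo_quotient;
  }

  return (uint32_t)remainder;
}

// Hypothetical helper: fill segments with base-10^9 "digits", least
// significant first, consuming words in the process. Returns the number of
// segments written (always at least one).
static int to_segments(uint64_t* words, int n_words, uint32_t* segments,
                       int capacity) {
  int n_segments = 0;
  int is_zero;

  do {
    assert(n_segments < capacity);
    segments[n_segments++] = div_1e9(words, n_words);

    is_zero = 1;
    for (int i = 0; i < n_words; i++) {
      is_zero &= (words[i] == 0);
    }
  } while (!is_zero);

  return n_segments;
}
```

A 128-bit value has at most 39 decimal digits (5 segments) and a 256-bit value at most 78 (9 segments), which bounds the `capacity` needed here.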


// We know our output has no more than 9 digits per segment, plus a negative sign,
// plus any further digits between our output of 9 digits and the maximum theoretical
// number of digits in an unsigned long, including the null terminator.
Member

I don't understand what the maximum theoretical number of digits has to do with this, can you explain?

Member Author

The idea was to make sure that the 21 in snprintf(, 21, "%lu") is always accurate (I added this to the comment).

int n_chars = snprintf((char*)buffer->data + buffer->size_bytes, 21, "%09lu",
(unsigned long)segments[i]);
buffer->size_bytes += n_chars;
}
Member

Can you assert that buffer->size_bytes is not longer than the size allocated above?
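
One way to make the size reasoning checkable is sketched below. It assumes segments are stored least significant first and are each below 10^9, so the worst case is a sign, nine digits per segment, and a null terminator. The helper `format_segments` is hypothetical; unlike the diff, it passes the remaining capacity to `snprintf()` rather than the constant 21 (the 20 digits a 64-bit `unsigned long` can print, plus the terminator).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: write segments (least significant first, each < 10^9)
// as a null-terminated decimal string. Returns the number of characters
// written, excluding the terminator.
static int format_segments(const uint32_t* segments, int n_segments,
                           int is_negative, char* out, size_t capacity) {
  assert(n_segments > 0);

  // Worst case: sign + 9 digits per segment + null terminator
  size_t max_chars = 1 + 9 * (size_t)n_segments + 1;
  assert(capacity >= max_chars);

  size_t size = 0;
  if (is_negative) {
    out[size++] = '-';
  }

  // The most significant segment is written without zero padding
  int n_chars = snprintf(out + size, capacity - size, "%lu",
                         (unsigned long)segments[n_segments - 1]);
  size += (size_t)n_chars;

  // Every remaining segment is zero-padded to exactly 9 digits
  for (int i = n_segments - 2; i >= 0; i--) {
    n_chars = snprintf(out + size, capacity - size, "%09lu",
                       (unsigned long)segments[i]);
    size += (size_t)n_chars;
    assert(size < capacity);
  }

  assert(strlen(out) == size);
  return (int)size;
}
```

The assertions document the invariant the reviewer asked about: the written size never reaches the allocated capacity, so the terminator always fits.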


for (auto bitwidth : {128, 256}) {
ArrowDecimalInit(&decimal, bitwidth, 10, 3);
ArrowDecimalSetInt(&decimal, 12345);
Member

Can you test with larger numbers?

EXPECT_EQ(std::string(reinterpret_cast<char*>(buffer.data), buffer.size_bytes),
"18446744073709551615");

// Check with the maximum value of a 128-bit integer
Member

Could you also have a similar test for 256 bits?

@danepitkin
Member

Overall LGTM!

@paleolimbot paleolimbot merged commit c4844e3 into apache:main Jan 25, 2024
31 checks passed
@paleolimbot paleolimbot deleted the decimal-support branch January 25, 2024 15:21
@paleolimbot paleolimbot added this to the nanoarrow 0.4.0 milestone Jan 26, 2024
5 participants