feat: Add decimal support to integration tester #361

paleolimbot · 2024-01-12T19:22:04Z

This PR adds decimal support to the integration test utility. Because decimal buffers are implemented in the integration test JSON format as strings containing the integer representation of the decimal, it meant that nanoarrow needed an implementation of arbitrarily large integer to/from string. I modified this from Arrow C++ (links in comments next to the implementation) with a few differences to avoid porting the complete int128 implementation and the C++ standard library.

Parse strings containing arbitrarily large integers into decimal words
Convert decimal words into arbitrarily large integer strings
Wire the converters into the integration tester

The gaps in test coverage are from big-endian parts, which I are tested as part of weekly verification and that I tested locally with:

export NANOARROW_ARCH=s390x
docker compose run --rm verify

With archery integration --with-cpp=true --with-nanoarrow=true --run-c-data, the decimal tests now pass:

##########################################################
C Data Interface: C++ exporting, C++ importing
##########################################################
======================================================================
Testing C ArrowSchema from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_zerolength'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null_trivial'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal256'
======================================================================
======================================================================
Testing C ArrowSchema from file 'datetime'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duration'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval_mdn'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map_non_canonical'
-- Skipping test because producer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'recursive_nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'union'
======================================================================
======================================================================
Testing C ArrowSchema from file 'custom_metadata'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duplicate_fieldnames'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary_unsigned'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'run_end_encoded'
======================================================================
======================================================================
Testing C ArrowSchema from file 'binary_view'
======================================================================
======================================================================
Testing C ArrowSchema from file 'extension'
-- Skipping test because producer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_zerolength'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null_trivial'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal256'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'datetime'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'duration'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval_mdn'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map_non_canonical'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'recursive_nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'union'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'custom_metadata'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'duplicate_fieldnames'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary_unsigned'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'run_end_encoded'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'binary_view'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'extension'
-- Skipping test because producer C++ does not support C ArrowArray
======================================================================
##########################################################
C Data Interface: C++ exporting, nanoarrow importing
##########################################################
======================================================================
Testing C ArrowSchema from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_zerolength'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null_trivial'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal256'
======================================================================
======================================================================
Testing C ArrowSchema from file 'datetime'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duration'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval_mdn'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map_non_canonical'
-- Skipping test because producer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'recursive_nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'union'
======================================================================
======================================================================
Testing C ArrowSchema from file 'custom_metadata'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duplicate_fieldnames'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary_unsigned'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'run_end_encoded'
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 471, in _run_c_schema_test_case
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 454, in do_run
    importer.import_schema_and_compare_to_json(json_path, c_schema_ptr)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 138, in import_schema_and_compare_to_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowSchema from file 'binary_view'
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 471, in _run_c_schema_test_case
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 454, in do_run
    importer.import_schema_and_compare_to_json(json_path, c_schema_ptr)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 138, in import_schema_and_compare_to_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowSchema from file 'extension'
-- Skipping test because producer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_zerolength'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null_trivial'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal256'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'datetime'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'duration'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval_mdn'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map_non_canonical'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'recursive_nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'union'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'custom_metadata'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'duplicate_fieldnames'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary_unsigned'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'run_end_encoded'
... with record batch #0
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 521, in _run_c_array_test_cases
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 501, in do_run
    importer.import_batch_and_compare_to_json(json_path,
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 144, in import_batch_and_compare_to_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowArray from file 'binary_view'
... with record batch #0
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 521, in _run_c_array_test_cases
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 501, in do_run
    importer.import_batch_and_compare_to_json(json_path,
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 144, in import_batch_and_compare_to_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowArray from file 'extension'
-- Skipping test because producer C++ does not support C ArrowArray
======================================================================
##########################################################
C Data Interface: nanoarrow exporting, C++ importing
##########################################################
======================================================================
Testing C ArrowSchema from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_zerolength'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null_trivial'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal256'
======================================================================
======================================================================
Testing C ArrowSchema from file 'datetime'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duration'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval_mdn'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map_non_canonical'
-- Skipping test because consumer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'recursive_nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'union'
======================================================================
======================================================================
Testing C ArrowSchema from file 'custom_metadata'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duplicate_fieldnames'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary_unsigned'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'run_end_encoded'
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 471, in _run_c_schema_test_case
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 453, in do_run
    exporter.export_schema_from_json(json_path, c_schema_ptr)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 117, in export_schema_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowSchema from file 'binary_view'
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 471, in _run_c_schema_test_case
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 453, in do_run
    exporter.export_schema_from_json(json_path, c_schema_ptr)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 117, in export_schema_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowSchema from file 'extension'
-- Skipping test because consumer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_zerolength'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null_trivial'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal256'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'datetime'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'duration'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval_mdn'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map_non_canonical'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'recursive_nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'union'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'custom_metadata'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'duplicate_fieldnames'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary_unsigned'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'run_end_encoded'
... with record batch #0
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 521, in _run_c_array_test_cases
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 498, in do_run
    exporter.export_batch_from_json(json_path,
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 123, in export_batch_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowArray from file 'binary_view'
... with record batch #0
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 521, in _run_c_array_test_cases
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 498, in do_run
    exporter.export_batch_from_json(json_path,
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 123, in export_batch_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowArray from file 'extension'
-- Skipping test because consumer C++ does not support C ArrowArray
======================================================================
##########################################################
C Data Interface: nanoarrow exporting, nanoarrow importing
##########################################################
======================================================================
Testing C ArrowSchema from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_zerolength'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null_trivial'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal256'
======================================================================
======================================================================
Testing C ArrowSchema from file 'datetime'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duration'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval_mdn'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map_non_canonical'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'recursive_nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'union'
======================================================================
======================================================================
Testing C ArrowSchema from file 'custom_metadata'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duplicate_fieldnames'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary_unsigned'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'run_end_encoded'
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 471, in _run_c_schema_test_case
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 453, in do_run
    exporter.export_schema_from_json(json_path, c_schema_ptr)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 117, in export_schema_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowSchema from file 'binary_view'
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 471, in _run_c_schema_test_case
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 453, in do_run
    exporter.export_schema_from_json(json_path, c_schema_ptr)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 117, in export_schema_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowSchema from file 'extension'
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_zerolength'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null_trivial'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal256'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'datetime'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'duration'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval_mdn'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map_non_canonical'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'recursive_nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'union'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'custom_metadata'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'duplicate_fieldnames'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary_unsigned'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'run_end_encoded'
... with record batch #0
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 521, in _run_c_array_test_cases
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 498, in do_run
    exporter.export_batch_from_json(json_path,
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 123, in export_batch_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowArray from file 'binary_view'
... with record batch #0
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 521, in _run_c_array_test_cases
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 498, in do_run
    exporter.export_batch_from_json(json_path,
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 123, in export_batch_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowArray from file 'extension'
... with record batch #0
... with record batch #1
======================================================================


################# FAILURES #################
FAILED TEST: run_end_encoded C++ producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'

FAILED TEST: binary_view C++ producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'

FAILED TEST: run_end_encoded C++ producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'

FAILED TEST: binary_view C++ producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'

FAILED TEST: run_end_encoded nanoarrow producing,  C++ consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'

FAILED TEST: binary_view nanoarrow producing,  C++ consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'

FAILED TEST: run_end_encoded nanoarrow producing,  C++ consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'

FAILED TEST: binary_view nanoarrow producing,  C++ consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'

FAILED TEST: run_end_encoded nanoarrow producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'

FAILED TEST: binary_view nanoarrow producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'

FAILED TEST: run_end_encoded nanoarrow producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'

FAILED TEST: binary_view nanoarrow producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'

12 failures, 9 skips

codecov-commenter · 2024-01-12T19:23:55Z

Codecov Report

Attention: 11 lines in your changes are missing coverage. Please review.

Comparison is base (931eafc) 88.19% compared to head (31db7bc) 87.67%.
Report is 6 commits behind head on main.

Files	Patch %	Lines
src/nanoarrow/utils.c	93.75%	6 Missing ⚠️
src/nanoarrow/nanoarrow_types.h	64.28%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #361      +/-   ##
==========================================
- Coverage   88.19%   87.67%   -0.52%     
==========================================
  Files          74       75       +1     
  Lines       12212    12940     +728     
==========================================
+ Hits        10770    11345     +575     
- Misses       1442     1595     +153

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

WillAyd · 2024-01-15T02:57:30Z

Very cool - this would be very helpful for arrow-adbc. Is this expected to work for Decimal256 as well?

paleolimbot · 2024-01-15T11:55:53Z

Is this expected to work for Decimal256 as well?

Yes!

It would probably be more useful for ADBC if it handled the decimal point and a trailing exponent, but that should be fairly straightforward to add after the integer version.

WillAyd · 2024-01-16T02:18:36Z

It would probably be more useful for ADBC if it handled the decimal point and a trailing exponent

The postgres implementation in ADBC currently sends the string and decimal placement separately, so this would be helpful for that without going all the way

danepitkin · 2024-01-16T21:35:32Z

src/nanoarrow/utils.c

+    100000ULL, 1000000ULL, 10000000ULL, 100000000ULL, 1000000000ULL};
+
+// Adapted from Arrow C++ to use 32-bit words for better C portability
+// https://github.com/apache/arrow/blob/main/cpp/src/arrow/util/decimal.cc#L524-L544


You can add a git hash in place of main if you'd like. (Press 'y' after opening the above link)

It also might be nice to copy the comment over as well if applicable.

danepitkin · 2024-01-16T21:44:47Z

src/nanoarrow/nanoarrow.h

+/// \brief Sets the integer value of an ArrowDecimal from a string
+ArrowErrorCode ArrowDecimalSetIntString(struct ArrowDecimal* decimal,
+                                        struct ArrowStringView value);
+
+/// \brief Get the integer value of an ArrowDecimal as string
+ArrowErrorCode ArrowDecimalAppendIntStringToBuffer(const struct ArrowDecimal* decimal,
+                                                   struct ArrowBuffer* buffer);
+


IMO the function names are a bit confusing. Maybe something like ArrowDecimalSetFromString? I think the IntString terminology is the part that is hard for me to parse.

I agree that it's not great! I think the reason I didn't pick ArrowDecimalSetFromString() is because to me that would parse the decimal point and this function won't do that. It will also error very quickly if anybody tries to put a decimal point there and I'm game to change it.

It floated in last night that Digits is probably a better term. I pushed that change but feel free to suggest something else!

Even better, I love this change!

danepitkin · 2024-01-16T22:02:01Z

src/nanoarrow/utils.c

+    if (c < '0' || c > '9') {
+      return EINVAL;
+    }


this might be an unhelpful guess at micro-optimization.. but if you are trying to avoid branching, you can set a bitmask here and check for failure outside of the loop. e.g. is_invalid &= c < '0' || c > '9';. Maybe the compiler would optimize this for us? I wouldn't know.

I'm not sure I follow...I wasn't intending to optimize anything, just to make sure we had a valid integer string before proceeding. The other option is to pass end_ptr instead of NULL into strtoll() and check that it parsed the entire chunk of characters. Maybe that would be less confusing?

ignore me then! This code is already very readable IMO. Other parts of the diff look like they are specifically written to avoid branching in loops (e.g. if statements), so I wasn't sure if that was an active decision.

pitrou · 2024-01-22T15:01:01Z

src/nanoarrow/utils.c

+
+  // Use 32-bit words for portability
+  uint32_t words32[8];
+  int n_words32 = decimal->n_words * 2;


Assert that this is <= 8?

pitrou · 2024-01-22T15:05:24Z

src/nanoarrow/utils.c

+  memset(segments, 0, sizeof(segments));
+  uint64_t* most_significant_elem = words_little_endian + most_significant_elem_idx;
+
+  do {


You should comment on what this is doing.

pitrou · 2024-01-22T15:07:33Z

src/nanoarrow/utils.c

+
+  // We know our output has no more than 9 digits per segment, plus a negative sign,
+  // plus any further digits between our output of 9 digits and the maximum theoretical
+  // number of digits in an unsigned long, including the null terminator.


I don't understand what the maximum theoretical number of digits has to do with this, can you explain?

The idea was to make sure that the 21 in snprintf(, 21, "%lu") is always accurate (I added this to the comment).

pitrou · 2024-01-22T15:07:53Z

src/nanoarrow/utils.c

+    int n_chars = snprintf((char*)buffer->data + buffer->size_bytes, 21, "%09lu",
+                           (unsigned long)segments[i]);
+    buffer->size_bytes += n_chars;
+  }


Can you assert that buffer->size_bytes is not longer than the size allocated above?

pitrou · 2024-01-22T15:08:47Z

src/nanoarrow/utils_test.cc

+
+  for (auto bitwidth : {128, 256}) {
+    ArrowDecimalInit(&decimal, bitwidth, 10, 3);
+    ArrowDecimalSetInt(&decimal, 12345);


Can you test with larger numbers?

pitrou · 2024-01-22T15:10:55Z

src/nanoarrow/utils_test.cc

+  EXPECT_EQ(std::string(reinterpret_cast<char*>(buffer.data), buffer.size_bytes),
+            "18446744073709551615");
+
+  // Check with the maximum value of a 128-bit integer


Could you also have a similar test for 256 bits?

danepitkin · 2024-01-24T19:44:38Z

Overall LGTM!

first stab

8693900

paleolimbot force-pushed the decimal-support branch from 77f9263 to 8693900 Compare January 12, 2024 19:23

paleolimbot added 2 commits January 12, 2024 17:12

get parsing to work

5a9e97a

try static

51ded36

paleolimbot added 7 commits January 15, 2024 17:15

stub formatter

72fa094

add link to source

d46eb8b

fix algorithm

bd5b4be

test on powers of 10 and powers of 2

499437c

fix namespace

c4276ff

negate working

7baac03

better negate

9655462

paleolimbot added 5 commits January 16, 2024 12:41

add to integration tester

b0605a7

test decimal roundtrip

74f7bb1

fix leak

3f4f5ce

test negate

1b1d692

formatter

d05fe64

paleolimbot marked this pull request as ready for review January 16, 2024 20:54

danepitkin reviewed Jan 16, 2024

View reviewed changes

paleolimbot and others added 2 commits January 16, 2024 19:49

git hashes in comments

e1df263

intstring -> digits

1dee277

pitrou requested changes Jan 22, 2024

View reviewed changes

paleolimbot added 2 commits January 22, 2024 16:47

address comments in utils

2a198d3

better tests

31db7bc

paleolimbot merged commit c4844e3 into apache:main Jan 25, 2024
31 checks passed

paleolimbot deleted the decimal-support branch January 25, 2024 15:21

paleolimbot added this to the nanoarrow 0.4.0 milestone Jan 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add decimal support to integration tester #361

feat: Add decimal support to integration tester #361

paleolimbot commented Jan 12, 2024 •

edited

Loading

codecov-commenter commented Jan 12, 2024 •

edited

Loading

WillAyd commented Jan 15, 2024

paleolimbot commented Jan 15, 2024

WillAyd commented Jan 16, 2024

danepitkin Jan 16, 2024

danepitkin Jan 16, 2024

paleolimbot Jan 17, 2024

paleolimbot Jan 17, 2024

danepitkin Jan 17, 2024

danepitkin Jan 16, 2024

paleolimbot Jan 16, 2024

danepitkin Jan 17, 2024 •

edited

Loading

pitrou Jan 22, 2024

pitrou Jan 22, 2024

paleolimbot Jan 22, 2024

pitrou Jan 22, 2024

paleolimbot Jan 22, 2024

pitrou Jan 22, 2024

pitrou Jan 22, 2024

pitrou Jan 22, 2024

danepitkin commented Jan 24, 2024

feat: Add decimal support to integration tester #361

feat: Add decimal support to integration tester #361

Conversation

paleolimbot commented Jan 12, 2024 • edited Loading

codecov-commenter commented Jan 12, 2024 • edited Loading

Codecov Report

WillAyd commented Jan 15, 2024

paleolimbot commented Jan 15, 2024

WillAyd commented Jan 16, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

danepitkin Jan 17, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

danepitkin commented Jan 24, 2024

paleolimbot commented Jan 12, 2024 •

edited

Loading

codecov-commenter commented Jan 12, 2024 •

edited

Loading

danepitkin Jan 17, 2024 •

edited

Loading