[c++] SOMAColumn serialization/deserialization #3599

XanthosXanthopoulos
Collaborator

Issue and/or context:
Generate a JSON description of the SOMAColumn schema so that the columns can be reconstructed when an array is opened. If the required metadata is missing, a default schema is generated instead: one SOMADimension for each TileDB Dimension and one SOMAAttribute for each TileDB Attribute. If the array is opened in write mode, the generated metadata is stored.
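
To illustrate the open-time behavior described above, here is a minimal standalone sketch. It does not use the TileDB or TileDB-SOMA APIs: `ColumnDesc`, `ColumnKind`, `kColumnsKey`, `serialize`, `deserialize`, and `open_columns` are all hypothetical names, the array metadata is modeled as a plain `std::map`, and the JSON layout is an assumption rather than the format this PR writes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these names come from the TileDB-SOMA code base.
enum class ColumnKind { Dimension, Attribute };

struct ColumnDesc {
    ColumnKind kind;
    std::string name;
    bool operator==(const ColumnDesc& o) const {
        return kind == o.kind && name == o.name;
    }
};

// Assumed metadata key; the real key is not shown in this thread.
const std::string kColumnsKey = "example_soma_columns";

// Serialize the column list as a JSON array of {"type", "name"} objects.
std::string serialize(const std::vector<ColumnDesc>& columns) {
    std::string out = "[";
    for (std::size_t i = 0; i < columns.size(); ++i) {
        const ColumnDesc& c = columns[i];
        // This sketch does not escape quotes or backslashes in names.
        assert(c.name.find_first_of("\"\\") == std::string::npos);
        if (i > 0) out += ",";
        out += "{\"type\":\"";
        out += (c.kind == ColumnKind::Dimension ? "dimension" : "attribute");
        out += "\",\"name\":\"" + c.name + "\"}";
    }
    return out + "]";
}

// Parse only the restricted layout produced by serialize() above.
std::vector<ColumnDesc> deserialize(const std::string& json) {
    static const std::regex entry(
        R"re(\{"type":"(dimension|attribute)","name":"([^"]*)"\})re");
    std::vector<ColumnDesc> columns;
    for (std::sregex_iterator it(json.begin(), json.end(), entry), end;
         it != end; ++it) {
        ColumnKind kind = (*it)[1].str() == "dimension" ? ColumnKind::Dimension
                                                        : ColumnKind::Attribute;
        columns.push_back({kind, (*it)[2].str()});
    }
    return columns;
}

// Use stored metadata when present; otherwise build one default column per
// dimension and attribute, and persist the result only in write mode.
std::vector<ColumnDesc> open_columns(
    std::map<std::string, std::string>& metadata,
    const std::vector<std::string>& dimension_names,
    const std::vector<std::string>& attribute_names,
    bool write_mode) {
    auto found = metadata.find(kColumnsKey);
    if (found != metadata.end()) {
        return deserialize(found->second);
    }
    std::vector<ColumnDesc> columns;
    for (const auto& n : dimension_names)
        columns.push_back({ColumnKind::Dimension, n});
    for (const auto& n : attribute_names)
        columns.push_back({ColumnKind::Attribute, n});
    if (write_mode) {
        metadata[kColumnsKey] = serialize(columns);
    }
    return columns;
}
```

In this sketch, calling `open_columns` in read mode on an array without the key returns the default columns and leaves the metadata untouched, while the same call in write mode also stores the serialized description, so later opens take the deserialization path.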

Changes:

Notes for Reviewer:
Updating the metadata on schema evolution is not handled yet; it will be included in a follow-up PR.

@XanthosXanthopoulos XanthosXanthopoulos changed the base branch from main to xan/sc-59427/soma-column-arrow-integration January 21, 2025 19:50
@XanthosXanthopoulos XanthosXanthopoulos force-pushed the xan/sc-59427/soma-column-arrow-integration branch from d7077d2 to 31b8bbe Compare January 22, 2025 10:35
@XanthosXanthopoulos XanthosXanthopoulos force-pushed the xan/sc-59427/serialize-soma-column branch from 880dcd3 to 1251985 Compare January 22, 2025 11:28
@johnkerl johnkerl merged commit d7f1bf1 into xan/sc-59427/soma-column-arrow-integration Jan 22, 2025
6 checks passed
@johnkerl johnkerl deleted the xan/sc-59427/serialize-soma-column branch January 22, 2025 19:41
XanthosXanthopoulos added a commit that referenced this pull request Jan 23, 2025
* Add minimal testing for dimensions

* Add minimal testing for dimensions

* Add read test case

* Remove current_domain flag

* Do not export soma column [skip ci]

* Replace string_view with string when returning column name, add current domain checks, replace vector with span when selecting points

* Add serialization/deserialization methods

* Serialize SOMAColumn on schema generation

* Update unit tests

* Generate columns on array open

* Add deserialization and default initialization on array open

* Write SOMAColumn metadata if array is open in `write` mode

* Write metadata directly to TileDB array

* Fix error in tests after rebase

* Handle addition and deletion of attributes

* Fix R tests
XanthosXanthopoulos added a commit that referenced this pull request Jan 23, 2025
* Remove fmt::format

* Remove unneeded methods and member variables

* Add minimal testing for dimensions

* Replace string_view with string when returning column name, add current domain checks, replace vector with span when selecting points

* Remove current_domain flag

* Replace string_view with string when returning column name, add current domain checks, replace vector with span when selecting points

* Update CMake files

* Add minimal testing for dimensions

* Misc fixes

* Add read test case

* Remove current_domain flag

* Do not export soma column [skip ci]

* Replace string_view with string when returning column name, add current domain checks, replace vector with span when selecting points

* Add function to extract data from ArrowTable into std::array

* Migrate array creation to SOMAColumn

* Fix string current domain in unit dataframe tests

* Fix current domain unit test on string dimension

* Remove unused methods

* Misc fixes

* Address review comments

* Replace Skip template parameter with a function parameter

* Address review comment about unit test

* [c++] SOMAColumn serialization/deserialization (#3599)

* Add minimal testing for dimensions

* Add minimal testing for dimensions

* Add read test case

* Remove current_domain flag

* Do not export soma column [skip ci]

* Replace string_view with string when returning column name, add current domain checks, replace vector with span when selecting points

* Add serialization/deserialization methods

* Serialize SOMAColumn on schema generation

* Update unit tests

* Generate columns on array open

* Add deserialization and default initialization on array open

* Write SOMAColumn metadata if array is open in `write` mode

* Write metadata directly to TileDB array

* Fix error in tests after rebase

* Handle addition and deletion of attributes

* Fix R tests

* [c++] Make `SOMAColumn` metadata required only for `GeometryDataframe` (#3621)

* Make SOMAColumn metadata only required by GeometryDataframe

* Update tests

* Apply suggestions from code review

Co-authored-by: John Kerl <[email protected]>

* Rename constants

---------

Co-authored-by: John Kerl <[email protected]>
XanthosXanthopoulos added a commit that referenced this pull request Jan 23, 2025
* Add minimal testing for dimensions

* Add minimal testing for dimensions

* Add read test case

* Remove current_domain flag

* Do not export soma column [skip ci]

* Replace string_view with string when returning column name, add current domain checks, replace vector with span when selecting points

* Add serialization/deserialization methods

* Serialize SOMAColumn on schema generation

* Update unit tests

* Generate columns on array open

* Add deserialization and default initialization on array open

* Write SOMAColumn metadata if array is open in `write` mode

* Write metadata directly to TileDB array

* Fix error in tests after rebase

* Handle addition and deletion of attributes

* Fix R tests
XanthosXanthopoulos added a commit that referenced this pull request Jan 28, 2025
…ay`, part 2 (#3407)

* SOMAColumn abstract class definition

* Remove fmt::format

* Remove unneeded methods and member variables

* Add concrete class wrapper for TileDB dimension

* Add minimal testing for dimensions

* Replace string_view with string when returning column name, add current domain checks, replace vector with span when selecting points

* Add concrete class wrapper for TileDB attribute

* Update CMake files

* Add minimal testing for dimensions

* Misc fixes

* Add read test case

* Remove current_domain flag

* Do not export soma column [skip ci]

* Migrate array creation to SOMAColumn

* Misc fixes

* [c++] SOMAColumn serialization/deserialization (#3599)

* Add minimal testing for dimensions

* Add minimal testing for dimensions

* Add read test case

* Remove current_domain flag

* Do not export soma column [skip ci]

* Replace string_view with string when returning column name, add current domain checks, replace vector with span when selecting points

* Add serialization/deserialization methods

* Serialize SOMAColumn on schema generation

* Update unit tests

* Generate columns on array open

* Add deserialization and default initialization on array open

* Write SOMAColumn metadata if array is open in `write` mode

* Write metadata directly to TileDB array

* Fix error in tests after rebase

* Handle addition and deletion of attributes

* Fix R tests

* [c++] Make `SOMAColumn` metadata required only for `GeometryDataframe` (#3621)

* Make SOMAColumn metadata only required by GeometryDataframe

* Update tests

* Fill SOMAColumn info on array open

* Migrate domain access methods to use SOMAColumns

* Add optional non empty domain method

* Replace optional non empty domain with the SOMAColumn implementation, update python bindings

* Add template-specialization guards

* Remove unsupported dimension datatypes

* Update old version of `fill_metadata_cache`

* Filter SOMAColumns when iterating to construct the domain

* Fix serialized columns order

* log type [skip ci]

* Specify LIBCPP_TYPEINFO_COMPARISON_IMPLEMENTATION for clang