Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scikit-build-core build system rework #1971

Closed
wants to merge 61 commits into from
Closed

Conversation

dudoslav
Copy link
Collaborator

@dudoslav dudoslav commented May 17, 2024

Rework build system into CMake + scikit-build-core combination. CMake is used to build all native bindings (pybind11/Cython) and to download TileDB core library if needed. Scikit-build-core is responsible for packaging this logic into python wheels and sdist.

Some CI/CD changes were required. We no longer need to download TileDB core library in CI/CD since CMake can do if for us. So all CI workflows are now reworked into two main ones:

  • ci.yml - build and test tiledb-py by installing it: pip install . and then running tests against this install
  • build_wheels.yml - build wheels for each platform/python_version and then install wheel and test it.

Both of these pipeline are very similar so it might make more sense to just use one, preferably build_wheels.yml.

As with all our other projects, this project also respects TILEDB_PATH and uses it as if it was TileDB_DIR for interoperability with CMake's find_package.

Note: Running tests locally is an issue right now, because tests need python bindings already compiled and installed in tiledb folder, which is a source folder.

@dudoslav dudoslav requested review from ihnorton and kounelisagis May 17, 2024 13:04
@dudoslav dudoslav self-assigned this May 17, 2024
@dudoslav
Copy link
Collaborator Author

This logic produces this folder structure (please ignore vector-search there):

(base) taco:tiledb dudoslav$ ls -l
.rw-r--r-- 3.9k dudoslav 17 May 14:39 __init__.py
drwxr-xr-x    - dudoslav 17 May 14:39 __pycache__
.rw-r--r--  469 dudoslav 17 May 14:39 _generated_version.py
.rw-r--r--  16k dudoslav 17 May 14:39 array_schema.py
.rw-r--r--  11k dudoslav 17 May 14:39 attribute.py
.rwxr-xr-x 1.1M dudoslav 17 May 14:39 cc.cpython-310-x86_64-linux-gnu.so
drwxr-xr-x    - dudoslav 27 Mar 18:37 cloud
.rw-r--r-- 1.2k dudoslav 17 May 14:39 CMakeLists.txt
.rw-r--r-- 1.8k dudoslav 17 May 14:39 common.pxi
.rw-r--r-- 3.8k dudoslav 17 May 14:39 consolidation_plan.py
.rw-r--r--  58k dudoslav 17 May 14:39 core.cc
.rw-r--r--  19k dudoslav 17 May 14:39 ctx.py
.rw-r--r--  207 dudoslav 17 May 14:39 data_order.py
.rw-r--r--  36k dudoslav 17 May 14:39 dataframe_.py
.rw-r--r-- 8.0k dudoslav 17 May 14:39 datatypes.py
.rw-r--r-- 2.1k dudoslav 17 May 14:39 debug.cc
.rw-r--r-- 9.5k dudoslav 17 May 14:39 dimension.py
.rw-r--r-- 2.8k dudoslav 17 May 14:39 dimension_label.py
.rw-r--r-- 4.0k dudoslav 17 May 14:39 dimension_label_schema.py
.rw-r--r-- 5.6k dudoslav 17 May 14:39 domain.py
.rw-r--r-- 4.9k dudoslav 17 May 14:39 enumeration.py
.rw-r--r-- 5.5k dudoslav 17 May 14:39 filestore.py
.rw-r--r--  34k dudoslav 17 May 14:39 filter.py
.rw-r--r-- 9.0k dudoslav 17 May 14:39 fragment.cc
.rw-r--r--  22k dudoslav 17 May 14:39 fragment.py
.rw-r--r--  16k dudoslav 17 May 14:39 group.py
.rw-r--r--  13k dudoslav 17 May 14:39 highlevel.py
.rw-r--r--  134 dudoslav 17 May 14:39 indexing.pxd
.rw-r--r--  14k dudoslav 17 May 14:39 indexing.pyx
drwxr-xr-x    - dudoslav 17 May 14:39 lib
.rw-r--r--  18k dudoslav 17 May 14:39 libmetadata.pyx
.rwxr-xr-x 1.6M dudoslav 17 May 14:39 libtiledb.cpython-310-x86_64-linux-gnu.so
.rw-r--r--  32k dudoslav 17 May 14:39 libtiledb.pxd
.rw-r--r-- 155k dudoslav 17 May 14:39 libtiledb.pyx
.rw-r--r--  773 dudoslav 17 May 14:39 main.cc
.rwxr-xr-x 677k dudoslav 17 May 14:39 main.cpython-310-x86_64-linux-gnu.so
.rw-r--r--  27k dudoslav 17 May 14:39 multirange_indexing.py
.rw-r--r--  17k dudoslav 17 May 14:39 npbuffer.cc
.rw-r--r--  330 dudoslav 17 May 14:39 npbuffer.h
.rw-r--r--   99 dudoslav 17 May 14:39 numpyFlags.h
.rw-r--r-- 1.0k dudoslav 17 May 14:39 object.py
.rw-r--r--  148 dudoslav 17 May 14:39 parquet_.py
.rw-r--r--  27k dudoslav 17 May 14:39 py_arrow_io_impl.h
.rw-r--r-- 3.2k dudoslav 17 May 14:39 py_arrowio
.rw-r--r--  789 dudoslav 17 May 14:39 query.py
.rw-r--r-- 9.2k dudoslav 17 May 14:39 query_condition.cc
.rw-r--r--  18k dudoslav 17 May 14:39 query_condition.py
.rw-r--r-- 4.0k dudoslav 17 May 14:39 schema_evolution.cc
.rw-r--r-- 2.4k dudoslav 17 May 14:39 schema_evolution.py
.rw-r--r-- 2.2k dudoslav 17 May 14:39 serialization.cc
.rw-r--r-- 4.0k dudoslav 17 May 14:39 subarray.py
drwxr-xr-x    - dudoslav 17 May 14:39 tests
.rw-r--r--  840 dudoslav 17 May 14:39 util.cc
.rw-r--r-- 1.1k dudoslav 17 May 14:39 util.h
drwxr-xr-x    - dudoslav 14 May 09:03 vector_search
.rw-r--r--  386 dudoslav 17 May 14:39 version_helper.py
.rw-r--r--  17k dudoslav 17 May 14:39 vfs.py

@dudoslav
Copy link
Collaborator Author

I added basic logic for downloading TileDB from release inside CMake so now we need to change CI/CD to not download release archives.

@dudoslav
Copy link
Collaborator Author

Wheel builds pass:
https://github.com/TileDB-Inc/TileDB-Py/actions/runs/9157079084/job/25172731354?pr=1971

@dudoslav
Copy link
Collaborator Author

I had to disable build for python 3.9, I assume that that is and issue

@dudoslav
Copy link
Collaborator Author

I also added sdist tests so that we can be sure that sdist can be build on specified os/python combinations: https://github.com/dudoslav/TileDB-Py/actions/runs/9303158461

- name: Get sdist package name
id: get_sdist_name
run: |
echo "sdist_name=$(ls dist/ | head -n 1)" >> "$GITHUB_OUTPUT"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is relying on the file ordering, seems like it could be brittle? Can we construct the sdist name from the version? Or at least it should be ls *.tar.gz to be more specific.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be only one file in /dist since this is completely clean job. But adding *.tar.gz is a good idea.

path: dist

- name: Install sdist artifact
run: pip install --verbose dist/${{ needs.build_sdist.outputs.sdist_name }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking but we might want to change this later to run in parallel with the wheel builds, because this will have to build everything from source so it will take some time at the end that could be amortized.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I understand correctly, but this does run in parallel right now. All jobs (around 40) run in parallel, half is building wheels and half is building sdist and testing it.

@dudoslav dudoslav closed this Jun 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants