Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue: migrate to scikit-build-core #3564

Closed
3 of 7 tasks
agriyakhetarpal opened this issue Nov 27, 2023 · 14 comments · Fixed by #4487
Closed
3 of 7 tasks

Tracking issue: migrate to scikit-build-core #3564

agriyakhetarpal opened this issue Nov 27, 2023 · 14 comments · Fixed by #4487
Assignees
Labels
difficulty: hard Will take several weeks priority: medium To be resolved if time allows

Comments

@agriyakhetarpal
Copy link
Member

agriyakhetarpal commented Nov 27, 2023

Now that we have a declarative format for build metadata provided by pyproject.toml and build isolation enabling a more stable installation procedure via PEP 517/518, it eases the migration to either the scikit-build-core or the meson-python build-backend(s). Both of them include support for editable installations and are slowly becoming more mature for downstream users, considering that NumPy1 and SciPy2 have successfully transitioned this year.

The scikit-build-core build backend would still use CMake as a build system as we do currently, but meson-python requires ninja as a build system, which would render the IDAKLU target compilation procedure radically different from the current one.

Some advantages of these new build-backends:

  1. Better support for compilers plus ease of use3
  2. Reliable build caching for faster installation
  3. Both of them are faster at compilation and more modern (multithreaded builds, cross-compilation, do not use distutils or setuptools at all)
  4. Packagers for libraries in the Scientific Python ecosystem are slowly migrating and we should too
  5. Might enable faster IDAKLU compilation with other compilers like MinGW? i.e., on Windows when building from source (which we do only when building the wheel)

Task list

These tasks will be useful to look at, but not all of them might end up being needed or valid – subject to further discussion:

  • Establish local SUNDIALS and SuiteSparse installations in a folder sundials_installed/ or similar by adjusting CMAKE_INSTALL_PREFIX instead of modifying the .local/ folder in the HOME directory` for a user
  • Specify pybind11 as a build-time dependency in pyproject.toml
  • Maybe build just IDAS instead of all SUNDIALS components, considering it is what we link to for the IDAKLU target?
  • On macOS, use a better way to find and use libomp.dylib that's set with the correct MACOSX_DEPLOYMENT_TARGET at least when deploying the wheels (not the Homebrew one). The reason for this is that Apple Clang does not come with the OpenMP specification – however, other compilers such as LLVM and GCC do. See also: Fortnightly build for wheels failed / PyBaMM import kills process on macOS M-series #4091, Use wider-compatible LLVM-OpenMP, bundle libgfortran dylibs for macOS wheels #4092.
  • On platforms we don't support, the sdist is used to install PyBaMM, and the IDAKLU solver which is optional cannot be installed unless one is installing from source or using pip install pybamm --no-binary pybamm to force installing from the sdist on PyPI. To provide a means to attempt IDAKLU compilation, a --config-settings command-line flag can be added to the pip install pybamm command that can be parsed by the build backend (scikit-build-core in this case). Therefore, users should have to manually opt-in to the compilation (the build option should be off by default). For this, pybind11 should be a build-time dependency (see above).
  • Look at ways to improve the user experience when installing SuiteSparse and SUNDIALS (which is by nox -s pybamm-requires on macOS and Linux, possibly remove this command) and add docs on how to do this on Windows
  • Add CMakeLists.txt (included CMakeLists.txt #4101) and other relevant files such as FindSuiteSparse.cmake and FindSUNDIALS.cmake to the SDist via the MANIFEST.in file, otherwise, CMake will raise an error about a missing project status. This needs to be done as a part of the migration and a job could be added that tests building a wheel from the sdist nightly (and/or we can use the scheduled tests).

Footnotes

  1. https://rgoswami.me/posts/numpy-meson-f2py/

  2. https://labs.quansight.org/blog/building-scipy-with-flang

  3. https://labs.quansight.org/blog/2021/07/moving-scipy-to-meson

@agriyakhetarpal agriyakhetarpal added difficulty: hard Will take several weeks priority: low No existing plans to resolve labels Nov 27, 2023
@kratman
Copy link
Contributor

kratman commented Nov 27, 2023

Personally I would prefer to work with CMake over Ninja, but that could just be my bias from spending most of my career as a C++ dev

@agriyakhetarpal
Copy link
Member Author

agriyakhetarpal commented Nov 27, 2023

Yes, CMake is the easier of the two overall and requires less effort since the compilation will still be controlled via the CMakeLists.txt file. I don't know much about Meson enough to comment on it.

Nevertheless, this can be a potential GSoC project for someone who wants to tackle Python packaging since the changes would revolve a lot around that area, if not entirely around it.

@cringeyburger
Copy link
Contributor

Hello! I was going through the GSOC 2024 Idealist and encountered this project. I am interested in learning and working on the Python backend, and this project will significantly help develop my skills. In the same route, I can help contribute to PyBaMM!

As a starter, I would like to discuss whether we should decide on meson-python or scikit-build-core.

I would prefer meson-python over scikit-build-core, mainly because of its popularity after SciPy (version 1.9.0) and NumPy (version 1.26) decided to migrate to it. The documentation, posts, blogs and projects that sparked because of this would be helpful.

Also, meson-python is the most popular method after setuptools to build C extensions.

I would also like to discuss some technical aspects of the project. Thank you!

@agriyakhetarpal
Copy link
Member Author

Thanks for sharing, @cringeyburger. I would agree that meson-python is being used highly, but NumPy and SciPy use a vendored version of it because they want features that do not yet exist in the main source code. The reason, I believe, is that the Meson developers tend to prevent pushing anything hacky into the codebase and wish to do things in the conventional way and how robust solutions go. Both projects are very large and use almost every feature available in Meson, which we may not require for our use case – that should be great.

The Ninja build system should be able to handle the IDAKLU solver's compilation, but I am unsure whether it can do so for the prerequisites (SuiteSparse and SUNDIALS) – they might require a pure CMake build or perhaps calling CMake from inside meson.build. I am aware that pybind11 works with meson-python and can be used as normal, so it should just be a matter of pointing the build procedure to the location of the compiled files for the prerequisites. I would perhaps explore methods for caching so that editable installations can be sped up – Meson's in-place builds on the import pybamm statement can then rely on just compiling specific sections of the code – my experiments with pybop-team/PyBOP#176 revealed that around 30 seconds are consumed by the IDAKLU compilation, and ~1 min 30 seconds for SuiteSparse and SUNDIALS on Unix-based systems.

However, Windows builds are significantly long and take ~20 minutes overall, without caching which would be a problem on editable installations. It looks like microsoft/vcpkg#36885 will get merged soon which could provide some benefits (I'm unsure whether it is the SUNDIALS compilation or the SuiteSparse's component's compilation that takes longer on Windows). This might require bumping SUNDIALS to v6.7.0, we use v6.5.0 – it doesn't look like there are many changes to the IDAS component that we use (here are the release notes) and subsequently perhaps SuiteSparse from v6.0.3 to a later version (which would require changes to the C++ solver code).

We should be able to forego setup.py entirely with Meson – it has been done for NumPy: numpy/numpy#24519 and should be out with NumPy v2 (any time now). With scikit-build-core, we might not be able to do so completely, but it isn't an issue, we can remove the dependence on wheel.bdist_wheel (it has been recommended against using it for quite some time now) and rewrite the setup.py code to be cleaner. Like it has been mentioned above, using this choice of backend will be much easier to integrate – everything will be governed by the CMake flags and CMakeLists.txt, which we already have.

@Saransh-cpp
Copy link
Member

+1 for scikit-build-core

@agriyakhetarpal agriyakhetarpal added priority: medium To be resolved if time allows and removed priority: low No existing plans to resolve labels Feb 29, 2024
@agriyakhetarpal
Copy link
Member Author

Bumping the priority to medium since this will now be worked on this year

@agriyakhetarpal
Copy link
Member Author

I did a quick test and I notice that on macOS, vcpkg install sundials takes 14 seconds, while vcpkg install suitesparse took 5.5 minutes, so perhaps using sundials[klu] when it is ready upstream will also resolve our Windows woes? Building on Windows is slow in general and our Windows builds painfully take 20 or so minutes and this is after enabling multithreaded builds (without which it takes even longer). But I see that SuiteSparse uses MinGW to compile in Windows CI where the time is around 2 minutes; we compile from source using GCC on Unix-based platforms where this hardly takes 30 seconds – I assume adding Ccache support will be highly beneficial for Windows

@cringeyburger
Copy link
Contributor

cringeyburger commented Feb 29, 2024

On windows x64 architecture laptop, vcpkg install sundials took 19 seconds and vcpks install suitesparse took a solid 16 minutes (including additional packages)

  • I forgot that I had ninja VER 1.11.0 installed, so it defaulted to it
  • For Fortran, I had nothing installed before hand so it used MinGW Fortran -> handled by vcpkg, installed gfortran for windows
  • Though the time it took to handle suitesparse:x64-windows is 1.8 minutes.

On WSL running Ubuntu x64-linux, vcpkg install sundials took 26 seconds and vcpkg install suitesparse took 5 minutes (including additional packages)

  • For Fortran, it used GNU 11.4.0
  • The time taken to handle suitesparse:x64-linux is still 1.8 minutes.

Does vcpkg have parallelism and caching features?
I think we should try to explore Ninja, because of the following reasons:

As you have said, if that PR with vcpkg is merged, then having suitesparse[klu] can help speed up the process.

I am interested in learning more about Ccache, but from what I know, caching will definitely help improve the successive build processes.

For large packages like ours, the incremental build feature of CMake utilised by scikit-build-core would work well with Ccache.

Assuming that majority of our team is familiar with CMake, I too think that scikit-build-core is the better option.

NOTE: My network speed (measured using Google's speed test) is ~100mbps

@agriyakhetarpal
Copy link
Member Author

You mention that vcpkg install suitesparse took sixteen minutes with additional components – for Windows I suppose it comes with METIS being compiled. From SuiteSparse, we require just the KLU component for SUNDIALS to use for sparse matrix support, and this component relies on three other components: AMD, COLAMD, and BTF. Is there a way to disable certain components and build just these required SuiteSparse components, by passing extra flags to the install command? Getting down to 1.8 minutes across platforms would be a significant boost!

We should still use the GitHub releases for SuiteSparse on Linux and macOS though – they seem to be faster than vcpkg, i.e., through the nox -s pybamm-requires session. We could remove that and let everything proceed through a pip install <...> command – the user should just have a compiler toolchain present and the build procedure should do and abstract away the rest of the work (I have some experiments with this in a branch locally where I used setuptools to do so, but I never pushed them in favour of renovating the build script through this issue and subsequent GSoC project).

I am not sure about parallelism in vcpkg because that depends on the choice of CMake generator being used, I went through the issue thread you shared but I don't think we can completely switch to Ninja. But there are indeed binary caching features present through ABI checksums (computed with a combination of triplet files, files present in the port being used, compiler configurations, etc.) We do cache them in CI, but our release process is once in four months which means it is never used. I have seen that with a cache hit it reduces the build time on Windows to about 9 minutes.

I do not know much about Ccache either. It would be great to set that up, yes – we would want incremental builds through scikit-build-core in a way that SUNDIALS and SuiteSparse (KLU) are not rebuilt. This could tie in nicely with removing pybamm-requires.

P.S. I see that you have a Windows system configured already – could you test how long it takes to do vcpkg install casadi through PyBaMM's own registry that we have set up?

@cringeyburger
Copy link
Contributor

cringeyburger commented Mar 1, 2024

Sorry for the late reply!

I tried doing a lot of things for component-based installation of vcpkg install suitesparse, but to no avail. An issue was opened regarding this but the work is still in -progress- (it has almost been two years, but unfortunately no significant work has been done.)

Here is the link to the issue.

As for vcpkg install casadi on Windows-x64, it took me 2.8min using PyBaMM's CasADI registry.

Here is the procedure I followed:

  1. Clone the registry to a "test" folder, now I have a casadi-vcpkg-registry folder inside my test folder.
  2. Change the cwd to the test folder.
  3. Now, I ran vcpkg install casadi --overlay-ports=casadi-vcpkg-registry/ports/casadi

After this vcpkg starting working on the task, but when performing post-build validation I encountered some warnings from vcpkg:
1.warning: Include files should not be duplicated into the /debug/include directory.
2. The /lib/cmake folder should be merged with /debug/lib/cmake and moved to /share/casadi/cmake.
3. The following cmake files were found outside /share/casadi. Please place cmake files in /share/casadi.
4. There should be no absolute paths, such as the following, in an installed package

(I will reproduce the names of the files if you want)

And the final statement:
error: Found 5 post-build check problem(s). To submit these ports to curated catalogs, please first correct the portfile:

Despite the warnings, vcpkg successfully installs casadi

Please let me know if there is any issue from my side!

@agriyakhetarpal
Copy link
Member Author

agriyakhetarpal commented Mar 1, 2024

2.8 minutes should be reasonable, thanks – this confirms that the SuiteSparse installation is the slowest of all. 2.8 minutes is of course significantly slower than installing CasADi from its Python wheel, which is a matter of seconds – we were using the wheel earlier but had to resort to using the registry afterwards due to upstream changes. We have an issue open (#3603) for restoring this behaviour, which no one has been able to get to yet.

It looks like the SuiteSparse issue will be dormant for a long time – I noticed that I had seen it before because I was already subscribed to it :)

The CasADi installation from vcpkg looks normal to me especially if it succeeded, since I have noted these logs in our Windows wheels before – but it would be good to look how to clear out the warnings (they do not look to be relevant for the CasADi installation and are probably more so for the registry and its portfile itself).

@cringeyburger
Copy link
Contributor

I tried fixing the warnings by changing the portfile. I followed the instructions given through warnings when I built it.

  1. Added vcpkg-cmake-config as a dependency and added file(REMOVE_RECURSE "${CURRENT_PACKAGES_DIR}/debug/include"). This helped in reducing one warning.
  2. I tried to fix warnings 2 and 3 using vcpkg_cmake_config_fixup(), but it's erroring out because it's searching for some directory that doesn't exist. I tried with and without this command, and the directory CONFIG_PATH share/casdi/cmake doesn't exist.
  3. Regarding number 4, a quick search through the net tells me we may have to fix the absolute paths in the header files themselves.

@agriyakhetarpal
Copy link
Member Author

agriyakhetarpal commented Mar 2, 2024

Oh, by "good to look how to clear the warnings" I didn't mean doing so just now – I do not think it is the biggest of deals currently and can definitely be picked up at a later time (maybe CasADi can be added as a vcpkg package upstream too where someone interested can take care of this). Nevertheless, thanks for attempting this!

The gist of the discussion that entailed here should be that despite and regardless of the build backend we end up choosing, reducing the Windows installation time would be great. We don't compile the IDAKLU solver on PR workflows because of that – that is an area where reducing CI time is important. But then we don't compile it when running the Windows scheduled tests either (#3664) where we don't need to bother how long the test runs. Being able to speed things up would benefit us in other ways too, say, identifying issues such as #3783 early on.

@prady0t
Copy link
Contributor

prady0t commented Mar 12, 2024

I'm interested in this project too. I'm in the process of making a proposal

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
difficulty: hard Will take several weeks priority: medium To be resolved if time allows
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants