Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Improve Cython Build with Custom build_ext #2638

Merged
merged 17 commits into from
Sep 10, 2020

Conversation

mdemoret-nv
Copy link
Contributor

To support #2541, Cython needs to be built with binding=True to turn all cython functions into cyfunction which we can get the source code location from. However, with the current build system, this must be done on a per file basis (adding # binding=True at the start of each .pyx file) or removing the parallel cythonize/build.

This PR beaks out the build customization from #2541 (suggested by @dantegd ) and creates a reusable, custom build_ext class, cython_build_ext. This works around the above limitations by calling cythonize late in the build process after all other options (such as --singlegpu) have been processed. It was inspired by this comment which proposed a very similar customization.

While this does add some complexity to the python build system, it has many benefits such as:

  • System wide Cython build configuration on the command line
    • For example, to enable profiling: python setup.py build_ext --profile=True
  • System wide, default Cython configuration can be stored in setup.cfg
  • The following comments can be removed from the start of every .pyx file:
    # cython: profile=False
    # distutils: language = c++
    # cython: embedsignature = True
    # cython: language_level = 3
  • Custom build options, such as --singlegpu, can be correctly handled by setuputils and will be displayed with python setup.py --help
    • Instead of parsing the arguments manually and removing them with sys.argv.remove('--singlegpu')
  • Additional cython options can be added to support any cython option such as enable_gdb or cython optimizations
  • The class, cython_build_ext can be used by other libraries in RAPIDS
  • With a few small changes, we could support cython-less installs.

@mdemoret-nv mdemoret-nv requested a review from a team as a code owner August 3, 2020 23:39
@GPUtester
Copy link
Contributor

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@mdemoret-nv mdemoret-nv added 2 - In Progress Currenty a work in progress Build or Dep Issues related to building the code or dependencies Cython / Python Cython or Python issue labels Aug 3, 2020
@mdemoret-nv mdemoret-nv changed the title [WIP] Improve Cython Build with Custom build_ext [REVIEW] Improve Cython Build with Custom build_ext Aug 26, 2020
@mdemoret-nv
Copy link
Contributor Author

@dantegd and @JohnZed Branch 0.16 has been merged and this PR is ready for review

@mdemoret-nv mdemoret-nv added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currenty a work in progress labels Aug 26, 2020
@mdemoret-nv mdemoret-nv requested review from a team as code owners August 26, 2020 22:21
@mdemoret-nv mdemoret-nv changed the base branch from branch-0.15 to branch-0.16 August 26, 2020 22:28
@dantegd dantegd requested review from kkraus14, shwina and dantegd August 27, 2020 01:21
@shwina
Copy link
Contributor

shwina commented Aug 27, 2020

(not an expert by any means and just curious) Could I ask the reason for customizing the build/build_ext commands, v/s passing flags such as binding=True directly to Extension as was being done before? I'm wondering which of the new benefits listed are specifically enabled by breaking out into a custom class.

Copy link
Member

@dantegd dantegd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After testing and reviewing the code everything looks great, just comments about comments and a missing copyright header

python/cython_build_ext.py Show resolved Hide resolved
python/cython_build_ext.py Show resolved Hide resolved
python/setup.py Outdated Show resolved Hide resolved
python/setup.py Outdated Show resolved Hide resolved
python/setup.py Outdated Show resolved Hide resolved
@mdemoret-nv
Copy link
Contributor Author

(not an expert by any means and just curious) Could I ask the reason for customizing the build/build_ext commands, v/s passing flags such as binding=True directly to Extension as was being done before? I'm wondering which of the new benefits listed are specifically enabled by breaking out into a custom class.

Good question. The PR I linked in my initial comment does a good job of describing the issue so I recommend reading that for some background. I used that design as the basis for my approach. I'll do my best to summarize and add a little more context related to cuml.

I customized build/build_ext mainly due to limitations in the Cython build tools that come up when building in parallel. Some of these limitations are due to the differences between the "old" and "new" build processes in Cython (However, I'm not an expert myself, so take some of my interpretations on Cython history with a grain of salt), some are due to a lack of features on Cython's part, and everything is complicated by the odd way that Cython and setuptools/distutils interact with each other.

Specifically for cuml, the customization route was needed for the following reasons:

  1. When using the --singlegpu flag, we need to exclude specific files from the cython compilation using the exclude argument to cythonize().
    1. Previously, we used the Extension.exclude property. However, this no longer works with the Cython new_build_ext. See issue [BUG] setup.py --singlegpu target does not exclude mg cython files when using new parallel build #2037.
    2. This resulted in two build paths: 1) Parallel build for multigpu and 2) Single threaded build for singlegpu. Ideally, everything would use the same build path and be parallelized.
  2. The options we were specifying for Extension.cython_directives (such as profile=False shown here) were most likely being ignored.
    1. Our compilation still worked because every *.pyx file specifies the needed options at the top of the file. One example is here.
    2. This requires changing every file in order to set a global value. If we wanted to build with profile=True, you would need to modify every *.pyx file since the value in the file takes precidence.
  3. Finally, calling cythonize() directly in setup.py was not ideal for our build process.
    1. We had already started moving away from this approach and started using cython.new_build_ext but kept it around for --singlegpu. See here
    2. Calling cythonize() manually has a few downsides:
      1. We cannot use any default options from build or build_ext such as nthreads or install_dir.
      2. This gets called every time setup.py is invoked, even for non-build related commands.
      3. Would prevent us from distributing a cython-less package. See the docs here
      4. Requires manual command line parsing.

All of these reasons drove the design to subclassing build/build_ext instead of trying to update the old system. I tried a few approaches and this is the one that worked out best and has the most benefits for future development.

I'll conclude by saying that this isn't the only viable approach. The way setuptools works can be confusing at times which makes it difficult to decide on the best design. I'm sure there are different ways we could accomplish the same goal. I went this direction because it uses customizations that could be reused by the entire Rapids team and fits within the setuptools architecture (no manual command line parsing).

If you have any more questions, please let me know.

Copy link
Contributor

@shwina shwina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation! This looks great.

@mdemoret-nv mdemoret-nv added 4 - Waiting on Reviewer Waiting for reviewer to review or respond and removed 3 - Ready for Review Ready for review by team labels Sep 2, 2020
@mdemoret-nv
Copy link
Contributor Author

rerun tests

1 similar comment
@mdemoret-nv
Copy link
Contributor Author

rerun tests

Copy link
Member

@dantegd dantegd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, just very small things, one copyright header, and the following files still have the #cython profile/embedsignature/languag_level headers:

/python/cuml/cluster/kmeans_utils.pxd
/python/cuml/common/cuda.pxd
/python/cuml/common/handle.pxd
/python/cuml/common/opg_data_utils_mg.pxd
/python/cuml/dask/linear_model/elastic_net.py
/python/cuml/dask/linear_model/lasso.py
/python/cuml/dask/metrics/utils.py
/python/cuml/decomposition/utils.pxd
/python/cuml/ensemble/randomforest_shared.pxd
/python/cuml/metrics/confusion_matrix.py
/python/cuml/metrics/regression.pxd
/python/cuml/metrics/utils.py
/python/cuml/preprocessing/onehotencoder_mg.py
/python/cuml/tsa/arima.pxd

was wondering if we could remove them in the PR since the PR already removes them everywhere else?

python/cython_build_ext.py Outdated Show resolved Hide resolved
@dantegd dantegd added 4 - Waiting on Author Waiting for author to respond to review and removed 4 - Waiting on Reviewer Waiting for reviewer to review or respond labels Sep 9, 2020
@mdemoret-nv
Copy link
Contributor Author

@dantegd Removed the # cython: directives entirely from .py files and removed embedsignature, profile and language from .pxd files. This leaves just # distutils: language = c++ in .pxd files. I'm unsure if this is used in .pxd files so I am hesitant to remove it. Do you know if this is safe to remove as well?

@dantegd
Copy link
Member

dantegd commented Sep 9, 2020

I think it should be safe to remove those as well, if something goes wrong local or CI tests should reveal that

@mdemoret-nv
Copy link
Contributor Author

Removed comments from *.pxd files as discussed. @dantegd PR is ready to merge after CD/CI pass.

@mdemoret-nv mdemoret-nv added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 4 - Waiting on Author Waiting for author to respond to review labels Sep 9, 2020
@dantegd dantegd merged commit d84283e into rapidsai:branch-0.16 Sep 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge Build or Dep Issues related to building the code or dependencies Cython / Python Cython or Python issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants