Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automate include grouping using clang-format #1463

Merged

Conversation

harrism
Copy link
Member

@harrism harrism commented Feb 7, 2024

Closes #1477

Description

This uses the IncludeCategories settings in .clang-format to attempt to enforce our documented #include order in librmm. For discussion, see rapidsai/cudf#15063. This PR uses a subset of the header grouping categories used in that PR.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@harrism harrism added feature request New feature or request non-breaking Non-breaking change labels Feb 7, 2024
@harrism harrism self-assigned this Feb 7, 2024
@github-actions github-actions bot added Python Related to RMM Python API cpp Pertains to C++ code labels Feb 7, 2024
Priority: 1
- Regex: '^<rmm/' # RMM includes
Priority: 2
- Regex: '^<(thrust|cub|cuda)/' # CCCL includes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😻

Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modulo bikeshed colour discussions in the cudf PR, this looks great! Thanks for taking it on @harrism

@harrism harrism marked this pull request as ready for review February 20, 2024 07:50
@harrism harrism requested review from a team as code owners February 20, 2024 07:50
@harrism harrism requested review from rongou and bdice February 20, 2024 07:50
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this pull request Feb 21, 2024
This uses the `IncludeCategories` settings in .clang-format to attempt to enforce our documented `#include` order in libcudf. See https://docs.rapids.ai/api/libcudf/stable/developer_guide

I realize that there was a [previous attempt at this](#12760) by @bdice that met with some resistance. Reading it, I wouldn't say it was vetoed; rather, reviewers requested something much simpler. I have a few reasons to attempt this again.

1. To make a separate task much easier. We are undertaking a refactoring of RMM that will replace `rmm::mr::device_memory_resource*` with `rmm::device_async_resource-ref` everywhere in RAPIDS (not just cuDF). This requires adding an include to MANY files. Getting the location of the include right everywhere is very difficult without automatic grouping of headers. I started out writing a bash script to do this before realizing clang-format has the necessary feature. And I realized that my script would never properly handle [files like this](https://github.com/rapidsai/raft/blob/branch-24.04/cpp/bench/ann/src/raft/raft_cagra_wrapper.h).
2. To increase velocity. Everywhere in RAPIDS that we have automated code standard/style/formatting/other, the benefits to velocity have outweighed the costs. To paraphrase @bdice, $auto \nearrow \rightarrow \mu \searrow \rightarrow v \nearrow$
3. The previous PR #12760 had nearly 50 categories of headers. There was no way this could be applied universally across RAPIDS repos. My proposal has 10 categories. I tried to reduce it further but realized that it wouldn't be much less configuration to maintain, so I stopped at 10.

Note that one of the ways that having few categories can work while still maintaining clear groups is that this PR updates many files to use quotes ("") instead of angle brackets (<>) for local cuDF headers that do not live in `cudf/cpp/include`. With our "near to far" include ordering policy, these are arguably the nearest files, and using quotes allows us to have our first category simply check for quotes. These files will be grouped and sorted without blank lines, but in practice this does not lose clarity because typically headers from more than two directories are not included from the same file. The downside of this change is I don't yet know how to automatically enforce it. I hope that when developers accidentally use <> for internal includes that don't start with (e.g.) "cudf", they will be grouped one of the lowest priority categories, and perhaps this will induce them to switch to "" to get the headers listed at the top. The rule is simple: if it's in libcudf but not in `cpp/include/cudf`, then use quotes. For **everything** else, use angle brackets.

Other than headers from RAPIDS repos, we have a group for all CCCL/CUDA headers, a group for all other headers that have a file extension, and a final group for all files that have no file extension (e.g. STL).

Below I'm listing the (fairly simple, in my opinion) .clang-format settings for this PR. Note that categories 2-5 will require tweaking for different RAPIDS repos. 

Some may ask why I ordered `cudf_test` headers before `cudf` headers. I tried both orders, and putting `cudf_test` first generated significantly fewer changes in the PR, meaning that it's already the more common ordering (I suppose `cudf_test` is closer to the files that include it, since they are libcudf tests).

I've opened a similar PR for RMM with only 5 groups. rapidsai/rmm#1463

CC @davidwendt @vyasr @wence- @GregoryKimball for feedback

@isVoid contributed to this PR via pair programming.

```
IncludeBlocks: Regroup
IncludeCategories:
  - Regex:           '^"' # quoted includes
    Priority:        1
  - Regex:           '^<(benchmarks|tests)/' # benchmark includes
    Priority:        2
  - Regex:           '^<cudf_test/' # cuDF includes
    Priority:        3
  - Regex:           '^<cudf/' # cuDF includes
    Priority:        4
  - Regex:           '^<(nvtext|cudf_kafka)' # other libcudf includes
    Priority:        5
  - Regex:           '^<(cugraph|cuml|cuspatial|raft|kvikio)' # Other RAPIDS includes
    Priority:        6
  - Regex:           '^<rmm/' # RMM includes
    Priority:        7
  - Regex:           '^<(thrust|cub|cuda)/' # CCCL includes
    Priority:        8
  - Regex:           '^<(cooperative_groups|cuco|cuda.h|cuda_runtime|device_types|math_constants|nvtx3)' # CUDA includes
    Priority:        8
  - Regex:           '^<.*\..*' # other system includes (e.g. with a '.')
    Priority:        9
  - Regex:           '^<[^.]+' # STL includes (no '.')
    Priority:        10
```

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Bradley Dice (https://github.com/bdice)
  - Nghia Truong (https://github.com/ttnghia)

URL: #15063
@harrism
Copy link
Member Author

harrism commented Feb 22, 2024

/merge

@rapids-bot rapids-bot bot merged commit 51f0439 into rapidsai:branch-24.04 Feb 22, 2024
48 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cpp Pertains to C++ code feature request New feature or request non-breaking Non-breaking change Python Related to RMM Python API
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Automate C++ include file grouping and ordering using clang-format
4 participants