Automate include grouping order in .clang-format #14993
… using new grouping rules.
Note that these changes did turn up two places where needed headers were not included.
…rapidsai#14868) Changes some internal offset arrays used for managing temporary unicode code-points to int64 type. This affects the nvtext normalize and subword-tokenizer functions. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Karthikeyan (https://github.com/karthikeyann) URL: rapidsai#14868
This PR ensures that all calls to `ctest` include the flag `--no-tests=error`. See rapidsai/build-planning#18. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Jake Awe (https://github.com/AyodeAwe) URL: rapidsai#14983
…ai#14989) This PR filters all `DeprecationWarning`s originating from `ArrowTable.to_pandas`. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: rapidsai#14989
Flushing comments for now. I'll look further at this later.
…idsai#14995) This PR fixes `DataFrame.sort_index` to properly ignore indexes for all values of `axis`. This is fixed in pandas-2.2, hence xfailing the tests with a version check. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Bradley Dice (https://github.com/bdice) URL: rapidsai#14995
Deprecated in pandas 2.2 pandas-dev/pandas#55719 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: rapidsai#15000
Seems something here makes the jitify code-paths fail. I know jitify has issues with certain include files.
…apidsai#14999) This only really affected `to_hdf` Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: rapidsai#14999
Adds offsetalator in place of hardcoded offset type arrays to the strings split functions: - `cudf::strings::split()` - `cudf::strings::rsplit()` - `cudf::strings::split_record()` - `cudf::strings::rsplit_record()` - `cudf::strings::split_re()` - `cudf::strings::rsplit_re()` - `cudf::strings::split_record_re()` - `cudf::strings::rsplit_record_re()` Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Mike Wilson (https://github.com/hyperbolic2346) URL: rapidsai#14757
For comparison: pandas-dev/pandas#55856 pandas-dev/pandas#55895 pandas-dev/pandas#55499 The `errors="ignore"` parameter is the only one that is implemented so just added a test for that deprecation Authors: - Matthew Roeschke (https://github.com/mroeschke) - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: rapidsai#14984
…#14998) A small bug in our previous implementation leads to a segfault when calling `.get_groups()` with no `values`. Thankfully, the cuDF Python API always calls this function with a value, but it's possible `pylibcudf` consumers will not. Authors: - Ashwin Srinath (https://github.com/shwina) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: rapidsai#14998
I am very much in favour of this. I like the groupings. One minor question around the colour of the bikeshed: Do we want to (probably not in this change due to the churn it would cause) have separate groups for the public (
Updates `cudf::strings::wrap()` to use the offsetalator instead of hardcoded int32 type for offsets column data. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - Karthikeyan (https://github.com/karthikeyann) URL: rapidsai#15002
…i#14956) Resolves [10219](NVIDIA/spark-rapids#10219) This PR introduces a new class named `GetJsonObjectOptions` that holds the configurations to control the behavior of the underlying `cudf::get_json_object` function. It incorporates this new class into the `getJSONObject` JAVA API as an additional argument but also keeps the previous API to maintain backwards compatibility. It also includes a test case, `testGetJSONObjectWithSingleQuotes`, validating the behavior of `getJSONObject` when single quotes are enabled. Authors: - Suraj Aralihalli (https://github.com/SurajAralihalli) Approvers: - Robert (Bobby) Evans (https://github.com/revans2) - MithunR (https://github.com/mythrocks) - Karthikeyan (https://github.com/karthikeyann) URL: rapidsai#14956
Forward-merge branch-24.02 to branch-24.04
I think unpinning `numba` in the conda recipe was just missed in rapidsai#14616. I discovered this issue [trying to build the `24.02` release](https://github.com/rapidsai/cudf/actions/runs/7878153691/job/21496377912#step:7:1674). PRs & nightly builds are working because the `rapidsai-nightly` channel has an older version of `pynvjitlink` that supported `numba>=0.57` whereas the `rapidsai` channel only has the latest version which pins to `numba>=0.58`. Authors: - Ray Douglass (https://github.com/raydouglass) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice)
…apidsai#15018) Found while working on large strings where copy-if is called. In places where the `copy_if_safe` utility is called, the non-stencil overload calls the stencil-ed function by forwarding the `first` iterator as the `stencil` parameter. This works logically because both values will return the same result. Unfortunately, this can be a performance issue if the iterator is a complex or slow transform iterator, since it would be called twice (and inlined twice). Changing the non-stencil version to call `thrust::copy_if` directly fixes the potential issue. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Yunsong Wang (https://github.com/PointKernel) - Mike Wilson (https://github.com/hyperbolic2346) URL: rapidsai#15018
Forward-merge branch-24.02 to branch-24.04
Toward pandas 2.2 compat: Deprecated in pandas in pandas-dev/pandas#56557 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Ashwin Srinath (https://github.com/shwina) URL: rapidsai#14986
Updates `cudf::get_json_object()` to use the offsetalator to build the output strings column. It adds a sizes vector to hold the output row lengths which is then converted to offsets using the new `make_offsets_child_column()` utility. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Mark Harris (https://github.com/harrism) - Vukasin Milovanovic (https://github.com/vuule) URL: rapidsai#15009
Updates `cudf::interleave_columns()` to use the new `make_offsets_child_column` utility and the offsetalator to build the output strings column. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - Vukasin Milovanovic (https://github.com/vuule) URL: rapidsai#15004
Closes rapidsai#14991 Authors: - Sanjana Gajendran (https://github.com/sanjana098) Approvers: - Bradley Dice (https://github.com/bdice) - Mark Harris (https://github.com/harrism) URL: rapidsai#15025
…#15052) Developers expect that 'cleaning' a build directory will remove all forms of cached files (objects, libraries, jit cache, etc.). To ensure that happens consistently we also need to remove the jitify cache objects for cudf. Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Bradley Dice (https://github.com/bdice) - Mark Harris (https://github.com/harrism) URL: rapidsai#15052
Reworks the `cudf::strings::detail::copy_if_else()` to improve performance for long strings. The rework builds a vector of rows to pass to the `make_strings_column` factory that uses the optimized `gather_chars` function. Also includes a benchmark for copy_if_else specifically for strings columns. Closes rapidsai#15014 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Vukasin Milovanovic (https://github.com/vuule) URL: rapidsai#15017
Closes rapidsai#14495 Adds support for reading and writing ORC and Parquet files with LZ4 compression. Also adds the new value to the Python API. Included basic C++ and Python tests so that the option is exercised in CI. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Bradley Dice (https://github.com/bdice) - Shruti Shivakumar (https://github.com/shrshi) - MithunR (https://github.com/mythrocks) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: rapidsai#14906
…m/cudf into fea/clang-format-include-grouping
The commit history on this PR has gotten out of hand somehow. I think it's best to start a new PR with the latest agreed-upon .clang-format changes and then rerun the pre-commit hooks and apply any other applicable reviewer suggestions from this PR. This way I will minimize changes and have a better chance of figuring out any CI failures.
Closing in favour of #15063
Description
This uses the `IncludeCategories` settings in .clang-format to attempt to enforce our documented `#include` order in libcudf. See https://docs.rapids.ai/api/libcudf/stable/developer_guide
I realize that there was a previous attempt at this by @bdice that met with some resistance. Reading it, I wouldn't say it was vetoed; rather, reviewers requested something much simpler. I have a few reasons to attempt this again.
… `rmm::mr::device_memory_resource*` with `rmm::device_async_resource_ref` everywhere in RAPIDS (not just cuDF). This requires adding an include to MANY files. Getting the location of the include right everywhere is very difficult without automatic grouping of headers. I started out writing a bash script to do this before realizing clang-format has the necessary feature. And I realized that my script would never properly handle files like this.
Note that one way having few categories can work while still maintaining clear groups is that this PR updates many files to use quotes ("") instead of angle brackets (<>) for local cuDF headers that do not live in `cudf/cpp/include`. With our "near to far" include ordering policy, these are arguably the nearest files, and using quotes allows us to have our first category simply check for quotes. These files will be grouped and sorted without blank lines, but in practice this does not lose clarity because typically headers from more than two directories are not included from the same file. The downside of this change is that I don't yet know how to automatically enforce it. I hope that when developers accidentally use <> for internal includes that don't start with (e.g.) "cudf", they will be grouped into one of the lowest priority categories, and perhaps this will induce them to switch to "" to get the headers listed at the top. The rule is simple: if it's in libcudf but not in `cpp/include/cudf`, then use quotes. For everything else, use angle brackets.
Other than headers from RAPIDS repos, I group all other headers that have a file extension in a single group, and all files that have no file extension in another group. Since the latter also matches some includes from libcu++, I have an explicit category for `<cuda/` includes to keep them separate from STL includes. A frequent effect of the single "." group is that cub and thrust headers get grouped without a blank line between them. I don't think this is a problem.
Below I'm listing the (fairly simple, in my opinion) .clang-format settings for this PR. Note that categories 2-5 will require tweaking for different RAPIDS repos.
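For readers less familiar with how `IncludeCategories` behaves, the following is a minimal C++ sketch of the matching rule clang-format applies: each include (with its quotes or angle brackets) is tested against the regexes in order, and the first match assigns the priority. This is not the configuration from this PR. The regexes, the priority numbers, the `<rmm/` entry, the example header names, and the helper names `example_categories`, `classify` and `regroup` are all illustrative assumptions derived from the prose description of the groups.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// One entry of a hypothetical IncludeCategories-style table.
struct IncludeCategory {
  std::regex pattern;
  int priority;
};

// Assumed table, loosely following the groups described in the prose.
// These are NOT the PR's settings; they only illustrate the idea.
inline std::vector<IncludeCategory> const& example_categories()
{
  static std::vector<IncludeCategory> const categories{
    {std::regex("^\""), 1},           // quoted includes: nearest, local headers
    {std::regex("^<cudf_test/"), 2},  // test utilities
    {std::regex("^<cudf/"), 3},       // public libcudf headers
    {std::regex("^<rmm/"), 4},        // another RAPIDS repo (assumed example)
    {std::regex("^<cuda/"), 5},       // libcu++, kept apart from the STL
    {std::regex("^<.*\\."), 6},       // any other header with a file extension
    {std::regex("^<[^.]+>"), 7},      // extensionless headers, e.g. the STL
  };
  return categories;
}

// First matching regex decides the priority; unmatched includes sort last.
inline int classify(std::string const& include_name)
{
  for (auto const& category : example_categories()) {
    if (std::regex_search(include_name, category.pattern)) { return category.priority; }
  }
  return std::numeric_limits<int>::max();
}

// Sort by (priority, name) and put a blank line between different groups.
// Plain byte-wise string comparison is used within a group in this sketch.
inline std::vector<std::string> regroup(std::vector<std::string> includes)
{
  std::sort(includes.begin(), includes.end(), [](auto const& a, auto const& b) {
    return std::make_pair(classify(a), a) < std::make_pair(classify(b), b);
  });
  std::vector<std::string> out;
  for (std::size_t i = 0; i < includes.size(); ++i) {
    if (i > 0 && classify(includes[i]) != classify(includes[i - 1])) { out.emplace_back(""); }
    out.push_back("#include " + includes[i]);
  }
  return out;
}
```

With this table, `<cuda/std/functional>` lands in its own group instead of falling through to the extensionless group, and `<cub/cub.cuh>` and `<thrust/copy.h>` share the single "." group with no blank line between them, which matches the effect described above.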
Some may ask why I ordered `cudf_test` headers before `cudf` headers. I tried both orders, and putting `cudf_test` first generated significantly fewer changes in the PR, meaning that it's already the more common ordering (I suppose `cudf_test` is closer to the files that include it, since they are libcudf tests).
I've opened a similar PR for RMM with only 5 groups. rapidsai/rmm#1463
CC @davidwendt @vyasr @wence- @GregoryKimball for feedback
@isVoid contributed to this PR via pair programming.