Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the dice metric to UMAP #5140

Open
wants to merge 10,000 commits into
base: branch-23.02
Choose a base branch
from

Conversation

beckernick
Copy link
Member

This PR:

  • Adds support for metric="dice" to UMAP
  • When using this metric, UMAP expects Boolean inputs

Closes #5129

cjnolet and others added 30 commits March 24, 2022 02:38
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
Depends on rapidsai/raft#583 and rapidsai/cumlprims_mg#86

This PR includes a few fixes to support source-only builds:

1. Use `CPMFindPackage` to get `cumlprims_mg`
  a. This enables building against a local `cumlprims_mg` build (`-D cumlprims_mg_ROOT=/cumlprims_mg/cpp/build`)
2. Explicitly call `find_package(Threads)` to be more resilient to RAFT changes

Authors:
  - Paul Taylor (https://github.com/trxcllnt)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Robert Maynard (https://github.com/robertmaynard)

URL: rapidsai#4649
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
Goals:
- Python documentation should be built for PRs 
- Checks should fail upon sphinx warnings
- Existing warnings are fixed in this PR

This PR covers only the python documentation. C++ docs are out of scope.

Authors:
  - Rory Mitchell (https://github.com/RAMitchell)
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4585
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
Continuation of rapidsai#2975 to change docstring to a doctest format.
I'm also adding a pytest file to execute doctest, similar to what's being done in [cudf](https://github.com/rapidsai/cudf/blob/branch-22.04/python/cudf/cudf/tests/test_doctests.py)

Authors:
  - Micka (https://github.com/lowener)
  - Yuqiong Li (https://github.com/yuqli)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Michael Demoret (https://github.com/mdemoret-nv)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4618
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
… single GPU (rapidsai#4645)

Fixes single GPU build by dividing decomposition pxd headers so that cumlprims files are not included in single GPU cython files.

Authors:
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: rapidsai#4645
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
Replace `cudf.logical_not` with `~`.

Authors:
  - Andy Adinets (https://github.com/canonizer)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4669
- removed default `T = float` from multi-sum
- tests with float64 for multi-sum and `child_index()`
- refactored multi-sum tests to reduce shared memory usage
  - this is needed for the tests with float64 to compile

This is pull request 1 of 3 to integrate rapidsai#4646. This pull request is partly based on the work by @levsnv.

Authors:
  - Andy Adinets (https://github.com/canonizer)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - William Hicks (https://github.com/wphicks)

URL: rapidsai#4648
This small PR :
* Rectifies/clarifies the wrong parameter description for `max_depth` parameter in `RandomForestRegressor` and `RandomForestClassifier` in RF and dask-RF. 
* Corrects a syntax error in dask-rf docs
* Corrects the `max_depth` check in RF-python layer

resolves rapidsai#4623

Authors:
  - Venkat (https://github.com/venkywonka)

Approvers:
  - Rory Mitchell (https://github.com/RAMitchell)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4666
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
We don't have arm64 packages for `rapids-doc-env`, so only run the doc builds for x86_64

Authors:
   - Ray Douglass (https://github.com/raydouglass)

Approvers:
   - AJ Schmidt (https://github.com/ajschmidt8)
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
Templatized functions related to FIL inference in preparation of `float64` support.

Instantiations of templates with `float64`, or tests for `float64`, _are not included_; they will be provided in a future pull request.

This is pull request 2 of 3 to integrate rapidsai#4646. This pull request is partly based on the work by @levsnv.

Authors:
  - Andy Adinets (https://github.com/canonizer)
  - Levs Dolgovs (https://github.com/levsnv)
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - William Hicks (https://github.com/wphicks)

URL: rapidsai#4655
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
Add a `libcuml-tests` package to the `libcuml` recipe:
- This is a prerequisite for removing "Project Flash" from our build/CI scripts.
- The `libcuml-tests` package was added as an additional output to the existing `libcuml` recipe (which was renamed to `libcuml-split`)

Authors:
  - Jordan Jacobelli (https://github.com/Ethyling)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4635
`float64` support in FIL core and tests.

Added `float64`-based instantiations of FIL types and functions, together with `float64`-based FIL tests.

Treelite and Python layers _are not covered_ here, but will be in separate pull requests.

This is pull request 3 of 3 to enable `float64` support in FIL core. This is partly based on the work by @levsnv.

Authors:
  - Andy Adinets (https://github.com/canonizer)
  - Levs Dolgovs (https://github.com/levsnv)

Approvers:
  - Divye Gala (https://github.com/divyegala)

URL: rapidsai#4646
[REVIEW] Unpin `dask` & `distributed` for development
This PR fixes a minor typo in the cuML intro documentation.

Authors:
  - Christopher Akiki (https://github.com/cakiki)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4695
CMake 3.23.1 contains the bug fixes that we need to make use of CMake 3.23, so now we can update the pinnings to just avoid 3.23.0.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Jordan Jacobelli (https://github.com/Ethyling)
  - Robert Maynard (https://github.com/robertmaynard)

URL: rapidsai#4698
`float64` support in treelite->FIL import and Python layer

Authors:
  - Andy Adinets (https://github.com/canonizer)
  - Levs Dolgovs (https://github.com/levsnv)

Approvers:
  - Philip Hyunsu Cho (https://github.com/hcho3)
  - William Hicks (https://github.com/wphicks)

URL: rapidsai#4690
ajschmidt8 and others added 14 commits January 2, 2023 14:37
Due to some limitations on the number of reusable workflows that can be used in a given GH Action workflow, I need to switch the nightly workflow files (`build.yaml` and `test.yaml`) to use `workflow_dispatch` instead of `workflow_call`.

Because of this switch, we can also remove the hardcoded `repo` field in `{build,test}.yaml`.
…5108)

This is a small cleanup to remove the outdated `PROJECT_FLASH` environment variable from the libcuml recipe.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#5108
This PR adds a missing dependency for the `pr-builder` job in the `pr.yaml` workflow.

Authors:
   - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
   - AJ Schmidt (https://github.com/ajschmidt8)
rapids-cmake 23.02 is deprecating the magic value of `ALL` since it doesn't cleanly map to the cmake magic value of `all`. Instead we use `RAPIDS` which better represents the architectures we are building for.

Authors:
  - Robert Maynard (https://github.com/robertmaynard)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#5124
…5125)

Small cleanup to remove `MACOSX_DEPLOYMENT_TARGET` references from a build script (macOS is not supported by RAPIDS).

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: rapidsai#5125
It looks like there are still some attributes with `ptr` in their name that are being serialized with the hdbscan object and that's causing some segfaults when serialization is performed on a separate process. 

Closes rapidsai#4986

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#5128
This PR adds pip wheel CI to the Conda CI, instead of having them work separately.

Authors:
  - Sevag H (https://github.com/sevagh)
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#5109
@beckernick beckernick self-assigned this Jan 19, 2023
@beckernick beckernick requested a review from a team as a code owner January 19, 2023 21:00
@github-actions github-actions bot added the Cython / Python Cython or Python issue label Jan 19, 2023
@beckernick beckernick added feature request New feature or request non-breaking Non-breaking change labels Jan 19, 2023
@@ -572,12 +572,12 @@ def test_fuzzy_simplicial_set(n_rows,
@pytest.mark.parametrize('metric', ['l2', 'euclidean', 'sqeuclidean', 'l1',
'manhattan', 'minkowski', 'chebyshev',
'cosine', 'correlation', 'jaccard',
'hamming', 'canberra'])
'hamming', 'canberra', 'dice'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can see if CI passes on this but I don't believe our dense pairwise distance API is currently able to compute Dice distance. This is a really trivial computation to do w/ any primitive that's capable of computing euclidean, though. It just requires the dot product and l1 norms:

value_t q_r_union = q_norm + r_norm;
value_t dice      = (2 * dot) / q_r_union;
bool both_empty   = q_r_union == 0;
return 1 - ((!both_empty * dice) + both_empty);

Let me know if this is something we should prioritize.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test passed for me locally before I put this PR up, but your point is well taken as the dense pairwise distance API doesn't support this currently. The existing machinery seems to handle some but not all of the behavior.

#5129 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm honestly still scratching my head at how this could be working w/ dense inputs, which is making me wonder if we're getting a false positive in the passing test. The FAISS brute-force API, which we're still using at the present time for non-L2 (and non-haversine) distances, doesn't support dice distance at all.

@beckernick
Copy link
Member Author

Sounds like we should table this PR until we better understand what's happening. Unless others disagree, will close this PR.

@csadorf
Copy link
Contributor

csadorf commented Feb 6, 2023

@cjnolet I suggest we schedule this for 23.04.

@bdice
Copy link
Contributor

bdice commented Dec 11, 2024

@dantegd @beckernick Can we close this in favor of #5943?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Cython / Python Cython or Python issue feature request New feature or request non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Support for Dice coefficient as a metric in UMAP