[FEA] Expose UMAP embedding graph #4292

lmeyerov · 2021-10-18T18:26:03Z

Is your feature request related to a problem? Please describe.
We would like access to UMAP's computed graph for downstream tasks such as visualization and other ML methods.

See additional use cases discussed in #4228

Describe the solution you'd like

When computing an embedding, have an option to also expose the weighted graph / cover tree, such as via a SciPy sparse matrix (how umap_learn does it) or a cugraph weighted graph.

Ex:

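import pandas as pd

# Proposed API: emit_graph does not exist yet. `umap` and `df` are assumed to be in scope.
embedding = umap.UMAP(n_components=2, emit_graph=True).fit(df)
coo = embedding.graph_.tocoo()
edges_df = pd.DataFrame({'src': coo.row, 'dst': coo.col, 'weight': coo.data})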

Describe alternatives you've considered

umap_learn has umap.UMAP(transform_mode='graph', ...), except using that might mean having to call umap() twice. An explicit flag to expose the graph as part of the output may be more in line with expected use.

Currently, when using umap_learn, we do the above. When using cuML, we manually run knn to try to infer the graph from the embedding, but that's awkward and less accurate.
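
The k-NN workaround described above can be sketched roughly as follows. This is a brute-force, pure-Python illustration rather than the code pygraphistry actually runs; `knn_edges` and the toy coordinates are hypothetical. The recovered edges carry embedding-space distances instead of UMAP's membership weights from the original space, which is part of why the workaround is less accurate.

```python
import math


def knn_edges(points, k):
    """Return directed (src, dst, distance) edges to each point's k nearest neighbors."""
    edges = []
    for i, p in enumerate(points):
        # Brute-force distances from point i to every other point.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        # Keep the k closest neighbors.
        for d, j in sorted(dists)[:k]:
            edges.append((i, j, d))
    return edges


# Toy 2-D "embedding": two tight pairs that sit far apart from each other.
embedding = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
edges = knn_edges(embedding, k=1)
```

In practice a GPU or tree-based neighbors search would replace the quadratic loop; the point here is only the shape of the recovered edge list.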

@cjnolet @AjayThorve @trxcllnt


github-actions · 2021-11-23T20:03:00Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

lmeyerov · 2021-11-23T22:57:12Z

FWIW, this is of continued interest for use cases around security, fraud, genomics, and visualizing embeddings in general. We are discussing explainable AI approaches that build on this with the umap team. Meanwhile, we're doing a k-NN edge-recovery dance as a workaround for the RAPIDS flavors.

github-actions · 2021-12-23T23:02:58Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

lmeyerov · 2021-12-23T23:06:13Z

This is of increasing relevance fwiw :)

github-actions · 2022-01-23T01:24:27Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

lmeyerov · 2022-01-24T19:55:27Z

We are still interested in this. We're adding some autoumap bits to pygraphistry, and have to work around this when users switch from umap_learn to cuml.umap...

github-actions · 2022-02-24T08:03:19Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

lmeyerov · 2022-02-24T08:41:30Z

Still of interest :) We are getting ready for the cuml engine variant of our new integrated auto-umap branch (graphistry/pygraphistry#305), and if helpful, I can point to some Nvidia+RAPIDS partners who will be using it.

cc @taureandyernv

cjnolet · 2022-02-24T11:40:54Z

@lmeyerov this shouldn't be too hard to do. Just to clarify: what you want here is the fuzzy simplicial set graph?
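
For readers unfamiliar with the term: in the UMAP algorithm, the fuzzy simplicial set is a weighted k-NN graph whose directed membership weights are merged into undirected ones with a probabilistic union, a + b - a*b. A minimal sketch of that symmetrization step follows. It assumes a hypothetical dict-of-edges representation and is not cuML's or umap_learn's sparse-matrix code.

```python
def symmetrize(directed):
    """Combine directed weights w(i->j) and w(j->i) into one undirected weight.

    Uses the probabilistic union a + b - a*b; a missing direction counts as 0.
    """
    undirected = {}
    for (i, j), a in directed.items():
        b = directed.get((j, i), 0.0)
        # Store each undirected edge once, keyed with the smaller node id first.
        key = (min(i, j), max(i, j))
        undirected[key] = a + b - a * b
    return undirected


# Hypothetical directed memberships: 0<->1 in both directions, 1->2 one way only.
directed = {(0, 1): 1.0, (1, 0): 0.5, (1, 2): 0.5}
undirected = symmetrize(directed)
```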

lmeyerov · 2022-02-24T16:20:11Z

Yep -- with priority for the one in the original space, not the embedding, and with the normalized/undirected weights.

Our intuition is that, for explainability, the original space's weighted 1-simplexes are already interpretable and more precise. Likewise, the original-space graph enables graph layout with the same initial seed. Getting the embedding's simplexes is interesting too, mostly so we can highlight which 1-simplexes were added vs. lost, but that's priority 2.
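
The "added vs. lost" comparison mentioned above amounts to a set difference over undirected edges. A small sketch, assuming both graphs have already been reduced to lists of node-id pairs (`diff_edges` and the toy edges are hypothetical names for illustration):

```python
def diff_edges(original_edges, embedding_edges):
    """Compare two undirected edge lists; return (kept, lost, added)."""

    def normalize(edges):
        # Treat (i, j) and (j, i) as the same undirected edge.
        return {tuple(sorted(e)) for e in edges}

    orig = normalize(original_edges)
    emb = normalize(embedding_edges)
    kept = sorted(orig & emb)   # present in both graphs
    lost = sorted(orig - emb)   # only in the original-space graph
    added = sorted(emb - orig)  # only in the embedding-space graph
    return kept, lost, added


original = [(0, 1), (1, 2), (2, 3)]
embedded = [(1, 0), (2, 3), (0, 3)]
kept, lost, added = diff_edges(original, embedded)
```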

lmeyerov · 2022-04-20T18:25:11Z

ping :) we're about to release graphistry.nodes(cudf.read_csv('genomes.csv')).umap(engine='umap_learn').plot(), and would love to do graphistry.nodes(cudf.read_csv('genomes.csv')).umap(engine='cuml').plot() as well :)

taureandyernv · 2022-05-09T21:11:17Z

@dantegd @cjnolet will we be able to get this into 22.06?

cjnolet · 2022-05-10T14:16:02Z

@taureandyernv we are going to try to aim for 22.06. There's a PR open (#4711) to expose the simplicial set functions. It required exposing sparse objects in Python that are populated by the C++ layer (e.g. the number of nonzeros isn't known ahead of time), which should make exposing the connectivities graph from a trained model even easier.

This PR closes issues #3123, #4704 and #4292
Authors:
- Victor Lafargue (https://github.com/viclafargue)
Approvers:
- Corey J. Nolet (https://github.com/cjnolet)
- AJ Schmidt (https://github.com/ajschmidt8)
URL: #4711

cjnolet · 2022-06-06T14:26:02Z

@lmeyerov, https://github.com/rapidsai/cuml/pull/4756/files added the attribute model.graph_ so I'm going to close this as done. Please feel free to re-open if this still doesn't satisfy the requirement.
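
A hedged sketch of how the new attribute might feed the edge list from the original request. It assumes `graph_` offers a SciPy-style `.tocoo()` with `row`, `col`, and `data` arrays, as in the issue's example; `sparse_graph_to_edges` is a hypothetical helper, not part of cuML.

```python
def sparse_graph_to_edges(graph):
    """Flatten a sparse adjacency matrix into (src, dst, weight) tuples.

    Assumes `graph` has a SciPy-style `.tocoo()` returning an object with
    `row`, `col`, and `data` arrays.
    """
    coo = graph.tocoo()

    def to_list(values):
        # NumPy and CuPy arrays provide .tolist(); fall back for plain sequences.
        return values.tolist() if hasattr(values, "tolist") else list(values)

    return list(zip(to_list(coo.row), to_list(coo.col), to_list(coo.data)))
```

With a fitted model, the call would presumably look like `edges = sparse_graph_to_edges(model.graph_)`. Whether `graph_` lives on the host or on the GPU is not stated in this thread.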

lmeyerov · 2022-06-06T15:30:09Z

Excellent, is this for the 22.06 release (and in the nightlies already)?

cjnolet · 2022-06-06T15:52:32Z

@lmeyerov, yep, the feature made it into 22.06 (and the nightlies).


lmeyerov added the "? - Needs Triage" (need team to review and classify) and "feature request" (new feature or request) labels Oct 18, 2021

github-actions bot added the inactive-30d label Nov 23, 2021

github-actions bot removed the inactive-30d label Nov 23, 2021

github-actions bot added the inactive-30d label Dec 23, 2021

github-actions bot removed the inactive-30d label Dec 24, 2021

github-actions bot added the inactive-30d label Jan 23, 2022

github-actions bot removed the inactive-30d label Jan 24, 2022

github-actions bot added the inactive-30d label Feb 24, 2022

github-actions bot removed the inactive-30d label Feb 24, 2022

cjnolet self-assigned this Feb 24, 2022

viclafargue mentioned this issue May 11, 2022

Expose simplicial set functions #4711

Merged

cjnolet closed this as completed Jun 6, 2022
