[FEA] Expose UMAP embedding graph #4292

Closed
lmeyerov opened this issue Oct 18, 2021 · 16 comments

@lmeyerov

lmeyerov commented Oct 18, 2021

Is your feature request related to a problem? Please describe.
We would like access to UMAP's computed graph for downstream tasks such as visualization and other ML methods.

See additional use cases discussed in #4228

Describe the solution you'd like

When computing an embedding, have an option to also expose the weighted graph / cover tree, for example as a SciPy sparse matrix (how umap_learn does it) or a cuGraph weighted graph.

Ex:

import pandas as pd

# emit_graph is the flag proposed in this request; it does not exist yet
embedding = umap.UMAP(n_components=2, emit_graph=True).fit(df)
coo = embedding.graph_.tocoo()
edges_df = pd.DataFrame({'src': coo.row, 'dst': coo.col, 'weight': coo.data})

Describe alternatives you've considered

umap_learn has umap.UMAP(transform_mode='graph', ...), except using that might mean having to call umap() twice. An explicit flag to expose the graph as part of the output may be more in line with expected use.

Currently, when using umap_learn, we do the above. When using cuML, we manually run knn to try to infer the graph from the embedding, but that's awkward and less accurate.
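
For readers unfamiliar with the workaround, here is a rough sketch of what "run knn on the embedding" could look like. This is an assumption about the general approach, not the actual pygraphistry code: `knn_edges` is a hypothetical helper, and it uses brute-force NumPy distances instead of a GPU k-NN. The recovered edges carry embedding-space distances rather than UMAP's membership weights from the original space, which is one reason the result is only an approximation of the real graph.

```python
import numpy as np

def knn_edges(points: np.ndarray, k: int):
    """Return (src, dst, dist) arrays linking each point to its k nearest neighbors."""
    n = points.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    # Pairwise Euclidean distances, shape (n, n). O(n^2) memory: small inputs only.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude self-matches
    # Indices of the k smallest distances per row, nearest first.
    nbrs = np.argsort(dist, axis=1)[:, :k]
    src = np.repeat(np.arange(n), k)
    dst = nbrs.ravel()
    return src, dst, dist[src, dst]

# Toy 2-D "embedding": two tight pairs that sit far apart from each other.
embedding = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.2]])
src, dst, d = knn_edges(embedding, k=1)
```

With `k=1`, each point in the toy data links to the other member of its pair.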

@lmeyerov lmeyerov added the "? - Needs Triage" (Need team to review and classify) and "feature request" (New feature or request) labels Oct 18, 2021
@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@lmeyerov
Author

FWIW, this is of continued interest for use cases around security, fraud, genomics, and visualizing embeddings in general. We are discussing explainable AI approaches that build on this with the umap team. Meanwhile, we're doing a k-NN edge-recovery dance as a workaround for the RAPIDS flavors.

@lmeyerov
Author

This is of increasing relevance fwiw :)

@lmeyerov
Author

We are still interested in this. We're adding some autoumap bits to pygraphistry, and have to work around this when users switch from umap_learn to cuml.umap...

@lmeyerov
Author

Still of interest :) We are getting ready for the cuml engine variant of our new integrated auto-umap branch (graphistry/pygraphistry#305), and if helpful, I can point to some Nvidia+RAPIDS partners who will be using it.

cc @taureandyernv

@cjnolet cjnolet self-assigned this Feb 24, 2022
@cjnolet
Member

cjnolet commented Feb 24, 2022

@lmeyerov this shouldn't be too hard to do. Just to clarify: what you want is the fuzzy simplicial set graph here?

@lmeyerov
Author

lmeyerov commented Feb 24, 2022

Yep -- with priority for the one in the original space, not the embedding, and with the normalized/undirected weights.
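
To make "normalized/undirected weights" concrete: as I understand umap_learn's default behavior, the directed k-NN membership strengths are symmetrized with a fuzzy union (the probabilistic t-conorm), so each undirected weight is w_ij + w_ji - w_ij * w_ji. The sketch below illustrates just that formula on a small dense array. `fuzzy_union` is a hypothetical name, and real graphs would be sparse, so treat this as an illustration and not as either library's implementation.

```python
import numpy as np

def fuzzy_union(directed: np.ndarray) -> np.ndarray:
    """Symmetrize directed membership strengths with the probabilistic t-conorm."""
    transpose = directed.T
    # Element-wise: w_ij + w_ji - w_ij * w_ji
    return directed + transpose - directed * transpose

# Directed strengths in [0, 1]; row i holds the edges leaving point i.
directed = np.array([
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.2],
    [0.0, 0.0, 0.0],
])
undirected = fuzzy_union(directed)
```

The result is symmetric, and an edge present in only one direction keeps its original strength.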

Our intuition is that, for explainability, the original space's weighted 1-simplexes are already interpretable and more precise. Likewise, it enables graph layout with the same initial seed. Getting the embedding's simplexes is interesting too, mostly to let us highlight which 1-simplexes were added vs. lost, but that's priority 2.
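
A minimal sketch of the "added vs. lost" comparison, assuming both graphs are available as lists of (src, dst) index pairs over the same points. `undirected_edges` and `diff_edges` are hypothetical helpers made up for this illustration, and weights are ignored for simplicity.

```python
def undirected_edges(pairs):
    """Normalize (src, dst) pairs so that (a, b) and (b, a) compare equal."""
    return {(min(a, b), max(a, b)) for a, b in pairs if a != b}

def diff_edges(original_pairs, embedding_pairs):
    """Split edges into those kept, lost, and added when moving to the embedding."""
    original = undirected_edges(original_pairs)
    embedded = undirected_edges(embedding_pairs)
    return {
        "kept": original & embedded,
        "lost": original - embedded,
        "added": embedded - original,
    }

# Toy example: edge (1, 2) disappears in the embedding and edge (0, 3) appears.
original_pairs = [(0, 1), (1, 2), (2, 3)]
embedding_pairs = [(1, 0), (2, 3), (0, 3)]
result = diff_edges(original_pairs, embedding_pairs)
```

Python sets keep the comparison short here. For large graphs a dataframe join on (src, dst) would likely scale better.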

@lmeyerov
Author

ping :) we're about to release graphistry.nodes(cudf.read_csv('genomes.csv')).umap(engine='umap_learn').plot(), and would love to do graphistry.nodes(cudf.read_csv('genomes.csv')).umap(engine='cuml').plot() as well :)

@taureandyernv
Contributor

@dantegd @cjnolet will we be able to get this into 22.06?

@cjnolet
Member

cjnolet commented May 10, 2022

@taureandyernv we are going to try and aim for 22.06. There's a PR open (#4711) to expose the simplicial set functions. It required exposing sparse objects in Python that are populated by the C++ layer (e.g. the number of nonzeros isn't known ahead of time), which should make exposing the connectivities graph from a trained model even easier.

rapids-bot bot pushed a commit that referenced this issue May 24, 2022
This PR closes issues #3123, #4704 and #4292

Authors:
  - Victor Lafargue (https://github.com/viclafargue)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #4711
@cjnolet
Member

cjnolet commented Jun 6, 2022

@lmeyerov, https://github.com/rapidsai/cuml/pull/4756/files added the attribute model.graph_ so I'm going to close this as done. Please feel free to re-open if this still doesn't satisfy the requirement.
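
For anyone landing here later, a hedged sketch of consuming the new attribute. It assumes only that the fitted model's `graph_` is a sparse matrix with a `.tocoo()` method exposing `row`, `col`, and `data`. `graph_to_edges` is a hypothetical helper, and the example uses a SciPy stand-in instead of a real cuML model so it runs without a GPU. If `graph_` holds device arrays, a device-to-host copy would be needed first, which this sketch does not handle.

```python
import numpy as np
from scipy import sparse
from types import SimpleNamespace

def graph_to_edges(model):
    """Return src/dst/weight arrays from a fitted model's graph_ attribute."""
    coo = model.graph_.tocoo()
    return {
        "src": np.asarray(coo.row),
        "dst": np.asarray(coo.col),
        "weight": np.asarray(coo.data),
    }

# Stand-in for a fitted model: a 3-node graph with two weighted edges.
fake_model = SimpleNamespace(
    graph_=sparse.coo_matrix(
        (np.array([0.9, 0.4]), (np.array([0, 1]), np.array([1, 2]))),
        shape=(3, 3),
    )
)
edges = graph_to_edges(fake_model)
```

The resulting dict can be passed straight to a dataframe constructor to get the src/dst/weight edge list from the original request.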

@cjnolet cjnolet closed this as completed Jun 6, 2022
@lmeyerov
Author

lmeyerov commented Jun 6, 2022

Excellent, is this for the 22.06 release (and in the nightlies already)?

@cjnolet
Member

cjnolet commented Jun 6, 2022

@lmeyerov, yep, the feature made it into 22.06 (and the nightlies).

vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this issue Oct 9, 2023