Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Plumb edit_distance_matrix to Python #6341

Closed
beckernick opened this issue Sep 28, 2020 · 4 comments · Fixed by #8463
Closed

[FEA] Plumb edit_distance_matrix to Python #6341

beckernick opened this issue Sep 28, 2020 · 4 comments · Fixed by #8463
Assignees
Labels
feature request New feature or request Python Affects Python cuDF API.

Comments

@beckernick
Copy link
Member

beckernick commented Sep 28, 2020

We should plumb edit_distance_matrix to the Cython layer. This could potentially live in the cudf._lib.nvtext.edit_distance namespace alongside the canonical edit_distance.

For now, I'm not sure we if need to plumb this all the way up to the Python layer. See #4569 (comment) in which we decided similarly for find_multiple based on efficiency and utility. With list column support we could plumb it all the way up if desired.

It would be useful if we could analogously do for edit_distance_matrix what we can currently do for find_multiple.

from cudf._lib.strings import find_multiple
import cudf
import cupy as cp

a = cudf.Series(["rabbit","dog", "bird"])
b = cudf.Series(["a","b"])

out = find_multiple.find_multiple(a._column, b._column) # NumericalColumn output (contiguous buffer)
out = cp.asarray(out).reshape(-1, len(b)) # reshape to "actual" 2-d matrix representation
out
array([[ 1,  2],
       [-1, -1],
       [-1,  0]], dtype=int32)
@beckernick beckernick added feature request New feature or request Cython labels Sep 28, 2020
@github-actions
Copy link

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@beckernick
Copy link
Member Author

This is still necessary, and likely should be plumbed up to the Python layer given list column support. Updating the title accordingly.

@beckernick beckernick changed the title [FEA] Plumb edit_distance_matrix to Cython [FEA] Plumb edit_distance_matrix to Python Feb 17, 2021
@davidwendt
Copy link
Contributor

From libcudf, it looks like nvtext::edit_distance_matrix returns a lists column whereas cudf::strings::find_multiple returns a single array. It would not be difficult to change find_multiple return a lists column if that is helpful.

@github-actions
Copy link

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@harrism harrism added the Python Affects Python cuDF API. label May 27, 2021
@isVoid isVoid self-assigned this Jun 8, 2021
rapids-bot bot pushed a commit that referenced this issue Jun 10, 2021
This PR plumbs nvtext's `edit_distance_matrix` to cudf python with necessary precondition checks. It also adds python tests.

Closes #6341

Authors:
  - Michael Wang (https://github.com/isVoid)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Marlene  (https://github.com/marlenezw)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #8463
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants