Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] doc whether renumber preserves edge order #922

Closed
lmeyerov opened this issue Jun 1, 2020 · 4 comments · Fixed by #963
Closed

[ENH] doc whether renumber preserves edge order #922

lmeyerov opened this issue Jun 1, 2020 · 4 comments · Fixed by #963
Assignees
Labels
doc Documentation
Milestone

Comments

@lmeyerov
Copy link
Contributor

lmeyerov commented Jun 1, 2020

Describe the solution you'd like
Docs clarify whether renumbering preserves edge orders.

(Ideally they do, because this simplifies then plugging in different edge weights without manually swizzling.)

Additional context
Encountered in various renumber calls in 0.13, e.g.:
https://docs.rapids.ai/api/cugraph/stable/api.html?highlight=renumber#cugraph.structure.renumber.renumber_from_cudf

@lmeyerov lmeyerov added the ? - Needs Triage Need team to review and classify label Jun 1, 2020
@BradReesWork BradReesWork self-assigned this Jun 16, 2020
@BradReesWork BradReesWork added doc Documentation and removed doc Documentation labels Jun 16, 2020
@BradReesWork BradReesWork added this to the 0.15 milestone Jun 16, 2020
@BradReesWork BradReesWork added doc Documentation and removed ? - Needs Triage Need team to review and classify labels Jun 16, 2020
@ChuckHastings
Copy link
Collaborator

FYI - it does preserve the ordering.

I am updating this code in 0.15, I will add this to the documentation.

@ChuckHastings
Copy link
Collaborator

Let me change my answer.

In version 0.14 and prior single column integer renumbering preserved ordering. Single column non-integer and Multi column renumbering did not guarantee that the order was preserved.

The new renumbering in 0.15 does not guarantee to preserve ordering of the input. This is documented in the new python API methods.

If ordering matters, you can add an extra column to the input Dataframe numbering the rows. The output of the 0.15 numbering is always a data frame and preserves the contents of the original Dataframe that aren't part of the source or destination definitions. You can then use sort_values on the extra column you added to reconstruct the original order.

I considered making that part of the implementation, but sorting is expensive and in most cases unnecessary.

@lmeyerov
Copy link
Contributor Author

Maybe make sorted=False a parameter?

cc @kkraus14 as you felt the pain of tickets stemming from similar design decisions here in cudf

My reasoning:

  • A lot of usage will be sorted, so make typical cases easy: have df, use cugraph to enrich, get back df, continue
  • By making an arg, makes it way easier to optimize later without breaking user code
  • I'd prefer a default of sorted=True -- let people go faster via opt-in, but safe by default, as what's annoying-level for us is horrible for others. but i'm not spending my request points on this :)

@ChuckHastings
Copy link
Collaborator

Makes sense. I will evaluate the mechanics of doing this and update the PR accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants