Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a priority queue for the sparse secondary iteration #49

Closed
LTLA opened this issue May 10, 2023 · 2 comments
Closed

Use a priority queue for the sparse secondary iteration #49

LTLA opened this issue May 10, 2023 · 2 comments

Comments

@LTLA
Copy link
Member

LTLA commented May 10, 2023

This can replace the current_indices store, and means that each secondary iteration doesn't have to scan across the entire extent. Only should be filled if an oracle is provided and the next element is greater than the current element.

Probably will need another field to specify the last element at which the queue was updated. Might even have to use make_heap directly to allow us to clear the priority queue quickly when a reverse order is used.

@LTLA
Copy link
Member Author

LTLA commented May 18, 2023

This is... much more complex than I thought, especially to handle both increments and decrements.

The real problem is that I'm not even sure it's going to be faster.

  • The heap takes Nz * log(Nz) time to depopulate and fill, given Nz non-zero elements in a row.
  • If we do any jumps, we need another Nz * log(Nz) to sort the non-zeros as they don't come out of the heap in order.
  • If we do a long jump, then the whole thing collapses to N * log(N) anyway, as the heap information is useless.

If Nz isn't much smaller than N (the number of columns), then we're going to see a performance degradation.

Besides, pure iteration is pretty fast if there's no action involved other than to press continue. This is especially true for the current_indices vector that can be iterated contiguously in memory, and even more so if the branch predictor guesses that action is generally not required for very sparse rows.

So it might be faster to just proceed as we are doing now, with one simple compromise; we can keep track of the next-closest index to the current position, allowing us to quickly skip all-zero rows.

@LTLA
Copy link
Member Author

LTLA commented May 21, 2023

Closed by #51.

@LTLA LTLA closed this as completed May 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant