[EPIC]: CUB large input support #50

Open
8 of 24 tasks
jrhemstad opened this issue Apr 21, 2023 · 1 comment
Labels: 2.8.0 (target for 2.8.0 release), cub (For all items related to CUB), feature request (New feature or request.)

Comments

@jrhemstad
Collaborator

jrhemstad commented Apr 21, 2023

As a lower-level interface, CUB should optimize for flexibility and performance. As a result, CUB will not guarantee a large input will work by default. However, it should enable users to specify their desired offset type.

This means CUB should not perform any dynamic dispatch based on the input size. Instead, users should have a way to statically specify the offset type. In previous discussions we favored making the type of num_items a template parameter and inferring the offset type from it.
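To make the idea concrete, here is a minimal host-side sketch of inferring the offset type from the static type of num_items. The names `offset_of_t` and `sum_n` are hypothetical and exist only for illustration; they are not CUB APIs, and the real device-side dispatch is more involved.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper (not a CUB API): the offset type is simply the type
// the caller chose for num_items, fixed at compile time.
template <typename NumItemsT>
using offset_of_t = NumItemsT;

// Hypothetical algorithm entry point. No runtime branch inspects the value of
// num_items; the width of the loop counter follows from the argument's type.
template <typename T, typename NumItemsT>
T sum_n(const T* data, NumItemsT num_items)
{
  static_assert(std::is_integral<NumItemsT>::value, "num_items must be integral");
  using offset_t = offset_of_t<NumItemsT>;
  T total{};
  for (offset_t i = 0; i < num_items; ++i)
  {
    total += data[i];
  }
  return total;
}

// The caller picks the offset type purely through the argument type.
static_assert(std::is_same<offset_of_t<int>, int>::value, "int in, int offsets");
static_assert(std::is_same<offset_of_t<std::int64_t>, std::int64_t>::value,
              "int64_t in, int64_t offsets");
```

Calling `sum_n(data, 4)` would use `int` offsets, while `sum_n(data, std::int64_t{4})` would use 64-bit offsets, with no check of the runtime value in either case.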

Design-related research

Testing large number of items

Enable large num_items in CUB algorithms that are sensitive to the choice of offset_t

Limit the number of kernel template instantiations by reducing the set of offset types
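One way to reduce the set of offset types can be sketched as a trait that collapses every integral num_items type onto two unsigned offset types. The trait `pick_offset_t` and the functions below are hypothetical names chosen for this sketch; this is an assumption about the general approach, not the definition of `choose_offset_t` referenced elsewhere in this issue.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical trait (illustrative only): any integral num_items type of up to
// 4 bytes maps to a 32-bit offset, anything wider maps to a 64-bit offset.
template <typename NumItemsT>
struct pick_offset
{
  static_assert(std::is_integral<NumItemsT>::value, "num_items must be integral");
  using type =
    typename std::conditional<sizeof(NumItemsT) <= 4, std::uint32_t, std::uint64_t>::type;
};

template <typename NumItemsT>
using pick_offset_t = typename pick_offset<NumItemsT>::type;

// Stand-in for a kernel template: each distinct OffsetT is one instantiation.
template <typename OffsetT>
constexpr int offset_bytes()
{
  return static_cast<int>(sizeof(OffsetT));
}

// Hypothetical dispatch layer: however many num_items types callers use,
// at most two instantiations of offset_bytes are ever requested.
template <typename NumItemsT>
int dispatch(NumItemsT /*num_items*/)
{
  using offset_t = pick_offset_t<NumItemsT>;
  return offset_bytes<offset_t>();
}
```

With this mapping, `short`, `int`, and `unsigned int` callers all share the 32-bit instantiation, and `std::int64_t` and `std::uint64_t` callers share the 64-bit one.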

Clean up interim testing infrastructure

  • Switch tests for large number of items to use Device* interface in DeviceSelect
  • Switch tests for large number of items to use Device* interface in DeviceScan

Documentation

@jrhemstad changed the title from "Determine and finalize design for large input support in CUB" to "CUB large input support" on Apr 21, 2023
@miscco added the "feature request" and "cub" labels on Jul 12, 2023
@miscco changed the title from "CUB large input support" to "[FEA]: CUB large input support" on Jul 12, 2023
@github-project-automation (bot) moved this to Todo in CCCL on Jul 12, 2023
@elstehle
Collaborator

elstehle commented Feb 21, 2024

legend for offset type: ✅ considered done | 🟡 considered lower priority | 🟠 considered higher priority, as it prevents usage for larger-than-INT_MAX number of items | ⏳ in progress

legend for testing columns: ✅ considered done | 🟡 to be done | 🟠 needs to support wider offset types first

| algorithm | offset type | tests larger-than-INT_MAX | tests close to [U]INT_MAX |
|---|---|---|---|
| device_adjacent_difference.cuh | choose_offset_t | ✅ 2^33, sanity check, iterators | 🟡 |
| device_radix_sort.cuh | choose_offset_t | ✅ extensive check | ✅ extensive check |
| device_reduce.cuh | choose_offset_t: Reduce, Sum, Min, Max, ReduceByKey, TransformReduce; ✅ streaming: ArgMin, ArgMax (#2647) ⚠️ (note: new interface only) | ✅ sanity, 2^{30,31,33} | ✅ sanity, 2^32-1 |
| device_scan.cuh | ✅ choose_offset_t: DeviceScan; ✅ choose_offset_t: DeviceScanByKey | | |
| device_select.cuh | choose_offset_t: UniqueByKey; streaming (for any user-provided offset type): Flagged, If, Unique | | |
| device_partition.cuh | streaming (for larger-than-int user-provided offset types): Flagged, If; streaming (for larger-than-int user-provided offset types): ThreeWayPartition #2506 | | |
| device_segmented_sort.cuh | num_segments: supporting up to int64_t segments, chunked into INT_MAX number of segments per invocation (#3308); num_items: supporting up to int64_t items per segment | | |
| device_merge_sort.cuh | ✅ choose_offset_t (#3328) | ✅ extensive check | ✅ extensive check |
| device_transform.cuh | choose_signed_offset_t (#3172) | | |
| device_copy.cuh | 🟡 num_ranges: uint32_t; 🟡 buffer sizes: iterator_traits<SizeIteratorT>::value_type | 🟡 | 🟡 |
| device_memcpy.cuh | 🟡 num_ranges: uint32_t; 🟡 buffer sizes: iterator_traits<SizeIteratorT>::value_type | 🟡 | 🟡 |
| device_for.cuh | 🟡 NumItemsT: ForEachN, ForEachCopyN, Bulk; difference_type: ForEach, ForEachCopy | 🟡 | 🟡 |
| device_histogram.cuh | 🟡 dynamic dispatch: int for (num_rows * row_stride_bytes) < INT_MAX; OffsetT otherwise | 🟡 | 🟡 |
| device_segmented_reduce.cuh | 🟡 common_iterator_value_t({Begin,End}OffsetIteratorT): Reduce, Sum, Min, Max; ⚠️ (note) int: ArgMin, ArgMax; num_segments: int | ✅ sanity, rnd [2^31; 2^33] | 🟡 |
| device_run_length_encode.cuh | 🟠 int | 🟠 | 🟠 |
| device_segmented_radix_sort.cuh | num_segments: supporting up to int64_t segments; segment_offsets & num_items: supporting up to int64_t (#3402); 🟠 "segment_size": any single segment size may not exceed INT_MAX | | |
| device_merge.cuh | 🟠 int | 🟡 | 🟡 |
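The "streaming" and "chunked into INT_MAX ... per invocation" entries above can be illustrated with a small host-side sketch: a wrapper accepts a 64-bit item count and repeatedly calls an inner routine that only understands `int` counts. Both function names are hypothetical and the code is an assumption about the general pattern, not the implementation from the referenced pull requests.

```cpp
#include <algorithm>
#include <climits>
#include <cstdint>

// Stand-in for an algorithm whose internals only handle `int` item counts.
inline std::int64_t count_even_int(const std::int32_t* data, int num_items)
{
  std::int64_t count = 0;
  for (int i = 0; i < num_items; ++i)
  {
    count += (data[i] % 2 == 0) ? 1 : 0;
  }
  return count;
}

// Hypothetical streaming wrapper: splits a 64-bit range into chunks that each
// fit into `int` and accumulates the partial results.
// Precondition (assumed, not checked): 0 < max_chunk <= INT_MAX.
inline std::int64_t count_even_streaming(const std::int32_t* data,
                                         std::int64_t num_items,
                                         std::int64_t max_chunk = INT_MAX)
{
  std::int64_t total = 0;
  for (std::int64_t begin = 0; begin < num_items; begin += max_chunk)
  {
    // The chunk size never exceeds max_chunk, so the narrowing cast is safe.
    const int chunk = static_cast<int>(std::min(max_chunk, num_items - begin));
    total += count_even_int(data + begin, chunk);
  }
  return total;
}
```

The `max_chunk` parameter exists only so the chunking logic can be exercised with small inputs; a real implementation would presumably fix the chunk size and carry more state between invocations (for example, running output offsets for select or partition).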

@elstehle mentioned this issue on May 21, 2024 (13 tasks)
@jollylili added the "2.8.0" label on Nov 15, 2024
@elstehle self-assigned this on Dec 2, 2024
@elstehle changed the title from "[FEA]: CUB large input support" to "[EPIC]: CUB large input support" on Dec 3, 2024