[RELEASE] cudf v21.06 #8418

Merged
merged 327 commits into from
Jun 9, 2021
025c56a
Merge pull request #7979 from kkraus14/fix_automerge
dillon-cullinan Apr 16, 2021
dda9cb9
Replace device_vector with device_uvector in gather (#7758)
harrism Apr 16, 2021
98711da
Centralize logic for getattr->getitem override (#7845)
vyasr Apr 16, 2021
bc422fc
Enable scattering scalars into decimal columns (#7899)
brandon-b-miller Apr 16, 2021
9d2907d
auto set device in cufile jni functions (#7983)
rongou Apr 16, 2021
c05aca3
Remove `device_vector`s from parquet (#7853)
devavret Apr 17, 2021
ed68e96
Update to CPM with fix for `FETCHCONTENT_BASE_DIR` (#7982)
trxcllnt Apr 17, 2021
4da38a6
Fix union operation in `_is_supported()` (#7959)
charlesbluca Apr 17, 2021
1d03186
Remove nvstrdesc_s from cuio (#7841)
kaatish Apr 17, 2021
867d6ee
Revert "Java API change for supporting structs (#7730)" (#7987)
razajafri Apr 17, 2021
808262f
Remove char_width member from cudf::string_view class (#7914)
davidwendt Apr 19, 2021
05330dd
Update CUDA version in build scripts (#7984)
ajschmidt8 Apr 19, 2021
b8baed5
Fix ORC reader issue with bystream reader (#7988)
rgsl888prabhu Apr 19, 2021
4893259
Unpin pandas to patch version (#7992)
galipremsagar Apr 19, 2021
1775c3d
Struct binary search (lower_bound/upper_bound) (#7865)
ttnghia Apr 19, 2021
3d212b3
Defer to `iloc` when indexer is an integer-like object and the `Index…
skirui-source Apr 19, 2021
68495f9
Refactor libcudf strings concatenate to use make_strings_children (#7…
davidwendt Apr 20, 2021
2214407
Merge pull request #7999 from rapidsai/branch-0.19
GPUtester Apr 20, 2021
6fb7909
Add JNI for splitting groups in a table after groupby (#7954)
firestarman Apr 20, 2021
8e42e73
Fix incorrect explode_outer_position column null values in some cases…
hyperbolic2346 Apr 20, 2021
5c2f744
Introduces `make_optional_iterator` for nullable column and scalars (…
robertmaynard Apr 20, 2021
d501d2c
Adding support for Decimal/Fixed-point to ORC reader (#7970)
rgsl888prabhu Apr 20, 2021
f11bcd7
Remove obsolete cudf::strings::replace_nulls (#7965)
davidwendt Apr 21, 2021
c0cf5e1
Add groupby product support (#7763)
karthikeyann Apr 21, 2021
5be3a62
Update `ci/local/build.sh` default `$DOCKER_IMAGE` (#8013)
codereport Apr 21, 2021
5b71fca
Improve CSV writer tests (#7851)
vuule Apr 21, 2021
db30202
Enable quantile for decimal columns (#7927)
ChrisJar Apr 21, 2021
d4d64c0
Support setting to a new column in `DataFrame.loc` (#8012)
isVoid Apr 21, 2021
dcebfe7
Fix dask_cudf metadata-inference when first ORC path is empty (#8021)
rjzamora Apr 22, 2021
f8f9988
Use rmm::device_uvector in place of rmm::device_vector for CSV reader…
vuule Apr 22, 2021
069bf96
Fix returned column type when extracting from an empty list column (#…
jlowe Apr 22, 2021
e018722
add null order support to detail::drop_duplicates (#7938)
cwharris Apr 22, 2021
e385180
remove decimals_as_float64 from orc benchmarks (#8007)
cwharris Apr 22, 2021
151f4c5
Bump cmake minimum version to 3.18 in dev environments (#8005)
galipremsagar Apr 22, 2021
226f4bb
Merge pull request #8035 from rapidsai/branch-0.19
GPUtester Apr 22, 2021
3e3463b
Merge pull request #8038 from rapidsai/branch-0.19
GPUtester Apr 22, 2021
8dae31c
Add groupby scan aggregation to cudf (#7759)
karthikeyann Apr 22, 2021
484ce75
Use segmented_sort_by_key for sorting smaller lists (#7973)
shwina Apr 22, 2021
f18af7f
Remove references to deprecated "rmm/thrust_rmm_allocator.h" (#8027)
harrism Apr 22, 2021
50368de
Support more units in `cudf.DateOffset` (#7078)
brandon-b-miller Apr 22, 2021
b733082
Add `list_scalar` API (#7584)
isVoid Apr 22, 2021
fad1587
Add column validation in equality operations (#8040)
galipremsagar Apr 23, 2021
fc61648
Reduce peak device memory usage in ORC writer (#7719)
kaatish Apr 23, 2021
a6b495c
Refactor concatenation logic and fix various bugs (#7867)
vyasr Apr 23, 2021
791beb2
Adding struct support for semi_join (#8028)
hyperbolic2346 Apr 23, 2021
6e13988
Fixed join on mixed nullability columns (#7963)
hyperbolic2346 Apr 25, 2021
2bfd6fe
Implement string list concatenation (#7929)
ttnghia Apr 26, 2021
3ace5ec
Use device_read/device_write in Avro reader and ORC reader/writer (#8…
kaatish Apr 26, 2021
94afdda
Adds serialization of Decimal Columns and dtypes (#8041)
brandon-b-miller Apr 26, 2021
1894aeb
Convert cudf::rank to use device_uvector (#8029)
harrism Apr 26, 2021
098d7f8
Add `groupby::shift` API (#7910)
isVoid Apr 26, 2021
f0977a4
constexpr all is_*_impl::operator()<T>() (#8056)
karthikeyann Apr 26, 2021
9b2e456
Convert cudf::merge to use device_uvector instead of device_vector (#…
harrism Apr 27, 2021
1a0d304
Add validation for `errors` parameter in `cudf.to_datetime` (#8068)
galipremsagar Apr 27, 2021
d08e041
Remove 10.2 workarounds in groupby functions for dictionary column ty…
davidwendt Apr 27, 2021
9e72ae2
Numba version and deprecation updates (#8017)
gmarkall Apr 27, 2021
d91ccb9
Fix bug when constructing `list_scalar`, `stream` and `mr` was not pa…
isVoid Apr 27, 2021
78b8333
Fix `Series` inputs handling in `Dataframe` constructor (#8065)
galipremsagar Apr 27, 2021
85026b5
Fix semi join on mixed nullability columns (#8075)
sperlingxx Apr 27, 2021
a03cf7f
Replace cudf::strings::detail::modify_strings utility with make_strin…
davidwendt Apr 27, 2021
e9bc090
Remove `strong_typedef` in `fixed_point.hpp` (#8063)
codereport Apr 27, 2021
4d359c2
Update codeowners for benchmarks / tests CMake (#8066)
Apr 27, 2021
fbcf37f
expand the cuFile JNI wrapper to allow for more flexibility (#8053)
rongou Apr 27, 2021
31af285
Convert cudf::quantiles to use device_uvector (#8076)
harrism Apr 27, 2021
6c66bdc
Move make_strings_children to strings/detail/utilities.cuh (#8060)
davidwendt Apr 28, 2021
32222de
Remove remaining "when C++17" comments (#8089)
codereport Apr 28, 2021
663457b
Merge pull request #8101 from rapidsai/branch-0.19
GPUtester Apr 28, 2021
0ca0e69
Refactoring column logic Part 1 (#8081)
vyasr Apr 28, 2021
290c6ef
Allow CuPy 9 to be used with cuDF (#8082)
jakirkham Apr 28, 2021
b27aef6
Convert cudf::repeat to use device_uvector instead of device_vector (…
harrism Apr 28, 2021
1918de6
Fix bug: allow `lists::copy_slice` from an valid row that has an empt…
isVoid Apr 28, 2021
5a012e5
Extend range window queries to non-timestamp order-by columns (#7866)
mythrocks Apr 29, 2021
7f0ad1d
Add support for pydocstyle and test on abc.py (#7985)
vyasr Apr 29, 2021
ac25e97
Add python/cython bindings for `str.join` API (#8085)
galipremsagar Apr 29, 2021
ac4f943
Convert hashing, partitioning, and nested_loop_join APIs to use devi…
harrism Apr 29, 2021
1757d10
ensure cuFile JNI library is loaded before any use (#8105)
rongou Apr 29, 2021
04d6e5a
JNI support for scalar of list (#8077)
firestarman Apr 29, 2021
cea6c20
Use `cupy.ndarray` (without `core`) (#8114)
jakirkham Apr 29, 2021
e6f3f37
Switch from std::tie() to structured binding. (#8117)
mythrocks Apr 30, 2021
322eac6
Fix subword tokenizer to handle zero hash bin size (#8093)
davidwendt Apr 30, 2021
f686c01
ENH Remove conda defaults channel in dev environments (#8122)
jjacobelli Apr 30, 2021
aa61a6d
don't throw an exception when cuFile jni can't be loaded (#8124)
rongou Apr 30, 2021
b368ebd
Convert unordered_multiset to use device_uvector (#8091)
harrism Apr 30, 2021
7bf6de6
Fix `cudf_test/iterator_utilities.hpp` (#8126)
ttnghia Apr 30, 2021
4869c23
enable all aggregations for dictionary type (#8061)
karthikeyann May 1, 2021
cf8c73a
Some APIs to help with out of core joins in Spark (#8118)
revans2 May 1, 2021
8a4426f
Use spans in parquet writer (#7950)
devavret May 3, 2021
7623f39
Update Developer Guide to mention structured bindings (#8116)
mythrocks May 3, 2021
c7eccc1
Fix fragile logic in dask_cudf chunksize parquet test (#8108)
rjzamora May 3, 2021
27ae8c1
Redirect callable aggregations to their named equivalent in dask-cuDF…
charlesbluca May 3, 2021
6ab91f2
Implement interleave_columns for list type (#8046)
ttnghia May 3, 2021
ad081ae
Create a common code path for 1d Frames (#8115)
vyasr May 3, 2021
36eaa06
Subword Tokenizer HuggingFace like API (#7942)
VibhuJawa May 3, 2021
1debb96
Implement concatenate_rows for list type (#8049)
ttnghia May 3, 2021
5d50cde
Extend LEAD/LAG to work with non-fixed-width types (#8062)
mythrocks May 3, 2021
5b754ed
Update io supported types docs page (#8146)
galipremsagar May 4, 2021
bc9903a
Add binary ops benchmark (#8008)
karthikeyann May 4, 2021
7f3799c
convert replace/clamp.cu to use optional iterator (#8004)
robertmaynard May 4, 2021
be2a1ed
Fix groupby reduce_functor for fixed-point result-type (#8127)
davidwendt May 4, 2021
d56428a
Enable concat for decimal columns with mixed precision and scale (#8099)
ChrisJar May 4, 2021
770dc38
Fix `factorize` doc string (#8154)
galipremsagar May 4, 2021
53e1c66
Enable not equal for decimal columns (#8143)
ChrisJar May 4, 2021
a80dff2
Fix scatter output size for structs. (#8155)
mythrocks May 4, 2021
44f21b3
Add `decimal64Dtype` support in data-generator (#8107)
galipremsagar May 4, 2021
4715c83
Refactor AggregationJni to support collectSet (#8057)
sperlingxx May 5, 2021
9b727dd
ENH Remove 'rapidsai-nightly' conda channel when building main branch…
jjacobelli May 5, 2021
81046ff
Move scalar function definitions from scalar.hpp to scalar.cpp (#8112)
davidwendt May 5, 2021
f54ccd0
Fix lists strings scatter to handle zero child rows (#8103)
davidwendt May 5, 2021
2ead87c
Fix hash of fixed-point type to hash value component (#8141)
davidwendt May 5, 2021
c0f8176
Allow users to set jitify cache file limit via an environment variabl…
trxcllnt May 5, 2021
8cba3b0
Enable join results with size > INT32_MAX (#8139)
shwina May 5, 2021
559e8a3
Java bindings for Parquet struct support (#7998)
razajafri May 5, 2021
3106679
Convert grouped_rolling to use device_uvector (#8106)
harrism May 5, 2021
3af3bf3
Convert remaining uses of device_vector in groupby (#8148)
harrism May 5, 2021
4853dbc
Add a `copy()` method to `Buffer` (#8113)
shwina May 5, 2021
52fab32
Fix Java nightly build (#8169)
jlowe May 6, 2021
3940e56
Split up hashing.cu to improve compile time (#8168)
davidwendt May 6, 2021
202fff1
Merge Index and Series binops (#8166)
vyasr May 6, 2021
e8b9ff7
Refactor tests/groupby/** (#7604)
ttnghia May 6, 2021
611cabd
Add chars-tokenizer to nvtext tokenize_benchmark.cpp (#8125)
davidwendt May 6, 2021
96c0706
add Java unit tests for making list of list (#8111)
wbo4958 May 6, 2021
8ae73d5
Add Python bindings for ``get_json_object`` (#7981)
skirui-source May 6, 2021
2207577
Enable decimal fillna with integer scalars and series (#8172)
ChrisJar May 7, 2021
db21232
Fix struct scatter to correctly cascade null_mask to children columns…
ttnghia May 7, 2021
5f9dade
Support listConcatenateByRows in Java package (#8171)
sperlingxx May 7, 2021
e2c7067
Change aggregation class hierarchy to allow per-algorithm type enforc…
nvdbaranec May 7, 2021
b46913b
Use rmm::device_uvector in place of rmm::device_vector in cuIO (#8151)
vuule May 7, 2021
57a8ad2
JNI Rolling Aggregation Changes (#8069)
revans2 May 7, 2021
245d8c1
Fix orc reader assert on create data_type (#8174)
davidwendt May 7, 2021
e970a65
Split iterator tests to improve parallel compile times (#8167)
robertmaynard May 7, 2021
bb62cf1
Support `get_element` from LIST column (#8071)
isVoid May 7, 2021
3813d9b
Creates an empty column for the null `LIST Scalar` (#8173)
firestarman May 8, 2021
2273b6d
Enable division operator for decimal columns (#8149)
ChrisJar May 8, 2021
fbb9a98
Support nested input columns in copy_if_else() (#8135)
mythrocks May 10, 2021
97b2e9e
Fix interleave_columns on ListType with nullable child (#8181)
sperlingxx May 10, 2021
99df69f
Fix struct binary search to generate a validity column for both targe…
ttnghia May 10, 2021
c2c67de
Remove `boost` dependency (#7932)
codereport May 10, 2021
9328c56
Add notes in IO supported types doc table. (#8203)
galipremsagar May 11, 2021
9a063b6
Abstract Syntax Tree Cleanup and Tests (#7418)
codereport May 11, 2021
2c70f1d
Closed column view to avoid memory leak (#8202)
razajafri May 11, 2021
ae08422
patch thrust to fix intmax num elements limitation in scan_by_key (#8…
cwharris May 11, 2021
e94daa0
Fix concatenate_rows issue with lists of all empty strings (#8210)
sperlingxx May 11, 2021
3f064f9
Account for offset columns in lists::contains() (#8204)
mythrocks May 12, 2021
188b630
Bump up GDS user-space lib version to 0.95.1 (#8221)
pxLi May 12, 2021
667b9bc
Add Python bindings to list concatenation functions (#8087)
skirui-source May 12, 2021
bda8457
Update split-by-char to use input offsets column (#8195)
davidwendt May 12, 2021
cdf09ad
Convert tests to use device_uvector (#8205)
harrism May 12, 2021
4596fe6
Enable scatter on list of fixed point (#8211)
sperlingxx May 13, 2021
082596f
Convert benchmarks to use device_uvector (#8208)
harrism May 13, 2021
fb7cdcd
Reduce contiguous-split block_size for copy_partition (#8216)
davidwendt May 13, 2021
6b92e25
Enable scattering integers into decimal columns (#8225)
brandon-b-miller May 13, 2021
f8d7de4
Support exclude null_policy for collect list/set in groupby (#8044)
sperlingxx May 13, 2021
5987384
Exposing cudf list accessors to dask_cudf (#8197)
shaneding May 13, 2021
92f067d
Remove cuda 11.1 related files (#8224)
galipremsagar May 13, 2021
7864a66
Update release script (#8222)
raydouglass May 13, 2021
2a169c8
add java bindings for non-timestamps range window queries (#7909)
wbo4958 May 13, 2021
c681211
Fix doxygens and comments for various APIs (#8201)
ttnghia May 13, 2021
d206755
Fix cython flag to use c++17 (#8243)
galipremsagar May 14, 2021
304f460
Revert #7909 add java bindings for non-timestamps range window querie…
wbo4958 May 14, 2021
b7eeaf5
Split up scan.cu to improve compile time (#8183)
davidwendt May 14, 2021
2d1d4ec
add java bindings for non-timestamps range window queries (#8248)
wbo4958 May 14, 2021
3d12d63
Check if a map contains a specific key (#8209)
wjxiz1992 May 14, 2021
92a432d
Add scientific notation support to cudf::strings::to_fixed_point and …
davidwendt May 14, 2021
8d837f6
Fix reductions gtests failing for debug build (#8249)
davidwendt May 14, 2021
17ef131
Split out non-device code from fixed_point_tests.cu (#8238)
davidwendt May 17, 2021
0b9f178
Make java rolling window operation APIs consistent (#8251)
revans2 May 17, 2021
8406522
DOC Update to v21.06.00
raydouglass May 17, 2021
65e8372
Column refactoring 2 (#8130)
vyasr May 17, 2021
975d22f
Correct unused parameter warnings in dictonary algorithms (#8239)
robertmaynard May 18, 2021
eb88a38
Update cudfjni version to 21.06 (#8267)
pxLi May 18, 2021
59d8d5e
Fix CMake FindPackage rmm, pin dev envs' dlpack to v0.3 (#8271)
trxcllnt May 18, 2021
bbce6bc
CMake always explicitly specify a source files extension (#8270)
robertmaynard May 18, 2021
72e017b
Add support for decimal types in ORC writer (#8198)
vuule May 18, 2021
56513a8
Update ORC statistics API to use C++17 standard library (#8241)
vuule May 18, 2021
414e9bb
Support for struct scalars. (#8220)
nvdbaranec May 18, 2021
8834ed6
Move more methods into SingleColumnFrame (#8253)
vyasr May 18, 2021
072cbee
Update io util to convert path like object to string (#8275)
ayushdg May 19, 2021
7af8966
Fix incorrect assertion in Java concat (#8258)
sperlingxx May 19, 2021
7e3249b
Pass compiler environment variables to conda python build (#8260)
jjacobelli May 19, 2021
6b09253
Correct unused parameters in the copying algorithms (#8232)
robertmaynard May 19, 2021
1f9f061
Update docs build script (#8284)
ajschmidt8 May 19, 2021
32c1bac
Add a flag for allowing single quotes in JSON strings. (#8144)
nvdbaranec May 19, 2021
5994751
Remove unused parameter from copy_partition kernel documentation (#8283)
robertmaynard May 19, 2021
b0dc972
add unit tests for lead/lag on list for row window (#8259)
wbo4958 May 19, 2021
2b9fc62
Fixes CSV-reader type inference for thousands separator and decimal p…
elstehle May 19, 2021
c732cef
Support create lists column from a `list_scalar` (#8185)
isVoid May 20, 2021
2da8473
Create a String column from UTF8 String byte arrays (#8257)
firestarman May 20, 2021
48647aa
Java: Support creating a scalar from utf8 string (#8294)
firestarman May 20, 2021
0ebf7e6
support RMM aligned resource adapter in JNI (#8266)
rongou May 20, 2021
deee1f6
update changelog (#8297)
ajschmidt8 May 20, 2021
7427049
Remove abc inheritance from Serializable (#8254)
vyasr May 20, 2021
944e932
Implement `lists::concatenate_list_elements` (#8231)
ttnghia May 20, 2021
75e12d1
Actually test equality in assert_groupby_results_equal (#8272)
shwina May 20, 2021
47c3572
Merge remote-tracking branch 'upstream/branch-0.19' into branch-21.06…
ajschmidt8 May 20, 2021
9e308de
Merge pull request #8302 from ajschmidt8/branch-21.06-merge-0.19
ajschmidt8 May 20, 2021
3975f10
Update `CHANGELOG.md` links for calver (#8303)
ajschmidt8 May 20, 2021
2a1075e
use address and length for GDS reads/writes (#8301)
rongou May 20, 2021
b553144
Return python lists for __getitem__ calls to list type series (#8265)
brandon-b-miller May 20, 2021
c7d0524
Copy nested types upon construction (#8244)
isVoid May 20, 2021
9a85b3b
Update cudfjni version to 21.06.0 (#8292)
pxLi May 21, 2021
b84c792
Fix concatenate_lists_ignore_null on rows of all_nulls (#8312)
sperlingxx May 21, 2021
6920f9b
Update readme with correct CUDA versions (#8315)
raydouglass May 21, 2021
5c6b92a
COLLECT_LIST support returning empty output columns. (#8279)
mythrocks May 21, 2021
de579a5
Added decimal writing for CSV writer (#8296)
kaatish May 21, 2021
696902d
Enable implicit casting when concatenating mixed types (#8276)
ChrisJar May 23, 2021
ef20706
Add separator-on-null parameter to strings concatenate APIs (#8282)
davidwendt May 24, 2021
b9588d1
JNI: Refactor the code of making column from scalar (#8310)
firestarman May 24, 2021
936b02d
Add description of the cuIO GDS integration (#8293)
vuule May 24, 2021
259d69b
Revert "patch thrust to fix intmax num elements limitation in scan_by…
cwharris May 24, 2021
3da0d12
added _is_homogeneous property (#8299)
shaneding May 24, 2021
63faf2f
Use empty_like in scatter (#8314)
revans2 May 24, 2021
e555643
Update environment variable used to determine `cuda_version` (#8321)
ajschmidt8 May 24, 2021
b1d7788
Update Java string concatenate test for single column (#8330)
tgravescs May 24, 2021
5c0a75b
Fix cudf release version in readme (#8331)
galipremsagar May 24, 2021
691dd11
Refactor of rolling_window implementation. (#8158)
nvdbaranec May 24, 2021
7e725b5
Do not add nulls to the hash table when null_equality::NOT_EQUAL is p…
nvdbaranec May 24, 2021
7eaf3d7
Preserve column hierarchy when getting NULL row from `LIST` column (#…
isVoid May 24, 2021
c398054
Support scattering `list_scalar` (#8256)
isVoid May 24, 2021
6dbf2d5
Add `groupby::replace_nulls(replace_policy)` api (#7118)
isVoid May 24, 2021
dd5eecd
Fix struct binary search and struct flattening (#8268)
ttnghia May 24, 2021
6db757b
Fix structs column description in dev docs (#8318)
isVoid May 25, 2021
eea8cab
upgrade dlpack to 0.5 (#8262)
cwharris May 25, 2021
2383193
Java: Support struct scalar (#8327)
sperlingxx May 26, 2021
cbbcba7
Add support for `make_meta_obj` dispatch in `dask-cudf` (#8342)
galipremsagar May 26, 2021
fa6e7e0
Make device_buffer streams explicit and enforce move construction (#8…
harrism May 26, 2021
e97fc1c
Add backward compatibility for `dask-cudf` to work with other version…
galipremsagar May 26, 2021
cfc7ef9
Introduce a common parent class for NumericalColumn and DecimalColumn…
vyasr May 26, 2021
96f8df7
Add aliases for string methods (#8353)
shwina May 26, 2021
2a19a31
Raise error when unsupported arguments are passed to `dask_cudf.DataF…
galipremsagar May 26, 2021
e598361
Raise `NotImplementedError` for axis=1 in `rank` (#8347)
galipremsagar May 26, 2021
ddba88d
Add docstring for `dask_cudf.read_csv` (#8355)
galipremsagar May 26, 2021
24e05a0
IO statistics cleanup (#8191)
kaatish May 26, 2021
cd7fe6f
`Groupby.shift` c++ API refactor and python binding (#8131)
isVoid May 26, 2021
773fc7a
`strings::join_list_elements` options for empty list inputs (#8285)
ttnghia May 26, 2021
be05a00
Support collect_set on rolling window (#7881)
sperlingxx May 26, 2021
a29c0e3
Add Java API for Concatenate strings with separator (#8289)
tgravescs May 26, 2021
50c0335
Compilation fix: Remove redefinition for `std::is_same_v()` (#8369)
mythrocks May 27, 2021
4d1a62e
Fix struct flattening to add a validity column only when the input co…
ttnghia May 27, 2021
3ee8893
Handle empty results with nested types in copy_if_else (#8359)
nvdbaranec May 27, 2021
bcb1237
Handle nested column types properly for empty parquet files. (#8350)
nvdbaranec May 27, 2021
f7ede6a
Add support merging b/w categorical data (#8332)
galipremsagar May 27, 2021
b9bc78e
support space in workspace (#7956)
jolorunyomi May 27, 2021
7231e3b
Clip decimal binary op precision at max precision (#8194)
ChrisJar May 27, 2021
0eeb0c9
Fix result column types for empty inputs to rolling window (#8274)
mythrocks May 28, 2021
e9fe3c5
Support Dask + Distributed 2021.05.1 (#8392)
jakirkham Jun 1, 2021
2e780d0
pin dask for ci
galipremsagar Jun 1, 2021
b616807
Merge pull request #8421 from galipremsagar/pin_dask
msadang Jun 1, 2021
854176b
Update UCX-Py version to 0.20 (#8446)
pentschev Jun 7, 2021
0c869bc
FIX update-version.sh for CalVer (#8469)
raydouglass Jun 9, 2021
85f04d0
update changelog
raydouglass Jun 9, 2021
Check if a map contains a specific key (#8209)
To close #8120

As required in Spark 3.1.1, when ANSI mode is enabled, GetMapValue should throw an exception when the key is not found in the map in a row.
So the plugin side needs to check whether a map column contains the specific key in all rows.

The newly added method `mapContains` in this PR returns a boolean column, where _false_ means the key was not found.
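
The behaviour described above can be modelled on the host without cudf. The following is only a minimal sketch under stated assumptions: `MapKeyExistenceSketch`, `keyExistence`, and `requireKeyInAllRows` are hypothetical names that are not part of this PR, and plain `java.util` collections stand in for the GPU map column. It shows the per-row boolean result (never null) and how a caller might use it for the ANSI-mode check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical host-side model of the semantics; this is NOT the cudf API.
final class MapKeyExistenceSketch {

  /** Returns one boolean per row: true only if that row's map contains the key. */
  static List<Boolean> keyExistence(List<Map<String, String>> rows, String key) {
    List<Boolean> result = new ArrayList<>(rows.size());
    for (Map<String, String> row : rows) {
      // A null row or a null key never matches, so the result contains no nulls.
      result.add(row != null && key != null && row.containsKey(key));
    }
    return result;
  }

  /** ANSI-style check (assumed caller behaviour): fail if any row lacks the key. */
  static void requireKeyInAllRows(List<Map<String, String>> rows, String key) {
    if (keyExistence(rows, key).contains(false)) {
      throw new IllegalStateException("Key not found in at least one row: " + key);
    }
  }
}
```

In this model a caller would run `requireKeyInAllRows` before the value lookup, so that a missing key surfaces as an exception rather than as a null value.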

Authors:
  - Allen Xu (https://github.com/wjxiz1992)

Approvers:
  - Jason Lowe (https://github.com/jlowe)

URL: #8209
wjxiz1992 authored May 14, 2021

commit 3d12d63ac2c10c89d29317fd014012e078795c2c
22 changes: 22 additions & 0 deletions java/src/main/java/ai/rapids/cudf/ColumnView.java
@@ -2535,6 +2535,19 @@ public final ColumnVector getMapValue(Scalar key) {
return new ColumnVector(mapLookup(getNativeView(), key.getScalarHandle()));
}

/** For a column of type List<Struct<String, String>> and a passed-in String key, return a boolean
* column with one entry per row. An entry is true if the key exists in the corresponding map for
* that row, false otherwise. It will never return null for a row.
* @param key the String scalar to lookup in the column
* @return a boolean column based on the lookup result
*/
public final ColumnVector getMapKeyExistence(Scalar key) {
assert type.equals(DType.LIST) : "column type must be a LIST";
assert key != null : "target string may not be null";
assert key.getType().equals(DType.STRING) : "target must be a string scalar";

return new ColumnVector(mapContains(getNativeView(), key.getScalarHandle()));
}

/**
* Create a new struct column view of existing column views. Note that this will NOT copy
@@ -2853,6 +2866,15 @@ private static native long stringReplaceWithBackrefs(long columnView, String pat
* @throws CudfException
*/
private static native long mapLookup(long columnView, long key) throws CudfException;

/**
* Native method to check the existence of a key over a column of List<Struct<String,String>>
* @param columnView the column view handle of the map
* @param key the string scalar that is the key for lookup
* @return boolean column handle of the result
* @throws CudfException
*/
private static native long mapContains(long columnView, long key) throws CudfException;
/**
* Native method to add zeros as padding to the left of each string.
*/
16 changes: 16 additions & 0 deletions java/src/main/native/src/ColumnViewJni.cpp
@@ -1169,6 +1169,22 @@ JNIEXPORT jlong JNICALL Java_ai_rapids_cudf_ColumnView_mapLookup(JNIEnv *env, jc
CATCH_STD(env, 0);
}

JNIEXPORT jlong JNICALL Java_ai_rapids_cudf_ColumnView_mapContains(JNIEnv *env, jclass,
jlong map_column_view,
jlong lookup_key) {
JNI_NULL_CHECK(env, map_column_view, "column is null", 0);
JNI_NULL_CHECK(env, lookup_key, "target string scalar is null", 0);
try {
cudf::jni::auto_set_device(env);
cudf::column_view *cv = reinterpret_cast<cudf::column_view *>(map_column_view);
cudf::string_scalar *ss_key = reinterpret_cast<cudf::string_scalar *>(lookup_key);

std::unique_ptr<cudf::column> result = cudf::jni::map_contains(*cv, *ss_key);
return reinterpret_cast<jlong>(result.release());
}
CATCH_STD(env, 0);
}

JNIEXPORT jlong JNICALL Java_ai_rapids_cudf_ColumnView_stringReplaceWithBackrefs(JNIEnv *env,
jclass,
jlong column_view,
58 changes: 49 additions & 9 deletions java/src/main/native/src/map_lookup.cu
@@ -13,14 +13,18 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#include <cudf/column/column.hpp>
#include <cudf/column/column_device_view.cuh>
#include <cudf/column/column_factories.hpp>
#include <cudf/detail/gather.hpp>
#include <cudf/detail/utilities/cuda.cuh>
#include <cudf/lists/contains.hpp>
#include <cudf/lists/lists_column_view.hpp>
#include <cudf/replace.hpp>
#include <cudf/scalar/scalar.hpp>
#include <cudf/scalar/scalar_device_view.cuh>
#include <cudf/scalar/scalar_factories.hpp>
#include <cudf/structs/structs_column_view.hpp>
#include <cudf/table/table_view.hpp>
#include <cudf/types.hpp>
@@ -124,27 +128,63 @@ get_gather_map_for_map_values(column_view const &input, string_scalar &lookup_ke
return gather_map;
}

} // namespace

namespace jni {
std::unique_ptr<column> map_lookup(column_view const &map_column, string_scalar lookup_key,
bool has_nulls, rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource *mr) {
// Defensive checks.
/**
* @brief a defensive check for the map column that is going to be processed
*/
void map_input_check(column_view const &map_column, rmm::cuda_stream_view stream) {
CUDF_EXPECTS(map_column.type().id() == type_id::LIST, "Expected LIST<STRUCT<key,value>>.");

lists_column_view lcv{map_column};
auto structs_column = lcv.get_sliced_child(stream);
column_view structs_column = lcv.get_sliced_child(stream);

CUDF_EXPECTS(structs_column.type().id() == type_id::STRUCT, "Expected LIST<STRUCT<key,value>>.");

structs_column_view scv{structs_column};
CUDF_EXPECTS(structs_column.num_children() == 2, "Expected LIST<STRUCT<key,value>>.");
CUDF_EXPECTS(structs_column.child(0).type().id() == type_id::STRING,
"Expected LIST<STRUCT<key,value>>.");
CUDF_EXPECTS(structs_column.child(1).type().id() == type_id::STRING,
"Expected LIST<STRUCT<key,value>>.");
}

} // namespace

namespace jni {

std::unique_ptr<column> map_contains(column_view const &map_column, string_scalar lookup_key,
bool has_nulls, rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource *mr) {
// Defensive checks.
map_input_check(map_column, stream);

lists_column_view lcv(map_column);
structs_column_view scv(lcv.child());

std::vector<column_view> children;
children.push_back(lcv.offsets());
children.push_back(scv.child(0));

column_view list_of_keys(map_column.type(), map_column.size(),
nullptr, map_column.null_mask(), map_column.null_count(), 0, children);
auto contains_column = lists::contains(list_of_keys, lookup_key);
// null will be skipped in all-aggregation when checking if all rows contain the key,
// so replace all nulls with 0.
std::unique_ptr<cudf::scalar> replacement =
cudf::make_numeric_scalar(cudf::data_type(cudf::type_id::BOOL8));
replacement->set_valid(true);
using ScalarType = cudf::scalar_type_t<int8_t>;
static_cast<ScalarType *>(replacement.get())->set_value(0);
auto result = cudf::replace_nulls(contains_column->view(), *replacement);
return result;
}

std::unique_ptr<column> map_lookup(column_view const &map_column, string_scalar lookup_key,
bool has_nulls, rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource *mr) {
// Defensive checks.
map_input_check(map_column, stream);

lists_column_view lcv{map_column};
column_view structs_column = lcv.get_sliced_child(stream);
// Two-pass plan: construct gather map, and then gather() on structs_column.child(1). Plan A.
// (Can do in one pass perhaps, but that's Plan B.)

32 changes: 32 additions & 0 deletions java/src/main/native/src/map_lookup.hpp
@@ -51,6 +51,38 @@ map_lookup(column_view const &map_column, string_scalar lookup_key, bool has_nul
rmm::cuda_stream_view stream = rmm::cuda_stream_default,
rmm::mr::device_memory_resource *mr = rmm::mr::get_current_device_resource());


/**
* @brief Looks up a "map" column by specified key to see if the key exists or not,
* and returns a cudf column of bool value.
*
* The map-column is represented as follows:
*
* list_view<struct_view< string_view, string_view > >.
* <---KEY---> <--VALUE-->
*
* The string_view struct members are the key and value, respectively.
* For each row in the input list column, if the key is not found, false will be returned for that
* row.
* Note: when searching for a null scalar key, a column full of "false" will be returned because
* map_contains leverages cudf::lists::contains and replaces null results with false.
*
* @param map_column The input "map" column to be searched. Must be of
* type list_view<struct_view<string_view, string_view>>.
* @param lookup_key The search key, whose existence is checked in each list row
* @param has_nulls Whether the input column might contain null list-rows, or null keys.
* @param stream The CUDA stream
* @param mr The device memory resource to be used for allocations
* @return A boolean column reflecting the existence of the key in each row in the map
* column. True means the lookup_key is found in that row.
* @throw cudf::logic_error If the input column is not of type
* list_view<struct_view<string_view, string_view>>
*/
std::unique_ptr<column>
map_contains(column_view const &map_column, string_scalar lookup_key, bool has_nulls = true,
rmm::cuda_stream_view stream = rmm::cuda_stream_default,
rmm::mr::device_memory_resource *mr = rmm::mr::get_current_device_resource());

} // namespace jni

} // namespace cudf
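
The two steps in `map_contains` in `map_lookup.cu` (a `lists::contains` call on the list of keys, followed by `replace_nulls` with a false scalar) can be illustrated with a small host-side model. This is a sketch only: `ContainsThenReplaceNullsSketch` and its methods are hypothetical names, Java lists stand in for device columns, and it assumes that the contains step yields null for a null row or a null search key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical host-side model of the two-step plan; this is NOT the libcudf implementation.
final class ContainsThenReplaceNullsSketch {

  /** Step 1 (assumed semantics): null for a null row or a null key, otherwise membership. */
  static List<Boolean> containsNullable(List<List<String>> keysPerRow, String key) {
    List<Boolean> out = new ArrayList<>(keysPerRow.size());
    for (List<String> keys : keysPerRow) {
      if (keys == null || key == null) {
        out.add(null);
      } else {
        out.add(keys.contains(key));
      }
    }
    return out;
  }

  /** Step 2: replace every null entry with false so the result has no nulls. */
  static List<Boolean> replaceNullsWithFalse(List<Boolean> values) {
    List<Boolean> out = new ArrayList<>(values.size());
    for (Boolean v : values) {
      out.add(v != null && v);
    }
    return out;
  }

  /** Composition of both steps, mirroring the shape of map_contains. */
  static List<Boolean> mapContains(List<List<String>> keysPerRow, String key) {
    return replaceNullsWithFalse(containsNullable(keysPerRow, key));
  }
}
```

Keeping the null replacement as a separate step means any later all-rows aggregation sees an explicit false instead of skipping a null.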
29 changes: 29 additions & 0 deletions java/src/test/java/ai/rapids/cudf/ColumnVectorTest.java
@@ -4422,6 +4422,35 @@ void testGetMapValue() {
}
}

@Test
void testGetMapKeyExistence() {
List<HostColumnVector.StructData> list1 = Arrays.asList(new HostColumnVector.StructData("a", "b"));
List<HostColumnVector.StructData> list2 = Arrays.asList(new HostColumnVector.StructData("a", "c"));
List<HostColumnVector.StructData> list3 = Arrays.asList(new HostColumnVector.StructData("e", "d"));
List<HostColumnVector.StructData> list4 = Arrays.asList(new HostColumnVector.StructData("a", "g"));
List<HostColumnVector.StructData> list5 = Arrays.asList(new HostColumnVector.StructData("a", null));
List<HostColumnVector.StructData> list6 = Arrays.asList(new HostColumnVector.StructData(null, null));
List<HostColumnVector.StructData> list7 = Arrays.asList(new HostColumnVector.StructData());
HostColumnVector.StructType structType = new HostColumnVector.StructType(true, Arrays.asList(new HostColumnVector.BasicType(true, DType.STRING),
new HostColumnVector.BasicType(true, DType.STRING)));
try (ColumnVector cv = ColumnVector.fromLists(new HostColumnVector.ListType(true, structType), list1, list2, list3, list4, list5, list6, list7);
ColumnVector resValidKey = cv.getMapKeyExistence(Scalar.fromString("a"));
ColumnVector expectedValid = ColumnVector.fromBoxedBooleans(true, true, false, true, true, false, false);
ColumnVector expectedNull = ColumnVector.fromBoxedBooleans(false, false, false, false, false, false, false);
ColumnVector resNullKey = cv.getMapKeyExistence(Scalar.fromNull(DType.STRING))) {
assertColumnsAreEqual(expectedValid, resValidKey);
assertColumnsAreEqual(expectedNull, resNullKey);
}

AssertionError e = assertThrows(AssertionError.class, () -> {
try (ColumnVector cv = ColumnVector.fromLists(new HostColumnVector.ListType(true, structType), list1, list2, list3, list4, list5, list6, list7);
ColumnVector resNullKey = cv.getMapKeyExistence(null)) {
}
});
assertTrue(e.getMessage().contains("target string may not be null"));
}


@Test
void testListOfStructsOfStructs() {
List<HostColumnVector.StructData> list1 = Arrays.asList(