[RELEASE] cudf v21.06 #8418

Merged 327 commits on Jun 9, 2021
025c56a
Merge pull request #7979 from kkraus14/fix_automerge
dillon-cullinan Apr 16, 2021
dda9cb9
Replace device_vector with device_uvector in gather (#7758)
harrism Apr 16, 2021
98711da
Centralize logic for getattr->getitem override (#7845)
vyasr Apr 16, 2021
bc422fc
Enable scattering scalars into decimal columns (#7899)
brandon-b-miller Apr 16, 2021
9d2907d
auto set device in cufile jni functions (#7983)
rongou Apr 16, 2021
c05aca3
Remove `device_vector`s from parquet (#7853)
devavret Apr 17, 2021
ed68e96
Update to CPM with fix for `FETCHCONTENT_BASE_DIR` (#7982)
trxcllnt Apr 17, 2021
4da38a6
Fix union operation in `_is_supported()` (#7959)
charlesbluca Apr 17, 2021
1d03186
Remove nvstrdesc_s from cuio (#7841)
kaatish Apr 17, 2021
867d6ee
Revert "Java API change for supporting structs (#7730)" (#7987)
razajafri Apr 17, 2021
808262f
Remove char_width member from cudf::string_view class (#7914)
davidwendt Apr 19, 2021
05330dd
Update CUDA version in build scripts (#7984)
ajschmidt8 Apr 19, 2021
b8baed5
Fix ORC reader issue with bystream reader (#7988)
rgsl888prabhu Apr 19, 2021
4893259
Unpin pandas to patch version (#7992)
galipremsagar Apr 19, 2021
1775c3d
Struct binary search (lower_bound/upper_bound) (#7865)
ttnghia Apr 19, 2021
3d212b3
Defer to `iloc` when indexer is an integer-like object and the `Index…
skirui-source Apr 19, 2021
68495f9
Refactor libcudf strings concatenate to use make_strings_children (#7…
davidwendt Apr 20, 2021
2214407
Merge pull request #7999 from rapidsai/branch-0.19
GPUtester Apr 20, 2021
6fb7909
Add JNI for splitting groups in a table after groupby (#7954)
firestarman Apr 20, 2021
8e42e73
Fix incorrect explode_outer_position column null values in some cases…
hyperbolic2346 Apr 20, 2021
5c2f744
Introduces `make_optional_iterator` for nullable column and scalars (…
robertmaynard Apr 20, 2021
d501d2c
Adding support for Decimal/Fixed-point to ORC reader (#7970)
rgsl888prabhu Apr 20, 2021
f11bcd7
Remove obsolete cudf::strings::replace_nulls (#7965)
davidwendt Apr 21, 2021
c0cf5e1
Add groupby product support (#7763)
karthikeyann Apr 21, 2021
5be3a62
Update `ci/local/build.sh` default `$DOCKER_IMAGE` (#8013)
codereport Apr 21, 2021
5b71fca
Improve CSV writer tests (#7851)
vuule Apr 21, 2021
db30202
Enable quantile for decimal columns (#7927)
ChrisJar Apr 21, 2021
d4d64c0
Support setting to a new column in `DataFrame.loc` (#8012)
isVoid Apr 21, 2021
dcebfe7
Fix dask_cudf metadata-inference when first ORC path is empty (#8021)
rjzamora Apr 22, 2021
f8f9988
Use rmm::device_uvector in place of rmm::device_vector for CSV reader…
vuule Apr 22, 2021
069bf96
Fix returned column type when extracting from an empty list column (#…
jlowe Apr 22, 2021
e018722
add null order support to detail::drop_duplicates (#7938)
cwharris Apr 22, 2021
e385180
remove decimals_as_float64 from orc benchmarks (#8007)
cwharris Apr 22, 2021
151f4c5
Bump cmake minimum version to 3.18 in dev environments (#8005)
galipremsagar Apr 22, 2021
226f4bb
Merge pull request #8035 from rapidsai/branch-0.19
GPUtester Apr 22, 2021
3e3463b
Merge pull request #8038 from rapidsai/branch-0.19
GPUtester Apr 22, 2021
8dae31c
Add groupby scan aggregation to cudf (#7759)
karthikeyann Apr 22, 2021
484ce75
Use segmented_sort_by_key for sorting smaller lists (#7973)
shwina Apr 22, 2021
f18af7f
Remove references to deprecated "rmm/thrust_rmm_allocator.h" (#8027)
harrism Apr 22, 2021
50368de
Support more units in `cudf.DateOffset` (#7078)
brandon-b-miller Apr 22, 2021
b733082
Add `list_scalar` API (#7584)
isVoid Apr 22, 2021
fad1587
Add column validation in equality operations (#8040)
galipremsagar Apr 23, 2021
fc61648
Reduce peak device memory usage in ORC writer (#7719)
kaatish Apr 23, 2021
a6b495c
Refactor concatenation logic and fix various bugs (#7867)
vyasr Apr 23, 2021
791beb2
Adding struct support for semi_join (#8028)
hyperbolic2346 Apr 23, 2021
6e13988
Fixed join on mixed nullability columns (#7963)
hyperbolic2346 Apr 25, 2021
2bfd6fe
Implement string list concatenation (#7929)
ttnghia Apr 26, 2021
3ace5ec
Use device_read/device_write in Avro reader and ORC reader/writer (#8…
kaatish Apr 26, 2021
94afdda
Adds serialization of Decimal Columns and dtypes (#8041)
brandon-b-miller Apr 26, 2021
1894aeb
Convert cudf::rank to use device_uvector (#8029)
harrism Apr 26, 2021
098d7f8
Add `groupby::shift` API (#7910)
isVoid Apr 26, 2021
f0977a4
constexpr all is_*_impl::operator()<T>() (#8056)
karthikeyann Apr 26, 2021
9b2e456
Convert cudf::merge to use device_uvector instead of device_vector (#…
harrism Apr 27, 2021
1a0d304
Add validation for `errors` parameter in `cudf.to_datetime` (#8068)
galipremsagar Apr 27, 2021
d08e041
Remove 10.2 workarounds in groupby functions for dictionary column ty…
davidwendt Apr 27, 2021
9e72ae2
Numba version and deprecation updates (#8017)
gmarkall Apr 27, 2021
d91ccb9
Fix bug when constructing `list_scalar`, `stream` and `mr` was not pa…
isVoid Apr 27, 2021
78b8333
Fix `Series` inputs handling in `Dataframe` constructor (#8065)
galipremsagar Apr 27, 2021
85026b5
Fix semi join on mixed nullability columns (#8075)
sperlingxx Apr 27, 2021
a03cf7f
Replace cudf::strings::detail::modify_strings utility with make_strin…
davidwendt Apr 27, 2021
e9bc090
Remove `strong_typedef` in `fixed_point.hpp` (#8063)
codereport Apr 27, 2021
4d359c2
Update codeowners for benchmarks / tests CMake (#8066)
Apr 27, 2021
fbcf37f
expand the cuFile JNI wrapper to allow for more flexibility (#8053)
rongou Apr 27, 2021
31af285
Convert cudf::quantiles to use device_uvector (#8076)
harrism Apr 27, 2021
6c66bdc
Move make_strings_children to strings/detail/utilities.cuh (#8060)
davidwendt Apr 28, 2021
32222de
Remove remaining "when C++17" comments (#8089)
codereport Apr 28, 2021
663457b
Merge pull request #8101 from rapidsai/branch-0.19
GPUtester Apr 28, 2021
0ca0e69
Refactoring column logic Part 1 (#8081)
vyasr Apr 28, 2021
290c6ef
Allow CuPy 9 to be used with cuDF (#8082)
jakirkham Apr 28, 2021
b27aef6
Convert cudf::repeat to use device_uvector instead of device_vector (…
harrism Apr 28, 2021
1918de6
Fix bug: allow `lists::copy_slice` from an valid row that has an empt…
isVoid Apr 28, 2021
5a012e5
Extend range window queries to non-timestamp order-by columns (#7866)
mythrocks Apr 29, 2021
7f0ad1d
Add support for pydocstyle and test on abc.py (#7985)
vyasr Apr 29, 2021
ac25e97
Add python/cython bindings for `str.join` API (#8085)
galipremsagar Apr 29, 2021
ac4f943
Convert hashing, partitioning, and nested_loop_join APIs to use devi…
harrism Apr 29, 2021
1757d10
ensure cuFile JNI library is loaded before any use (#8105)
rongou Apr 29, 2021
04d6e5a
JNI support for scalar of list (#8077)
firestarman Apr 29, 2021
cea6c20
Use `cupy.ndarray` (without `core`) (#8114)
jakirkham Apr 29, 2021
e6f3f37
Switch from std::tie() to structured binding. (#8117)
mythrocks Apr 30, 2021
322eac6
Fix subword tokenizer to handle zero hash bin size (#8093)
davidwendt Apr 30, 2021
f686c01
ENH Remove conda defaults channel in dev environments (#8122)
jjacobelli Apr 30, 2021
aa61a6d
don't throw an exception when cuFile jni can't be loaded (#8124)
rongou Apr 30, 2021
b368ebd
Convert unordered_multiset to use device_uvector (#8091)
harrism Apr 30, 2021
7bf6de6
Fix `cudf_test/iterator_utilities.hpp` (#8126)
ttnghia Apr 30, 2021
4869c23
enable all aggregations for dictionary type (#8061)
karthikeyann May 1, 2021
cf8c73a
Some APIs to help with out of core joins in Spark (#8118)
revans2 May 1, 2021
8a4426f
Use spans in parquet writer (#7950)
devavret May 3, 2021
7623f39
Update Developer Guide to mention structured bindings (#8116)
mythrocks May 3, 2021
c7eccc1
Fix fragile logic in dask_cudf chunksize parquet test (#8108)
rjzamora May 3, 2021
27ae8c1
Redirect callable aggregations to their named equivalent in dask-cuDF…
charlesbluca May 3, 2021
6ab91f2
Implement interleave_columns for list type (#8046)
ttnghia May 3, 2021
ad081ae
Create a common code path for 1d Frames (#8115)
vyasr May 3, 2021
36eaa06
Subword Tokenizer HuggingFace like API (#7942)
VibhuJawa May 3, 2021
1debb96
Implement concatenate_rows for list type (#8049)
ttnghia May 3, 2021
5d50cde
Extend LEAD/LAG to work with non-fixed-width types (#8062)
mythrocks May 3, 2021
5b754ed
Update io supported types docs page (#8146)
galipremsagar May 4, 2021
bc9903a
Add binary ops benchmark (#8008)
karthikeyann May 4, 2021
7f3799c
convert replace/clamp.cu to use optional iterator (#8004)
robertmaynard May 4, 2021
be2a1ed
Fix groupby reduce_functor for fixed-point result-type (#8127)
davidwendt May 4, 2021
d56428a
Enable concat for decimal columns with mixed precision and scale (#8099)
ChrisJar May 4, 2021
770dc38
Fix `factorize` doc string (#8154)
galipremsagar May 4, 2021
53e1c66
Enable not equal for decimal columns (#8143)
ChrisJar May 4, 2021
a80dff2
Fix scatter output size for structs. (#8155)
mythrocks May 4, 2021
44f21b3
Add `decimal64Dtype` support in data-generator (#8107)
galipremsagar May 4, 2021
4715c83
Refactor AggregationJni to support collectSet (#8057)
sperlingxx May 5, 2021
9b727dd
ENH Remove 'rapidsai-nightly' conda channel when building main branch…
jjacobelli May 5, 2021
81046ff
Move scalar function definitions from scalar.hpp to scalar.cpp (#8112)
davidwendt May 5, 2021
f54ccd0
Fix lists strings scatter to handle zero child rows (#8103)
davidwendt May 5, 2021
2ead87c
Fix hash of fixed-point type to hash value component (#8141)
davidwendt May 5, 2021
c0f8176
Allow users to set jitify cache file limit via an environment variabl…
trxcllnt May 5, 2021
8cba3b0
Enable join results with size > INT32_MAX (#8139)
shwina May 5, 2021
559e8a3
Java bindings for Parquet struct support (#7998)
razajafri May 5, 2021
3106679
Convert grouped_rolling to use device_uvector (#8106)
harrism May 5, 2021
3af3bf3
Convert remaining uses of device_vector in groupby (#8148)
harrism May 5, 2021
4853dbc
Add a `copy()` method to `Buffer` (#8113)
shwina May 5, 2021
52fab32
Fix Java nightly build (#8169)
jlowe May 6, 2021
3940e56
Split up hashing.cu to improve compile time (#8168)
davidwendt May 6, 2021
202fff1
Merge Index and Series binops (#8166)
vyasr May 6, 2021
e8b9ff7
Refactor tests/groupby/** (#7604)
ttnghia May 6, 2021
611cabd
Add chars-tokenizer to nvtext tokenize_benchmark.cpp (#8125)
davidwendt May 6, 2021
96c0706
add Java unit tests for making list of list (#8111)
wbo4958 May 6, 2021
8ae73d5
Add Python bindings for ``get_json_object`` (#7981)
skirui-source May 6, 2021
2207577
Enable decimal fillna with integer scalars and series (#8172)
ChrisJar May 7, 2021
db21232
Fix struct scatter to correctly cascade null_mask to children columns…
ttnghia May 7, 2021
5f9dade
Support listConcatenateByRows in Java package (#8171)
sperlingxx May 7, 2021
e2c7067
Change aggregation class hierarchy to allow per-algorithm type enforc…
nvdbaranec May 7, 2021
b46913b
Use rmm::device_uvector in place of rmm::device_vector in cuIO (#8151)
vuule May 7, 2021
57a8ad2
JNI Rolling Aggregation Changes (#8069)
revans2 May 7, 2021
245d8c1
Fix orc reader assert on create data_type (#8174)
davidwendt May 7, 2021
e970a65
Split iterator tests to improve parallel compile times (#8167)
robertmaynard May 7, 2021
bb62cf1
Support `get_element` from LIST column (#8071)
isVoid May 7, 2021
3813d9b
Creates an empty column for the null `LIST Scalar` (#8173)
firestarman May 8, 2021
2273b6d
Enable division operator for decimal columns (#8149)
ChrisJar May 8, 2021
fbb9a98
Support nested input columns in copy_if_else() (#8135)
mythrocks May 10, 2021
97b2e9e
Fix interleave_columns on ListType with nullable child (#8181)
sperlingxx May 10, 2021
99df69f
Fix struct binary search to generate a validity column for both targe…
ttnghia May 10, 2021
c2c67de
Remove `boost` dependency (#7932)
codereport May 10, 2021
9328c56
Add notes in IO supported types doc table. (#8203)
galipremsagar May 11, 2021
9a063b6
Abstract Syntax Tree Cleanup and Tests (#7418)
codereport May 11, 2021
2c70f1d
Closed column view to avoid memory leak (#8202)
razajafri May 11, 2021
ae08422
patch thrust to fix intmax num elements limitation in scan_by_key (#8…
cwharris May 11, 2021
e94daa0
Fix concatenate_rows issue with lists of all empty strings (#8210)
sperlingxx May 11, 2021
3f064f9
Account for offset columns in lists::contains() (#8204)
mythrocks May 12, 2021
188b630
Bump up GDS user-space lib version to 0.95.1 (#8221)
pxLi May 12, 2021
667b9bc
Add Python bindings to list concatenation functions (#8087)
skirui-source May 12, 2021
bda8457
Update split-by-char to use input offsets column (#8195)
davidwendt May 12, 2021
cdf09ad
Convert tests to use device_uvector (#8205)
harrism May 12, 2021
4596fe6
Enable scatter on list of fixed point (#8211)
sperlingxx May 13, 2021
082596f
Convert benchmarks to use device_uvector (#8208)
harrism May 13, 2021
fb7cdcd
Reduce contiguous-split block_size for copy_partition (#8216)
davidwendt May 13, 2021
6b92e25
Enable scattering integers into decimal columns (#8225)
brandon-b-miller May 13, 2021
f8d7de4
Support exclude null_policy for collect list/set in groupby (#8044)
sperlingxx May 13, 2021
5987384
Exposing cudf list accessors to dask_cudf (#8197)
shaneding May 13, 2021
92f067d
Remove cuda 11.1 related files (#8224)
galipremsagar May 13, 2021
7864a66
Update release script (#8222)
raydouglass May 13, 2021
2a169c8
add java bindings for non-timestamps range window queries (#7909)
wbo4958 May 13, 2021
c681211
Fix doxygens and comments for various APIs (#8201)
ttnghia May 13, 2021
d206755
Fix cython flag to use c++17 (#8243)
galipremsagar May 14, 2021
304f460
Revert #7909 add java bindings for non-timestamps range window querie…
wbo4958 May 14, 2021
b7eeaf5
Split up scan.cu to improve compile time (#8183)
davidwendt May 14, 2021
2d1d4ec
add java bindings for non-timestamps range window queries (#8248)
wbo4958 May 14, 2021
3d12d63
Check if a map contains a specific key (#8209)
wjxiz1992 May 14, 2021
92a432d
Add scientific notation support to cudf::strings::to_fixed_point and …
davidwendt May 14, 2021
8d837f6
Fix reductions gtests failing for debug build (#8249)
davidwendt May 14, 2021
17ef131
Split out non-device code from fixed_point_tests.cu (#8238)
davidwendt May 17, 2021
0b9f178
Make java rolling window operation APIs consistent (#8251)
revans2 May 17, 2021
8406522
DOC Update to v21.06.00
raydouglass May 17, 2021
65e8372
Column refactoring 2 (#8130)
vyasr May 17, 2021
975d22f
Correct unused parameter warnings in dictonary algorithms (#8239)
robertmaynard May 18, 2021
eb88a38
Update cudfjni version to 21.06 (#8267)
pxLi May 18, 2021
59d8d5e
Fix CMake FindPackage rmm, pin dev envs' dlpack to v0.3 (#8271)
trxcllnt May 18, 2021
bbce6bc
CMake always explicitly specify a source files extension (#8270)
robertmaynard May 18, 2021
72e017b
Add support for decimal types in ORC writer (#8198)
vuule May 18, 2021
56513a8
Update ORC statistics API to use C++17 standard library (#8241)
vuule May 18, 2021
414e9bb
Support for struct scalars. (#8220)
nvdbaranec May 18, 2021
8834ed6
Move more methods into SingleColumnFrame (#8253)
vyasr May 18, 2021
072cbee
Update io util to convert path like object to string (#8275)
ayushdg May 19, 2021
7af8966
Fix incorrect assertion in Java concat (#8258)
sperlingxx May 19, 2021
7e3249b
Pass compiler environment variables to conda python build (#8260)
jjacobelli May 19, 2021
6b09253
Correct unused parameters in the copying algorithms (#8232)
robertmaynard May 19, 2021
1f9f061
Update docs build script (#8284)
ajschmidt8 May 19, 2021
32c1bac
Add a flag for allowing single quotes in JSON strings. (#8144)
nvdbaranec May 19, 2021
5994751
Remove unused parameter from copy_partition kernel documentation (#8283)
robertmaynard May 19, 2021
b0dc972
add unit tests for lead/lag on list for row window (#8259)
wbo4958 May 19, 2021
2b9fc62
Fixes CSV-reader type inference for thousands separator and decimal p…
elstehle May 19, 2021
c732cef
Support create lists column from a `list_scalar` (#8185)
isVoid May 20, 2021
2da8473
Create a String column from UTF8 String byte arrays (#8257)
firestarman May 20, 2021
48647aa
Java: Support creating a scalar from utf8 string (#8294)
firestarman May 20, 2021
0ebf7e6
support RMM aligned resource adapter in JNI (#8266)
rongou May 20, 2021
deee1f6
update changelog (#8297)
ajschmidt8 May 20, 2021
7427049
Remove abc inheritance from Serializable (#8254)
vyasr May 20, 2021
944e932
Implement `lists::concatenate_list_elements` (#8231)
ttnghia May 20, 2021
75e12d1
Actually test equality in assert_groupby_results_equal (#8272)
shwina May 20, 2021
47c3572
Merge remote-tracking branch 'upstream/branch-0.19' into branch-21.06…
ajschmidt8 May 20, 2021
9e308de
Merge pull request #8302 from ajschmidt8/branch-21.06-merge-0.19
ajschmidt8 May 20, 2021
3975f10
Update `CHANGELOG.md` links for calver (#8303)
ajschmidt8 May 20, 2021
2a1075e
use address and length for GDS reads/writes (#8301)
rongou May 20, 2021
b553144
Return python lists for __getitem__ calls to list type series (#8265)
brandon-b-miller May 20, 2021
c7d0524
Copy nested types upon construction (#8244)
isVoid May 20, 2021
9a85b3b
Update cudfjni version to 21.06.0 (#8292)
pxLi May 21, 2021
b84c792
Fix concatenate_lists_ignore_null on rows of all_nulls (#8312)
sperlingxx May 21, 2021
6920f9b
Update readme with correct CUDA versions (#8315)
raydouglass May 21, 2021
5c6b92a
COLLECT_LIST support returning empty output columns. (#8279)
mythrocks May 21, 2021
de579a5
Added decimal writing for CSV writer (#8296)
kaatish May 21, 2021
696902d
Enable implicit casting when concatenating mixed types (#8276)
ChrisJar May 23, 2021
ef20706
Add separator-on-null parameter to strings concatenate APIs (#8282)
davidwendt May 24, 2021
b9588d1
JNI: Refactor the code of making column from scalar (#8310)
firestarman May 24, 2021
936b02d
Add description of the cuIO GDS integration (#8293)
vuule May 24, 2021
259d69b
Revert "patch thrust to fix intmax num elements limitation in scan_by…
cwharris May 24, 2021
3da0d12
added _is_homogeneous property (#8299)
shaneding May 24, 2021
63faf2f
Use empty_like in scatter (#8314)
revans2 May 24, 2021
e555643
Update environment variable used to determine `cuda_version` (#8321)
ajschmidt8 May 24, 2021
b1d7788
Update Java string concatenate test for single column (#8330)
tgravescs May 24, 2021
5c0a75b
Fix cudf release version in readme (#8331)
galipremsagar May 24, 2021
691dd11
Refactor of rolling_window implementation. (#8158)
nvdbaranec May 24, 2021
7e725b5
Do not add nulls to the hash table when null_equality::NOT_EQUAL is p…
nvdbaranec May 24, 2021
7eaf3d7
Preserve column hierarchy when getting NULL row from `LIST` column (#…
isVoid May 24, 2021
c398054
Support scattering `list_scalar` (#8256)
isVoid May 24, 2021
6dbf2d5
Add `groupby::replace_nulls(replace_policy)` api (#7118)
isVoid May 24, 2021
dd5eecd
Fix struct binary search and struct flattening (#8268)
ttnghia May 24, 2021
6db757b
Fix structs column description in dev docs (#8318)
isVoid May 25, 2021
eea8cab
upgrade dlpack to 0.5 (#8262)
cwharris May 25, 2021
2383193
Java: Support struct scalar (#8327)
sperlingxx May 26, 2021
cbbcba7
Add support for `make_meta_obj` dispatch in `dask-cudf` (#8342)
galipremsagar May 26, 2021
fa6e7e0
Make device_buffer streams explicit and enforce move construction (#8…
harrism May 26, 2021
e97fc1c
Add backward compatibility for `dask-cudf` to work with other version…
galipremsagar May 26, 2021
cfc7ef9
Introduce a common parent class for NumericalColumn and DecimalColumn…
vyasr May 26, 2021
96f8df7
Add aliases for string methods (#8353)
shwina May 26, 2021
2a19a31
Raise error when unsupported arguments are passed to `dask_cudf.DataF…
galipremsagar May 26, 2021
e598361
Raise `NotImplementedError` for axis=1 in `rank` (#8347)
galipremsagar May 26, 2021
ddba88d
Add docstring for `dask_cudf.read_csv` (#8355)
galipremsagar May 26, 2021
24e05a0
IO statistics cleanup (#8191)
kaatish May 26, 2021
cd7fe6f
`Groupby.shift` c++ API refactor and python binding (#8131)
isVoid May 26, 2021
773fc7a
`strings::join_list_elements` options for empty list inputs (#8285)
ttnghia May 26, 2021
be05a00
Support collect_set on rolling window (#7881)
sperlingxx May 26, 2021
a29c0e3
Add Java API for Concatenate strings with separator (#8289)
tgravescs May 26, 2021
50c0335
Compilation fix: Remove redefinition for `std::is_same_v()` (#8369)
mythrocks May 27, 2021
4d1a62e
Fix struct flattening to add a validity column only when the input co…
ttnghia May 27, 2021
3ee8893
Handle empty results with nested types in copy_if_else (#8359)
nvdbaranec May 27, 2021
bcb1237
Handle nested column types properly for empty parquet files. (#8350)
nvdbaranec May 27, 2021
f7ede6a
Add support merging b/w categorical data (#8332)
galipremsagar May 27, 2021
b9bc78e
support space in workspace (#7956)
jolorunyomi May 27, 2021
7231e3b
Clip decimal binary op precision at max precision (#8194)
ChrisJar May 27, 2021
0eeb0c9
Fix result column types for empty inputs to rolling window (#8274)
mythrocks May 28, 2021
e9fe3c5
Support Dask + Distributed 2021.05.1 (#8392)
jakirkham Jun 1, 2021
2e780d0
pin dask for ci
galipremsagar Jun 1, 2021
b616807
Merge pull request #8421 from galipremsagar/pin_dask
msadang Jun 1, 2021
854176b
Update UCX-Py version to 0.20 (#8446)
pentschev Jun 7, 2021
0c869bc
FIX update-version.sh for CalVer (#8469)
raydouglass Jun 9, 2021
85f04d0
update changelog
raydouglass Jun 9, 2021
5 changes: 3 additions & 2 deletions .github/CODEOWNERS
@@ -7,8 +7,9 @@ notebooks/ @rapidsai/cudf-python-codeowners
 python/dask_cudf/ @rapidsai/cudf-dask-codeowners

 #cmake code owners
-**/CMakeLists.txt @rapidsai/cudf-cmake-codeowners
-**/cmake/ @rapidsai/cudf-cmake-codeowners
+cpp/CMakeLists.txt @rapidsai/cudf-cmake-codeowners
+cpp/libcudf_kafka/CMakeLists.txt @rapidsai/cudf-cmake-codeowners
+**/cmake/ @rapidsai/cudf-cmake-codeowners

 #java code owners
 java/ @rapidsai/cudf-java-codeowners
25 changes: 14 additions & 11 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -19,15 +19,18 @@ Here are some guidelines to help the review process go smoothly.
 noted here: https://help.github.com/articles/closing-issues-using-keywords/

 5. If your pull request is not ready for review but you want to make use of the
-continuous integration testing facilities please label it with `[WIP]`.
+continuous integration testing facilities please mark your pull request as Draft.
+https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/changing-the-stage-of-a-pull-request#converting-a-pull-request-to-a-draft

 6. If your pull request is ready to be reviewed without requiring additional
-work on top of it, then remove the `[WIP]` label (if present) and replace
-it with `[REVIEW]`. If assistance is required to complete the functionality,
-for example when the C/C++ code of a feature is complete but Python bindings
-are still required, then add the label `[HELP-REQ]` so that others can triage
-and assist. The additional changes then can be implemented on top of the
-same PR. If the assistance is done by members of the rapidsAI team, then no
+work on top of it, then remove it from "Draft" and make it "Ready for Review".
+https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/changing-the-stage-of-a-pull-request#marking-a-pull-request-as-ready-for-review
+
+If assistance is required to complete the functionality, for example when the
+C/C++ code of a feature is complete but Python bindings are still required,
+then add the label `help wanted` so that others can triage and assist.
+The additional changes then can be implemented on top of the same PR.
+If the assistance is done by members of the rapidsAI team, then no
 additional actions are required by the creator of the original PR for this,
 otherwise the original author of the PR needs to give permission to the
 person(s) assisting to commit to their personal fork of the project. If that
@@ -39,10 +42,10 @@ Here are some guidelines to help the review process go smoothly.
 features or make changes out of the scope of those requested by the reviewer
 (doing this just add delays as already reviewed code ends up having to be
 re-reviewed/it is hard to tell what is new etc!). Further, please do not
-rebase your branch on main/force push/rewrite history, doing any of these
-causes the context of any comments made by reviewers to be lost. If
-conflicts occur against main they should be resolved by merging main
-into the branch used for making the pull request.
+rebase your branch on the target branch, force push, or rewrite history.
+Doing any of these causes the context of any comments made by reviewers to be lost.
+If conflicts occur against the target branch they should be resolved by
+merging the target branch into the branch used for making the pull request.

 Many thanks in advance for your cooperation!

6 changes: 6 additions & 0 deletions .pre-commit-config.yaml
@@ -41,6 +41,12 @@ repos:
 entry: mypy --config-file=python/cudf/setup.cfg python/cudf/cudf
 language: system
 types: [python]
+- repo: https://github.com/pycqa/pydocstyle
+rev: 6.0.0
+hooks:
+- id: pydocstyle
+args: ["--config=python/.flake8"]
+

 default_language_version:
 python: python3
417 changes: 414 additions & 3 deletions CHANGELOG.md

Large diffs are not rendered by default.

22 changes: 11 additions & 11 deletions CONTRIBUTING.md
@@ -131,14 +131,14 @@ run each time you commit changes.

 Compiler requirements:

-* `gcc` version 7.1+
-* `nvcc` version 10.1+
+* `gcc` version 9.3+
+* `nvcc` version 11.0+
 * `cmake` version 3.18.0+

 CUDA/GPU requirements:

-* CUDA 10.1+
-* NVIDIA driver 410.48+
+* CUDA 11.0+
+* NVIDIA driver 450.80.02+
 * Pascal architecture or better

 You can obtain CUDA from [https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads).
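
The minimums listed in this hunk (gcc 9.3, nvcc 11.0, cmake 3.18.0) can be checked mechanically before a build. The following is only a sketch: `version_ge` and `check_tool` are hypothetical helpers that are not part of cuDF, the sample versions are hard-coded for illustration, and it assumes a `sort` that supports `-V` (GNU coreutils does).

```bash
#!/usr/bin/env bash
# Sketch only: version_ge and check_tool are hypothetical helpers, not cuDF code.

# Return 0 when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), assumed to be available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report whether a detected version satisfies the required minimum.
check_tool() {
  local name="$1" found="$2" minimum="$3"
  if version_ge "$found" "$minimum"; then
    echo "$name $found: ok (>= $minimum)"
  else
    echo "$name $found: too old (need >= $minimum)"
  fi
}

# Sample inputs; a real script would query the installed tools instead.
check_tool gcc 9.3.0 9.3
check_tool nvcc 10.2 11.0
check_tool cmake 3.18.0 3.18.0
```

With these sample inputs only the `nvcc` line is reported as too old. In practice the second argument would come from the tool itself (for example, parsing the output of `cmake --version`), which is left out here to keep the sketch self-contained.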
@@ -160,7 +160,7 @@ git submodule update --init --remote --recursive
 ```bash
 # create the conda environment (assuming in base `cudf` directory)
 # note: RAPIDS currently doesn't support `channel_priority: strict`; use `channel_priority: flexible` instead
-conda env create --name cudf_dev --file conda/environments/cudf_dev_cuda10.0.yml
+conda env create --name cudf_dev --file conda/environments/cudf_dev_cuda11.0.yml
 # activate the environment
 conda activate cudf_dev
 ```
@@ -281,8 +281,8 @@ A Dockerfile is provided with a preconfigured conda environment for building and
 ### Prerequisites

 * Install [nvidia-docker2](https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)) for Docker + GPU support
-* Verify NVIDIA driver is `410.48` or higher
-* Ensure CUDA 10.0+ is installed
+* Verify NVIDIA driver is `450.80.02` or higher
+* Ensure CUDA 11.0+ is installed

 ### Usage

@@ -309,16 +309,16 @@ flag. Below is a list of the available arguments and their purpose:

 | Build Argument | Default Value | Other Value(s) | Purpose |
 | --- | --- | --- | --- |
-| `CUDA_VERSION` | 10.0 | 10.1, 10.2 | set CUDA version |
-| `LINUX_VERSION` | ubuntu16.04 | ubuntu18.04 | set Ubuntu version |
-| `CC` & `CXX` | 5 | 7 | set gcc/g++ version; **NOTE:** gcc7 requires Ubuntu 18.04 |
+| `CUDA_VERSION` | 11.0 | 11.2.2 | set CUDA version |
+| `LINUX_VERSION` | ubuntu18.04 | ubuntu20.04 | set Ubuntu version |
+| `CC` & `CXX` | 9 | 10 | set gcc/g++ version |
 | `CUDF_REPO` | This repo | Forks of cuDF | set git URL to use for `git clone` |
 | `CUDF_BRANCH` | main | Any branch name | set git branch to checkout of `CUDF_REPO` |
 | `NUMBA_VERSION` | newest | >=0.40.0 | set numba version |
 | `NUMPY_VERSION` | newest | >=1.14.3 | set numpy version |
 | `PANDAS_VERSION` | newest | >=0.23.4 | set pandas version |
 | `PYARROW_VERSION` | 1.0.1 | Not supported | set pyarrow version |
-| `CMAKE_VERSION` | newest | >=3.14 | set cmake version |
+| `CMAKE_VERSION` | newest | >=3.18 | set cmake version |
 | `CYTHON_VERSION` | 0.29 | Not supported | set Cython version |
 | `PYTHON_VERSION` | 3.7 | 3.8 | set python version |

35 changes: 14 additions & 21 deletions Dockerfile
@@ -1,23 +1,24 @@
# Copyright (c) 2021, NVIDIA CORPORATION.

# An integration test & dev container which builds and installs cuDF from main
ARG CUDA_VERSION=10.1
ARG CUDA_VERSION=11.0
ARG CUDA_SHORT_VERSION=${CUDA_VERSION}
ARG LINUX_VERSION=ubuntu16.04
ARG LINUX_VERSION=ubuntu18.04
FROM nvidia/cuda:${CUDA_VERSION}-devel-${LINUX_VERSION}
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/lib
# Needed for cudf.concat(), avoids "OSError: library nvvm not found"
ENV NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so
ENV NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/
ENV DEBIAN_FRONTEND=noninteractive

ARG CC=5
ARG CXX=5
ARG CC=9
ARG CXX=9
RUN apt update -y --fix-missing && \
apt upgrade -y && \
apt install -y --no-install-recommends software-properties-common && \
add-apt-repository ppa:ubuntu-toolchain-r/test && \
apt update -y --fix-missing && \
apt install -y --no-install-recommends \
git \
gcc-${CC} \
g++-${CXX} \
libboost-all-dev \
tzdata && \
apt-get autoremove -y && \
apt-get clean && \
@@ -66,18 +67,10 @@ RUN if [ -f /cudf/docker/package_versions.sh ]; \
conda env create --name cudf --file /cudf/conda/environments/cudf_dev_cuda${CUDA_SHORT_VERSION}.yml ; \
fi

# libcudf build/install
ENV CC=/usr/bin/gcc-${CC}
ENV CXX=/usr/bin/g++-${CXX}
RUN source activate cudf && \
mkdir -p /cudf/cpp/build && \
cd /cudf/cpp/build && \
cmake .. -DCMAKE_INSTALL_PREFIX=${CONDA_PREFIX} && \
make -j"$(nproc)" install
ENV CC=/opts/conda/envs/rapids/bin/gcc-${CC}
ENV CXX=/opts/conda/envs/rapids/bin/g++-${CXX}

# cuDF build/install
# libcudf & cudf build/install
RUN source activate cudf && \
cd /cudf/python/cudf && \
python setup.py build_ext --inplace && \
python setup.py install && \
python setup.py install
cd /cudf/ && \
./build.sh libcudf cudf
14 changes: 7 additions & 7 deletions README.md
@@ -57,15 +57,15 @@ Please see the [Demo Docker Repository](https://hub.docker.com/r/rapidsai/rapids

 ### CUDA/GPU requirements

-* CUDA 10.1+
-* NVIDIA driver 418.39+
+* CUDA 11.0+
+* NVIDIA driver 450.80.02+
 * Pascal architecture or better (Compute Capability >=6.0)

 ### Conda

 cuDF can be installed with conda ([miniconda](https://conda.io/miniconda.html), or the full [Anaconda distribution](https://www.anaconda.com/download)) from the `rapidsai` channel:

-For `cudf version == 0.19` :
+For `cudf version == 0.19.2` :
 ```bash
 # for CUDA 10.1
 conda install -c rapidsai -c nvidia -c numba -c conda-forge \
Expand All @@ -79,13 +79,13 @@ conda install -c rapidsai -c nvidia -c numba -c conda-forge \

For the nightly version of `cudf` :
```bash
# for CUDA 10.1
# for CUDA 11.0
conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \
cudf python=3.7 cudatoolkit=10.1
cudf python=3.7 cudatoolkit=11.0

# or, for CUDA 10.2
# or, for CUDA 11.2
conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \
cudf python=3.7 cudatoolkit=10.2
cudf python=3.7 cudatoolkit=11.2
```

Note: cuDF is supported only on Linux, and with Python versions 3.7 and later.
30 changes: 15 additions & 15 deletions ci/benchmark/build.sh
@@ -21,15 +21,15 @@ function hasArg {
export PATH=/conda/bin:/usr/local/cuda/bin:$PATH
export PARALLEL_LEVEL=4
export CUDA_REL=${CUDA_VERSION%.*}
export HOME=$WORKSPACE
export HOME="$WORKSPACE"

# Parse git describe
cd $WORKSPACE
cd "$WORKSPACE"
export GIT_DESCRIBE_TAG=`git describe --tags`
export MINOR_VERSION=`echo $GIT_DESCRIBE_TAG | grep -o -E '([0-9]+\.[0-9]+)'`

# Set Benchmark Vars
export GBENCH_BENCHMARKS_DIR=${WORKSPACE}/cpp/build/gbenchmarks/
export GBENCH_BENCHMARKS_DIR="$WORKSPACE/cpp/build/gbenchmarks/"

# Set `LIBCUDF_KERNEL_CACHE_PATH` environment variable to $HOME/.jitify-cache because
# it's local to the container's virtual file system, and not shared with other CI jobs
@@ -77,8 +77,8 @@ conda install "rmm=$MINOR_VERSION.*" "cudatoolkit=$CUDA_REL" \
# Install the master version of dask, distributed, and streamz
logger "pip install git+https://github.com/dask/distributed.git@main --upgrade --no-deps"
pip install "git+https://github.com/dask/distributed.git@main" --upgrade --no-deps
logger "pip install git+https://github.com/dask/dask.git@main --upgrade --no-deps"
pip install "git+https://github.com/dask/dask.git@main" --upgrade --no-deps
logger "pip install git+https://github.com/dask/dask.git@2021.05.1 --upgrade --no-deps"
pip install "git+https://github.com/dask/dask.git@2021.05.1" --upgrade --no-deps
logger "pip install git+https://github.com/python-streamz/streamz.git --upgrade --no-deps"
pip install "git+https://github.com/python-streamz/streamz.git" --upgrade --no-deps

@@ -96,9 +96,9 @@ conda list --show-channel-urls

logger "Build libcudf..."
if [[ ${BUILD_MODE} == "pull-request" ]]; then
$WORKSPACE/build.sh clean libcudf cudf dask_cudf benchmarks tests --ptds
"$WORKSPACE/build.sh" clean libcudf cudf dask_cudf benchmarks tests --ptds
else
$WORKSPACE/build.sh clean libcudf cudf dask_cudf benchmarks tests -l --ptds
"$WORKSPACE/build.sh" clean libcudf cudf dask_cudf benchmarks tests -l --ptds
fi

################################################################################
@@ -144,9 +144,9 @@ function getReqs() {

REQS=$(getReqs "${LIBCUDF_DEPS[@]}")

mkdir -p ${WORKSPACE}/tmp/benchmark
touch ${WORKSPACE}/tmp/benchmark/benchmarks.txt
ls ${GBENCH_BENCHMARKS_DIR} > ${WORKSPACE}/tmp/benchmark/benchmarks.txt
mkdir -p "$WORKSPACE/tmp/benchmark"
touch "$WORKSPACE/tmp/benchmark/benchmarks.txt"
ls ${GBENCH_BENCHMARKS_DIR} > "$WORKSPACE/tmp/benchmark/benchmarks.txt"

#Disable error aborting while tests run, failed tests will not generate data
logger "Running libcudf GBenchmarks..."
@@ -161,13 +161,13 @@ do
rm ./${BENCH}.json
JOBEXITCODE=1
fi
done < ${WORKSPACE}/tmp/benchmark/benchmarks.txt
done < "$WORKSPACE/tmp/benchmark/benchmarks.txt"
set -e

rm ${WORKSPACE}/tmp/benchmark/benchmarks.txt
cd ${WORKSPACE}
mv ${GBENCH_BENCHMARKS_DIR}/*.json ${WORKSPACE}/tmp/benchmark/
python GBenchToASV.py -d ${WORKSPACE}/tmp/benchmark/ -t ${S3_ASV_DIR} -n libcudf -b branch-${MINOR_VERSION} -r "${REQS}"
rm "$WORKSPACE/tmp/benchmark/benchmarks.txt"
cd "$WORKSPACE"
mv ${GBENCH_BENCHMARKS_DIR}/*.json "$WORKSPACE/tmp/benchmark/"
python GBenchToASV.py -d "$WORKSPACE/tmp/benchmark/" -t ${S3_ASV_DIR} -n libcudf -b branch-${MINOR_VERSION} -r "${REQS}"

###
# Run Python Benchmarks
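
Most of the changes in this file wrap `$WORKSPACE` in double quotes. A minimal sketch of why that matters, assuming a hypothetical workspace path that contains a space (the path and the `count_args` helper are illustrative and not part of the PR):

```bash
#!/bin/bash
# Hypothetical workspace path with a space in it; nothing is created on disk.
WORKSPACE="/tmp/my workspace"

# Hypothetical helper: prints how many arguments it received.
count_args() { echo $#; }

# Unquoted expansion is word-split on the space into two arguments.
UNQUOTED_COUNT=$(count_args $WORKSPACE/tmp/benchmark)

# Quoted expansion stays a single argument, as in the updated script.
QUOTED_COUNT=$(count_args "$WORKSPACE/tmp/benchmark")

echo "unquoted: $UNQUOTED_COUNT, quoted: $QUOTED_COUNT"
```

With the unquoted form, commands such as `cd` or `mkdir -p` would receive two separate paths instead of one.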
16 changes: 14 additions & 2 deletions ci/checks/style.sh
@@ -1,5 +1,5 @@
#!/bin/bash
# Copyright (c) 2018, NVIDIA CORPORATION.
# Copyright (c) 2018-2021, NVIDIA CORPORATION.
#####################
# cuDF Style Tester #
#####################
@@ -33,6 +33,10 @@ FLAKE_CYTHON_RETVAL=$?
MYPY_CUDF=`mypy --config=python/cudf/setup.cfg python/cudf/cudf`
MYPY_CUDF_RETVAL=$?

# Run pydocstyle and get results/return code
PYDOCSTYLE=`pydocstyle --config=python/.flake8 python`
PYDOCSTYLE_RETVAL=$?

# Run clang-format and check for a consistent code format
CLANG_FORMAT=`python cpp/scripts/run-clang-format.py 2>&1`
CLANG_FORMAT_RETVAL=$?
@@ -78,6 +82,14 @@ else
echo -e "\n\n>>>> PASSED: mypy style check\n\n"
fi

if [ "$PYDOCSTYLE_RETVAL" != "0" ]; then
echo -e "\n\n>>>> FAILED: pydocstyle style check; begin output\n\n"
echo -e "$PYDOCSTYLE"
echo -e "\n\n>>>> FAILED: pydocstyle style check; end output\n\n"
else
echo -e "\n\n>>>> PASSED: pydocstyle style check\n\n"
fi

if [ "$CLANG_FORMAT_RETVAL" != "0" ]; then
echo -e "\n\n>>>> FAILED: clang format check; begin output\n\n"
echo -e "$CLANG_FORMAT"
@@ -91,7 +103,7 @@ HEADER_META=`ci/checks/headers_test.sh`
HEADER_META_RETVAL=$?
echo -e "$HEADER_META"

RETVALS=($ISORT_RETVAL $BLACK_RETVAL $FLAKE_RETVAL $FLAKE_CYTHON_RETVAL $CLANG_FORMAT_RETVAL $HEADER_META_RETVAL $MYPY_CUDF_RETVAL)
RETVALS=($ISORT_RETVAL $BLACK_RETVAL $FLAKE_RETVAL $FLAKE_CYTHON_RETVAL $PYDOCSTYLE_RETVAL $CLANG_FORMAT_RETVAL $HEADER_META_RETVAL $MYPY_CUDF_RETVAL)
IFS=$'\n'
RETVAL=`echo "${RETVALS[*]}" | sort -nr | head -n1`
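
The last three lines reduce all the individual return codes to the single largest one, so adding `$PYDOCSTYLE_RETVAL` to the array is enough to make a pydocstyle failure count toward the overall result. A small sketch of that reduction, using made-up return codes in place of the real checks:

```bash
#!/bin/bash
# Hypothetical return codes standing in for the real style checks.
ISORT_RETVAL=0
PYDOCSTYLE_RETVAL=1
CLANG_FORMAT_RETVAL=2

RETVALS=($ISORT_RETVAL $PYDOCSTYLE_RETVAL $CLANG_FORMAT_RETVAL)

# With IFS set to a newline, "${RETVALS[*]}" joins the elements one per line,
# so sort can order them numerically (descending) and head keeps the largest.
OLD_IFS=$IFS
IFS=$'\n'
RETVAL=`echo "${RETVALS[*]}" | sort -nr | head -n1`
IFS=$OLD_IFS

echo "overall return code: $RETVAL"
```

Saving and restoring `IFS` is an addition in this sketch; the script in the diff sets `IFS` and leaves it changed.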

11 changes: 8 additions & 3 deletions ci/cpu/build.sh
@@ -10,7 +10,7 @@ export PATH=/opt/conda/bin:/usr/local/cuda/bin:$PATH
export PARALLEL_LEVEL=${PARALLEL_LEVEL:-4}

# Set home to the job's workspace
export HOME=$WORKSPACE
export HOME="$WORKSPACE"

# Determine CUDA release version
export CUDA_REL=${CUDA_VERSION%.*}
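
The `${CUDA_VERSION%.*}` expansion removes the shortest trailing match of `.*`, which drops the patch component and keeps `major.minor`. A minimal sketch, assuming a hypothetical three-part version string (the value below is illustrative and not taken from the CI environment):

```bash
#!/bin/bash
# Hypothetical full CUDA version as it might be exported by a CI image.
CUDA_VERSION="11.0.221"

# %.* removes the shortest suffix that matches ".*", i.e. the last ".221".
CUDA_REL=${CUDA_VERSION%.*}

echo "CUDA_REL=$CUDA_REL"
```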
@@ -21,10 +21,10 @@ export GPUCI_CONDA_RETRY_SLEEP=30

# Use Ninja to build, setup Conda Build Dir
export CMAKE_GENERATOR="Ninja"
export CONDA_BLD_DIR="${WORKSPACE}/.conda-bld"
export CONDA_BLD_DIR="$WORKSPACE/.conda-bld"

# Switch to project root; also root of repo checkout
cd $WORKSPACE
cd "$WORKSPACE"

# If nightly build, append current YYMMDD to version
if [[ "$BUILD_MODE" = "branch" && "$SOURCE_BRANCH" = branch-* ]] ; then
@@ -42,6 +42,11 @@ gpuci_logger "Activate conda env"
. /opt/conda/etc/profile.d/conda.sh
conda activate rapids

# Remove rapidsai-nightly channel if we are building main branch
if [ "$SOURCE_BRANCH" = "main" ]; then
conda config --system --remove channels rapidsai-nightly
fi

gpuci_logger "Check compiler versions"
python --version
$CC --version
4 changes: 2 additions & 2 deletions ci/cpu/prebuild.sh
@@ -14,14 +14,14 @@ else
fi

# upload cudf_kafka for all versions of Python
if [[ "$CUDA" == "10.1" ]]; then
if [[ "$CUDA" == "11.0" ]]; then
export UPLOAD_CUDF_KAFKA=1
else
export UPLOAD_CUDF_KAFKA=0
fi

#We only want to upload libcudf_kafka once per python/CUDA combo
if [[ "$PYTHON" == "3.7" ]] && [[ "$CUDA" == "10.1" ]]; then
if [[ "$PYTHON" == "3.7" ]] && [[ "$CUDA" == "11.0" ]]; then
export UPLOAD_LIBCUDF_KAFKA=1
else
export UPLOAD_LIBCUDF_KAFKA=0
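
After this change the upload flags key off CUDA 11.0 instead of 10.1. The gating can be sketched as a function and evaluated for a few Python/CUDA combinations. `upload_flags` is a hypothetical helper written for illustration; the script in the PR exports the variables directly rather than using a function:

```bash
#!/bin/bash
# Hypothetical helper mirroring the two conditions in ci/cpu/prebuild.sh.
upload_flags() {
    local PYTHON="$1" CUDA="$2"
    local UPLOAD_CUDF_KAFKA=0 UPLOAD_LIBCUDF_KAFKA=0

    # cudf_kafka: uploaded for every Python version, but only on CUDA 11.0.
    if [[ "$CUDA" == "11.0" ]]; then
        UPLOAD_CUDF_KAFKA=1
    fi

    # libcudf_kafka: uploaded once, for Python 3.7 on CUDA 11.0.
    if [[ "$PYTHON" == "3.7" ]] && [[ "$CUDA" == "11.0" ]]; then
        UPLOAD_LIBCUDF_KAFKA=1
    fi

    echo "cudf_kafka=$UPLOAD_CUDF_KAFKA libcudf_kafka=$UPLOAD_LIBCUDF_KAFKA"
}

FLAGS_37_110=$(upload_flags 3.7 11.0)
FLAGS_38_110=$(upload_flags 3.8 11.0)
FLAGS_37_112=$(upload_flags 3.7 11.2)

echo "3.7 / 11.0: $FLAGS_37_110"
echo "3.8 / 11.0: $FLAGS_38_110"
echo "3.7 / 11.2: $FLAGS_37_112"
```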
4 changes: 2 additions & 2 deletions ci/cpu/upload.sh
@@ -29,8 +29,8 @@ fi

gpuci_logger "Get conda file output locations"

export LIBCUDF_FILE=`conda build --no-build-id --croot ${WORKSPACE}/.conda-bld conda/recipes/libcudf --output`
export LIBCUDF_KAFKA_FILE=`conda build --no-build-id --croot ${WORKSPACE}/.conda-bld conda/recipes/libcudf_kafka --output`
export LIBCUDF_FILE=`conda build --no-build-id --croot "$WORKSPACE/.conda-bld" conda/recipes/libcudf --output`
export LIBCUDF_KAFKA_FILE=`conda build --no-build-id --croot "$WORKSPACE/.conda-bld" conda/recipes/libcudf_kafka --output`
export CUDF_FILE=`conda build --croot ${CONDA_BLD_DIR} conda/recipes/cudf --python=$PYTHON --output`
export DASK_CUDF_FILE=`conda build --croot ${CONDA_BLD_DIR} conda/recipes/dask-cudf --python=$PYTHON --output`
export CUDF_KAFKA_FILE=`conda build --croot ${CONDA_BLD_DIR} conda/recipes/cudf_kafka --python=$PYTHON --output`