[RELEASE] cudf v22.04 #10512

Merged 262 commits on Apr 6, 2022
Changes from 250 commits
Commits
57ff6f5
Merge pull request #10099 from rapidsai/branch-22.02
GPUtester Jan 20, 2022
f1e0bb6
Merge pull request #10106 from rapidsai/branch-22.02
GPUtester Jan 21, 2022
fd968f3
Update cudf java binding version to 22.04.0-SNAPSHOT (#10084)
pxLi Jan 24, 2022
ea905d7
Merge pull request #10110 from rapidsai/branch-22.02
GPUtester Jan 24, 2022
01b7c83
Update benchmarking guide to use NVBench. (#10093)
bdice Jan 24, 2022
cfb6cbe
Encode values from python callback for C++ (#10103)
jdye64 Jan 24, 2022
6a77acc
Remove unnecessary docker files. (#10069)
vyasr Jan 24, 2022
6d11823
Fix flaky memory usage test by guaranteeing array size. (#10114)
vyasr Jan 24, 2022
b8d2969
Implement mixed equality/conditional semi/anti joins (#10037)
vyasr Jan 24, 2022
a552afb
Limit benchmark iterations using environment variable (#10060)
karthikeyann Jan 25, 2022
52a61b7
Java bindings for mixed semi and anti joins (#10040)
jlowe Jan 25, 2022
127537e
Avoid `nan_as_null` op if `nan_count` is 0 (#10082)
galipremsagar Jan 25, 2022
baff5cf
Remove benchmarks suffix (#10112)
bdice Jan 25, 2022
22daaea
Reduce redundant code in CUDF JNI (#10019)
mythrocks Jan 25, 2022
83ec0af
Remove deprecated method Series.set_index. (#9945)
bdice Jan 26, 2022
eacaea0
Merge pull request #10135 from rapidsai/branch-22.02
GPUtester Jan 26, 2022
85109e6
Remove metadata singleton from nvtext normalizer (#10090)
davidwendt Jan 26, 2022
3e2474d
Fix UDF Caching (#10133)
brandon-b-miller Jan 26, 2022
3265531
Update cmake-format script for branch 22.04. (#10132)
bdice Jan 26, 2022
05d53cf
Update gpu_utils.py to reflect current CUDA support. (#10113)
bdice Jan 26, 2022
d19d683
Remove the option to completely disable decimal128 columns in the ORC…
vuule Jan 26, 2022
dcc0bf5
Raise duplicate column error in `DataFrame.rename` (#10120)
galipremsagar Jan 27, 2022
53a73f7
Remove `drop_nan` from internal `IndexedFrame._drop_na_rows`. (#10140)
bdice Jan 27, 2022
b684f17
Deprecate `decimal_cols_as_float` in ORC reader (#10142)
galipremsagar Jan 27, 2022
b290ec7
Add `maxSplit` parameter to Java binding for `strings:split` (#10137)
ttnghia Jan 27, 2022
57ac8c4
Add cudf::strings::findall_record API (#9911)
davidwendt Jan 27, 2022
1246116
Merge pull request #10148 from rapidsai/branch-22.02
GPUtester Jan 27, 2022
5dd1c39
benchmark fixture - static object pointer fix (#10145)
karthikeyann Jan 27, 2022
7c69dae
Accept r-value references in convert_table_for_return(): (#10131)
mythrocks Jan 27, 2022
b2d5874
Add check for regex instructions causing an infinite-loop (#10095)
davidwendt Jan 27, 2022
5257f34
Deprecate `decimal_cols_as_float` in ORC reader (C++ layer) (#10152)
vuule Jan 28, 2022
896564a
Support `args=` in `Series.apply` (#9982)
brandon-b-miller Jan 28, 2022
e2123db
Update developer guide to recommend no default stream parameter. (#10…
bdice Jan 28, 2022
b7aa47f
Remove deprecated code (#10124)
vyasr Jan 28, 2022
eb71b1f
Change cudf::strings::find_multiple to return a lists column (#10134)
davidwendt Jan 28, 2022
cf81b1a
Add file size counter to cuIO benchmarks (#10154)
vuule Jan 29, 2022
6e500d1
Add timing chart for libcudf build metrics report page (#10038)
davidwendt Jan 31, 2022
b217d7e
JNI: Rewrite growBuffersAndRows to accelerate the HostColumnBuilder (…
sperlingxx Jan 31, 2022
c25d35b
Preserve the correct `ListDtype` while creating an identical empty co…
galipremsagar Jan 31, 2022
c52f483
Add comments to explain test validation (#10176)
galipremsagar Jan 31, 2022
2c6b0da
Refactor groupby::get_groups. (#10161)
bdice Jan 31, 2022
8d2a9cc
Fix JNI leak of a cudf::column_view native class. (#10171)
revans2 Feb 1, 2022
12f61f6
Merge branch 'branch-22.02' into branch-22.04-merge-22.02
bdice Feb 2, 2022
a080a4c
Remove cleaned up methods from docs (#10189)
galipremsagar Feb 2, 2022
b6bb463
Optimize compaction operations (#10030)
PointKernel Feb 2, 2022
deb902a
Replace `dask` groupby `.index` usages with `.by` (#10193)
galipremsagar Feb 2, 2022
83accc6
Skip ORC and Parquet readers' benchmark cases that are not currently …
vuule Feb 2, 2022
b7257a3
Add Dataframe and Index nunique (#10077)
martinfalisse Feb 2, 2022
7d88a87
Fix `fixed_point` binary operation documentation (#10198)
codereport Feb 3, 2022
a10d24a
Fix cuco pair issue in hash join (#10195)
PointKernel Feb 3, 2022
a25a2ec
Fix docstrings alignment in `Frame` methods (#10199)
galipremsagar Feb 3, 2022
62ecfab
Refactor DataFrame tests. (#10204)
bdice Feb 3, 2022
511aa28
Merge pull request #10191 from bdice/branch-22.04-merge-22.02
ajschmidt8 Feb 3, 2022
f4ac6d4
Support for `MOD`, `PMOD` and `PYMOD` for `decimal32/64/128` (#10179)
codereport Feb 3, 2022
4e97835
Update dask-cudf parquet tests to reflect upstream bugfixes to `_meta…
charlesbluca Feb 3, 2022
4e58850
Replace `ccache` with `sccache` (#10146)
ajschmidt8 Feb 3, 2022
8fbd797
Remove unnecessary nunique function in Series. (#10205)
martinfalisse Feb 3, 2022
0581975
Fix compile error in `binaryop/compiled/util.cpp` (#10209)
ttnghia Feb 3, 2022
1754731
Some consolidation of indexed frame methods (#10167)
vyasr Feb 4, 2022
42d86c4
Run pyupgrade 2.31.0. (#10141)
bdice Feb 4, 2022
b72c79d
Fix bitmask of the output for JNI of `lists::drop_list_duplicates` (#…
ttnghia Feb 4, 2022
c191d16
Fix docs builds (#10216)
ajschmidt8 Feb 4, 2022
4e8cb4f
Java utilities to aid in accelerating aggregations on 128-bit types (…
jlowe Feb 4, 2022
5024a0a
Fix a leftover _has_nulls change from Nullate (#10211)
devavret Feb 4, 2022
f654c4a
Reduce warnings in pytest output (#10168)
bdice Feb 4, 2022
84ae8ab
Add environment variables for I/O thread pool and slice sizes (#10218)
vuule Feb 4, 2022
e5ba292
Fix `decimal` metadata in parquet writer (#10224)
galipremsagar Feb 4, 2022
2e458b9
Implement DataFrame diff() (#9817)
skirui-source Feb 5, 2022
7987791
Fix JNI leak on copy to device (#10229)
revans2 Feb 7, 2022
e3611a2
Add CMake `install` rule for tests (#10190)
ajschmidt8 Feb 7, 2022
9c85584
Change pytest distribution algorithm and increase parallelism in CI (…
galipremsagar Feb 7, 2022
8a88490
Optimize `DataFrame` creation across code-base (#10236)
galipremsagar Feb 7, 2022
8014add
Murmur3 hash kernel cleanup (#10143)
rwlee Feb 7, 2022
6502cea
Fix the data generator element size for decimal types (#10225)
vuule Feb 8, 2022
acb6aed
Column equality testing fixes (#10011)
brandon-b-miller Feb 8, 2022
8af4e84
Yet another small JNI memory leak (#10238)
revans2 Feb 8, 2022
1bc3727
Fix regex octal parsing to limit to 3 characters (#10233)
davidwendt Feb 8, 2022
4a50a63
Fix small leak in explode (#10245)
revans2 Feb 8, 2022
dad51a5
update changelog
ajschmidt8 Feb 8, 2022
adcb2d3
Merge pull request #10247 from rapidsai/branch-22.02
GPUtester Feb 8, 2022
10faad9
fix changelog
ajschmidt8 Feb 8, 2022
bd98bfe
Add regex flags to strings findall functions (#10208)
davidwendt Feb 8, 2022
fff51b8
Fix strings handling of hex in regex pattern (#10220)
davidwendt Feb 8, 2022
6e267bd
Refactor Series.__array_ufunc__ (#10217)
vyasr Feb 8, 2022
3fe168d
Fix string to decimal128 conversion handling large exponents (#10231)
davidwendt Feb 9, 2022
19dc46f
Upgrade `arrow` & `pyarrow` to `6.0.1` (#9686)
galipremsagar Feb 9, 2022
37acb9b
Remove redundant copies in `fillna` to improve performance (#10241)
galipremsagar Feb 10, 2022
927ebcb
Add `copyright` check in `cudf` (#10253)
galipremsagar Feb 10, 2022
eb5e3e3
Remove `std::numeric_limit` specializations for timestamp & durations…
codereport Feb 10, 2022
6d162b4
Bump hadoop-common from 3.1.0 to 3.1.4 in /java (#10259)
dependabot[bot] Feb 10, 2022
2741e6b
Remove probe-time null equality parameters in `cudf::hash_join` (#10260)
PointKernel Feb 10, 2022
8f0f6f8
Support `percent_rank()` aggregation (#10227)
mythrocks Feb 10, 2022
d5b2448
Add regex flags to strings extract function (#10192)
davidwendt Feb 10, 2022
dcac052
Adding string row size iterator for row to column and column to row c…
hyperbolic2346 Feb 11, 2022
8cc84c6
Add libcudf strings split API that accepts regex pattern (#10128)
davidwendt Feb 11, 2022
4c92f86
Add tests of reflected ufuncs and fix behavior of logical reflected u…
vyasr Feb 11, 2022
48c4dc3
Add more `nvtx` annotations (#10256)
galipremsagar Feb 11, 2022
317553f
Remove making redundant `copy` across code-base (#10257)
galipremsagar Feb 12, 2022
7f2a16f
Improve hash join detail functions (#10273)
PointKernel Feb 12, 2022
b977d1e
Fix out-of-memory error in UrlDecode benchmark (#10258)
davidwendt Feb 14, 2022
463266f
Fix out-of-memory error in compiled-binaryop benchmark (#10269)
davidwendt Feb 14, 2022
7025c40
Add JNI for `strings::split_re` and `strings::split_record_re` (#10139)
ttnghia Feb 14, 2022
c21cca9
Fix groupby reductions that perform operations on source type instead…
ttnghia Feb 14, 2022
c2846fb
Fix incorrect slicing of GDS read/write calls (#10274)
vuule Feb 14, 2022
374b387
Replace custom `cached_property` implementation with functools (#10272)
shwina Feb 14, 2022
a443dd1
Convert Column Name to String Before Using Struct Column Factory (#10…
isVoid Feb 14, 2022
f5ae74f
Allow Java bindings to use default decimal precisions when writing co…
sperlingxx Feb 15, 2022
17b7907
Add copyright check as pre-commit hook. (#10290)
vyasr Feb 15, 2022
8b0737d
Enable numpy ufuncs for DataFrame (#10287)
vyasr Feb 15, 2022
851e235
Reduce pytest runtime (#10203)
brandon-b-miller Feb 15, 2022
ea2508e
Deprecate `DataFrame.iteritems` and introduce `.items` (#10298)
galipremsagar Feb 15, 2022
7a620c4
generate url decode benchmark input in device (#10278)
karthikeyann Feb 15, 2022
f263820
move input generation for type dispatcher benchmark to device (#10280)
karthikeyann Feb 15, 2022
9eb6a66
Explicitly request CMake use `gnu++17` over `c++17` (#10297)
robertmaynard Feb 16, 2022
203f7b0
DataFrame `insert` and creation optimizations (#10285)
galipremsagar Feb 16, 2022
dffed18
device input generation in join bench (#10277)
karthikeyann Feb 16, 2022
4474d9e
Fix documentation issues (#10306)
ajschmidt8 Feb 16, 2022
26d2924
Fix documentation issues (#10307)
ajschmidt8 Feb 16, 2022
183eec3
Remove `TODO` in `libcudf_kafka` recipe (#10309)
ajschmidt8 Feb 16, 2022
895b007
Prevent internal usage of expensive APIs (#10263)
vyasr Feb 16, 2022
fe37c0e
Add covariance for sort groupby (python) (#9889)
mayankanand007 Feb 17, 2022
f5ec4b2
Implement DataFrame pct_change (#9805)
skirui-source Feb 17, 2022
4e986fd
Remove extraneous `build.sh` parameter (#10313)
ajschmidt8 Feb 17, 2022
b3342a8
move input generation for copy benchmark to device (#10279)
karthikeyann Feb 17, 2022
28813d7
Update upload script (#10321)
ajschmidt8 Feb 17, 2022
fdad597
Add const qualifier to MurmurHash3_32::hash_combine (#10311)
davidwendt Feb 17, 2022
d48dd6f
Add conversions between column_view and device_span<T const>. (#10302)
bdice Feb 17, 2022
b28bad6
Hide warnings from pandas in test_array_ufunc.py. (#10324)
bdice Feb 17, 2022
ec614ac
Move hash type declarations to hashing.hpp (#10320)
davidwendt Feb 18, 2022
a362c65
multibyte_split test improvements (#10328)
vuule Feb 18, 2022
858ab83
Refactor isin implementations (#10165)
vyasr Feb 18, 2022
8a50a22
Add regex flags parameter to python cudf strings split (#10185)
davidwendt Feb 19, 2022
527d4ee
Fix debug compile error in device_span to column_view conversion (#10…
davidwendt Feb 19, 2022
4d262ae
Add Pascal support to JCUDF transcode (row_conversion) (#10329)
mythrocks Feb 21, 2022
7a17f28
Fixes up the overflowed fixed-point round on nullable column (#10316)
sperlingxx Feb 22, 2022
36e8825
JNI: Push back decimal utils from spark-rapids (#9907)
sperlingxx Feb 22, 2022
58810af
Fix DataFrame slicing issues for empty cases (#10310)
brandon-b-miller Feb 22, 2022
add6990
Add JNI for extract_list_element with index column (#10341)
firestarman Feb 22, 2022
cf65ac3
JNI: Support appending DECIMAL128 into ColumnBuilder in terms of byte…
sperlingxx Feb 22, 2022
c163886
C++17 cleanup: traits replace `::value` with `_v` (#10319)
karthikeyann Feb 22, 2022
0ae9dc6
Enable caching for `memory_usage` calculation in `Column` (#10345)
galipremsagar Feb 23, 2022
496f452
Avoid `decimal` type narrowing for decimal binops (#10299)
galipremsagar Feb 23, 2022
a72479f
Rewrites `column.__setitem__`, Use `boolean_mask_scatter` (#10202)
isVoid Feb 23, 2022
2a2e3f0
Fix warnings in test_binops.py. (#10327)
bdice Feb 23, 2022
aa746ae
Remove internal columns usage (#10315)
vyasr Feb 24, 2022
3a1dbe8
Enable proper `Index` round-tripping in `orc` reader and writer (#10170)
galipremsagar Feb 24, 2022
df646b2
Fix `std::bad_alloc` exception due to JIT reserving a huge buffer (#1…
ttnghia Feb 25, 2022
eaae94b
Add device create_sequence_table for benchmarks (#10300)
karthikeyann Feb 25, 2022
044922d
Add cleanup of python artifacts (#10355)
galipremsagar Feb 25, 2022
3f175ce
Fix warnings in test_categorical.py. (#10354)
bdice Feb 25, 2022
e0af727
Refactor array_ufunc for Index and unify across all classes (#10346)
vyasr Feb 25, 2022
21325e8
Implement a mixin for reductions (#9925)
vyasr Feb 25, 2022
3e33453
move input generation for json benchmark to device (#10281)
karthikeyann Feb 26, 2022
4c9ef51
C++17 cleanup: traits replace std::enable_if<>::type with std::enable…
karthikeyann Feb 26, 2022
619b2c7
Remove documentation for methods removed in #10124. (#10366)
bdice Feb 28, 2022
64ee514
Avoid overflow in fused_concatenate_kernel output_index (#10344)
abellina Feb 28, 2022
87a2ea4
Remove doc for deprecated function `one_hot_encoding` (#10367)
isVoid Feb 28, 2022
5d8ea19
Fix warnings in test_csv.py. (#10362)
bdice Mar 1, 2022
78b316c
byte_range support for multibyte_split/read_text (#10150)
cwharris Mar 1, 2022
1217f24
Create a dispatcher for invoking regex kernel functions (#10349)
davidwendt Mar 2, 2022
7120694
Implement a mixin for binops (#10360)
vyasr Mar 2, 2022
b4d262d
Refactor array function (#10364)
vyasr Mar 2, 2022
fbac0ac
Include <optional> in multibyte split. (#10385)
bdice Mar 2, 2022
6bcfc10
Fix issue with column and scalar re-assignment (#10377)
galipremsagar Mar 2, 2022
b5337d7
Add `nvtx` annotations for `Series` and `Index` (#10374)
galipremsagar Mar 3, 2022
1e5b01f
Rewrites `sample` API (#10262)
isVoid Mar 4, 2022
e610108
Consolidate some Frame APIs (#10381)
vyasr Mar 7, 2022
b782281
Implement a mixin for scans (#10358)
vyasr Mar 7, 2022
a584cdc
Support `min` and `max` operations for structs in rolling window (#10…
ttnghia Mar 7, 2022
4f8c60a
Add `cudf::stable_sort_by_key` (#10387)
PointKernel Mar 7, 2022
7d67093
Fix floating point data generation in benchmarks (#10372)
vuule Mar 7, 2022
5207eff
Support Java bindings for Avro reader (#10373)
HaoYang670 Mar 8, 2022
a9b6cb1
Fix `codecov` in CI (#10347)
galipremsagar Mar 8, 2022
a999ba9
Move standalone UTF8 functions from string_view.hpp to utf8.hpp (#10369)
davidwendt Mar 8, 2022
a4f2e10
Limiting async allocator using alignment of 512 (#10395)
rongou Mar 8, 2022
e9876cf
Pin libcudf runtime dependency for cudf / libcudf-kafka nightlies (#9…
charlesbluca Mar 8, 2022
555fb63
Add `assert_column_memory_*` (#9882)
isVoid Mar 8, 2022
600b872
Refactor hash function templates and `hash_combine` (#10379)
bdice Mar 8, 2022
b3dc9d6
Refactor cython interface: `copying.pyx` (#10359)
isVoid Mar 8, 2022
78eda63
Fix error thrown in compiled-binaryop benchmark (#10398)
davidwendt Mar 9, 2022
14bd5f6
Fix warnings in test_cuda_apply, test_numerical, test_pickling, test_…
bdice Mar 9, 2022
70406de
Refactor `nvtx` annotations in `cudf` & `dask-cudf` (#10396)
galipremsagar Mar 9, 2022
45acfc4
Fix warnings in `test_rolling` (#10405)
bdice Mar 9, 2022
c08a97e
Fix lifespan of the temporary directory that holds cuFile configurati…
vuule Mar 9, 2022
46ac622
Support collect aggregations in reduction (#10353)
sperlingxx Mar 10, 2022
289a6a1
Set column names in `_from_columns_like_self` factory (#10400)
isVoid Mar 10, 2022
dbe7b4f
Enable `codecov` github-check in CI (#10404)
galipremsagar Mar 10, 2022
7ff1956
Support segmented reductions and null mask reductions (#9621)
isVoid Mar 10, 2022
b613394
Use str instead of builtins.str. (#10410)
bdice Mar 11, 2022
cbc4d8b
Remove is_relationally_comparable for table device views (#10342)
davidwendt Mar 11, 2022
c0f7fe6
Drop unsupported method argument from nunique and distinct_count. (#1…
bdice Mar 11, 2022
f29c8d9
Fix some warnings in `test_parquet.py` (#10416)
galipremsagar Mar 11, 2022
b1ea304
Add scan_aggregation and reduce_aggregation derived types. (#10357)
nvdbaranec Mar 11, 2022
2007480
Remove warnings in `test_timedelta.py` (#10418)
galipremsagar Mar 12, 2022
9304ee6
Consolidate .cov and .corr for sort groupby (#10386)
skirui-source Mar 12, 2022
da55f6a
Clean up null mask after purging null entries (#10412)
sperlingxx Mar 12, 2022
0be0b00
Refactor stream compaction APIs (#10370)
PointKernel Mar 12, 2022
a066e7f
JNI support for segmented reduce (#10413)
revans2 Mar 14, 2022
749295d
Add `.github/ops-bot.yaml` config file (#10420)
ajschmidt8 Mar 14, 2022
cf936b6
Centralization of tdigest aggregation code. (#10422)
nvdbaranec Mar 14, 2022
61772d8
Fix benchmarks to work with new aggregation types (#10428)
davidwendt Mar 14, 2022
a6fe301
Unpin `dask` & `distributed` (#10182)
galipremsagar Mar 14, 2022
228cc79
Implement `maps_column_view` abstraction over `LIST<STRUCT<K,V>>` (#1…
mythrocks Mar 14, 2022
4596244
JNI support for Collect Ops in Reduction (#10427)
sperlingxx Mar 15, 2022
deb39db
Fix error in `cudf.to_numeric` when a `bool` input is passed (#10431)
galipremsagar Mar 15, 2022
1649955
Fix `list` and `struct` meta generation issue in `dask-cudf` (#10434)
galipremsagar Mar 16, 2022
d4ce5d5
Update dask_cudf imports to be compatible with latest dask (#10442)
rlratzel Mar 16, 2022
5d3f7dc
Fix for integer overflow in contiguous-split (#10437)
jbrennan333 Mar 16, 2022
87180ce
Fix has_null predicate for drop_list_duplicates on nested structs (#1…
sperlingxx Mar 17, 2022
94e9f58
Fix empty reduce with List output and non-List input (#10435)
sperlingxx Mar 17, 2022
04933a2
MD5 refactoring. (#10445)
bdice Mar 17, 2022
22a9f35
Simplify column binary operations (#10421)
vyasr Mar 17, 2022
9a60671
Fix cudf::shift to handle offset greater than column size (#10414)
davidwendt Mar 17, 2022
621d26f
Add nvtext::byte_pair_encoding API (#10270)
davidwendt Mar 17, 2022
47d16cb
Refactor `filling.repeat` API (#10371)
isVoid Mar 18, 2022
48cebf7
Add CUDF_UNREACHABLE macro. (#9727)
bdice Mar 18, 2022
21ed251
Use list of columns for methods in `Groupby.pyx` (#10419)
isVoid Mar 18, 2022
40baeb4
Support nanosecond timestamps in parquet (#10063)
PointKernel Mar 21, 2022
2426faf
Remove or split up Frame methods that use the index (#10439)
vyasr Mar 21, 2022
0ffd718
Support cupy array in `quantile` input (#10429)
galipremsagar Mar 21, 2022
037fe87
Add support for tdigest and merge_tdigest aggregations through cudf::…
nvdbaranec Mar 21, 2022
4ee78fb
Make snappy decompress check more efficient (#9995)
cheinger Mar 21, 2022
4300ba4
Enable read_text with dask_cudf using byte_range (#10407)
ChrisJar Mar 22, 2022
76c772e
generate benchmark input in device (#10109)
karthikeyann Mar 22, 2022
e7dba35
Faster struct row comparator (#10164)
devavret Mar 22, 2022
d386d26
Add `cut` to API docs (#10479)
shwina Mar 22, 2022
0a2aa98
Include <cstddef> to fix compilation of parquet reader on GCC 11. (#1…
bdice Mar 22, 2022
26c22b6
Column to JCUDF row for tables with strings (#10235)
hyperbolic2346 Mar 22, 2022
18398ab
Pin `dask` and `distributed` (#10481)
galipremsagar Mar 23, 2022
ce5bacb
Add 'spearman' correlation method for `dataframe.corr` (#7141)
dominicshanshan Mar 23, 2022
5129ee5
Batch of fixes for index overflows in grid stride loops. (#10448)
nvdbaranec Mar 23, 2022
7b9646b
Temporarily disable new `ops-bot` functionality (#10496)
ajschmidt8 Mar 23, 2022
9edcbd4
Fix documentation for DataFrame.corr and Series.corr. (#10493)
bdice Mar 23, 2022
12b66a3
Add StringIO support to read_text (#10465)
cwharris Mar 23, 2022
54918d8
Add `scipy` skip for a test (#10502)
galipremsagar Mar 24, 2022
a4c450b
Fix an issue with tdigest merge aggregations. (#10506)
nvdbaranec Mar 24, 2022
c71fe1b
Add Java bindings for t-digest reduction (#10446)
andygrove Mar 25, 2022
6c200bf
Updates to 10min notebook (#10531)
shwina Mar 29, 2022
daea0dd
Pin click version to last support by black<22.3.0.
vyasr Mar 29, 2022
e8dc00c
Fix formatting.
vyasr Mar 29, 2022
25e5e73
Update build.sh
galipremsagar Mar 30, 2022
d30b4f6
Update build.sh
galipremsagar Mar 30, 2022
c42cee3
Merge pull request #10535 from vyasr/pin_click
jjacobelli Mar 30, 2022
1cccc29
Pin CMake to prevent 3.23 bugs.
vyasr Mar 30, 2022
4770599
Adds launch bounds hints to mixed join kernels to address regression …
abellina Mar 30, 2022
a02b7c2
Merge pull request #10544 from vyasr/pin_cmake
jjacobelli Mar 31, 2022
2c81bed
JNI Bindings to fetch CUDA compute capability versions. (#10568)
mythrocks Apr 1, 2022
7a415f3
update copyrights (#10595)
galipremsagar Apr 5, 2022
4c84184
Update deprecated methods in 10min cupy notebook (#10594)
charlesbluca Apr 5, 2022
b50ae82
update changelog
raydouglass Apr 6, 2022
8 changes: 8 additions & 0 deletions .github/ops-bot.yaml
@@ -0,0 +1,8 @@
# This file controls which features from the `ops-bot` repository below are enabled.
# - https://github.com/rapidsai/ops-bot

auto_merger: true
branch_checker: true
label_checker: true
release_drafter: true
external_contributors: false
9 changes: 9 additions & 0 deletions .pre-commit-config.yaml
@@ -32,6 +32,8 @@ repos:
hooks:
- id: black
files: python/.*
+ additional_dependencies:
+ - click==8.0.4
- repo: https://github.com/PyCQA/flake8
rev: 3.8.3
hooks:
@@ -88,6 +90,13 @@ repos:
# of dependencies, so we'll have to update this manually.
additional_dependencies:
- cmake-format==0.6.11
+ - id: copyright-check
+ name: copyright-check
+ # This hook's use of Git tools appears to conflict with
+ # existing CI invocations so we don't invoke it during CI runs.
+ stages: [commit]
+ entry: python ./ci/checks/copyright.py --git-modified-only
+ language: python

default_language_version:
python: python3
452 changes: 228 additions & 224 deletions CHANGELOG.md

Large diffs are not rendered by default.

76 changes: 0 additions & 76 deletions Dockerfile

This file was deleted.

28 changes: 15 additions & 13 deletions build.sh
@@ -18,7 +18,7 @@ ARGS=$*
REPODIR=$(cd $(dirname $0); pwd)

VALIDARGS="clean libcudf cudf dask_cudf benchmarks tests libcudf_kafka cudf_kafka custreamz -v -g -n -l --allgpuarch --disable_nvtx --show_depr_warn --ptds -h --build_metrics --incl_cache_stats"
- HELP="$0 [clean] [libcudf] [cudf] [dask_cudf] [benchmarks] [tests] [libcudf_kafka] [cudf_kafka] [custreamz] [-v] [-g] [-n] [-h] [-l] [--cmake-args=\\\"<args>\\\"]
+ HELP="$0 [clean] [libcudf] [cudf] [dask_cudf] [benchmarks] [tests] [libcudf_kafka] [cudf_kafka] [custreamz] [-v] [-g] [-n] [-h] [--cmake-args=\\\"<args>\\\"]
clean - remove all existing build artifacts and configuration (start
over)
libcudf - build the cudf C++ code only
@@ -32,7 +32,6 @@ HELP="$0 [clean] [libcudf] [cudf] [dask_cudf] [benchmarks] [tests] [libcudf_kafk
-v - verbose build mode
-g - build for debug
-n - no install step
- -l - build legacy tests
--allgpuarch - build for all supported GPU architectures
--disable_nvtx - disable inserting NVTX profiling ranges
--show_depr_warn - show cmake deprecation warnings
@@ -169,6 +168,10 @@ if hasArg clean; then
rmdir ${bd} || true
fi
done

+ # Cleaning up python artifacts
+ find ${REPODIR}/python/ | grep -E "(__pycache__|\.pyc|\.pyo|\.so$)" | xargs rm -rf

fi
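Review note: the cleanup step in this hunk pipes `find` output through `grep` and `xargs`, which can mis-handle paths containing whitespace, and the unanchored `\.pyc` and `\.pyo` patterns match anywhere in a path. The following is a minimal sketch of a stricter alternative, not what the PR does. It assumes a throwaway temp directory in place of `${REPODIR}/python`, and the file names are made up for illustration.

```shell
# Sketch only: demo_dir stands in for ${REPODIR}/python (assumption).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/pkg/__pycache__"
touch "$demo_dir/pkg/__pycache__/mod.cpython-38.pyc" \
      "$demo_dir/pkg/ext.so" \
      "$demo_dir/pkg/mod.py"

# Roughly the same selection as the build.sh pipeline, expressed with find
# predicates. -prune keeps find from descending into directories that are
# about to be deleted, and -exec ... + passes paths safely to rm.
find "$demo_dir" \
  \( -name '__pycache__' -o -name '*.pyc' -o -name '*.pyo' -o -name '*.so' \) \
  -prune -exec rm -rf {} +

# Only the source file should remain.
remaining=$(find "$demo_dir" -type f)
echo "$remaining"

# Remove the demo directory.
rm -rf "$demo_dir"
```

The `-name` predicates match only on the final path component, so a source file whose directory name merely contains `.pyc` would be left alone.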


@@ -185,12 +188,9 @@ if buildAll || hasArg libcudf; then
fi

# get the current count before the compile starts
- FILES_IN_CCACHE=""
- if [[ "$BUILD_REPORT_INCL_CACHE_STATS" == "ON" && -x "$(command -v ccache)" ]]; then
- FILES_IN_CCACHE=$(ccache -s | grep "files in cache")
- echo "$FILES_IN_CCACHE"
- # zero the ccache statistics
- ccache -z
+ if [[ "$BUILD_REPORT_INCL_CACHE_STATS" == "ON" && -x "$(command -v sccache)" ]]; then
+ # zero the sccache statistics
+ sccache --zero-stats
fi

cmake -S $REPODIR/cpp -B ${LIB_BUILD_DIR} \
@@ -216,11 +216,12 @@ if buildAll || hasArg libcudf; then
echo "Formatting build metrics"
python ${REPODIR}/cpp/scripts/sort_ninja_log.py ${LIB_BUILD_DIR}/.ninja_log --fmt xml > ${LIB_BUILD_DIR}/ninja_log.xml
MSG="<p>"
- # get some ccache stats after the compile
- if [[ "$BUILD_REPORT_INCL_CACHE_STATS"=="ON" && -x "$(command -v ccache)" ]]; then
- MSG="${MSG}<br/>$FILES_IN_CCACHE"
- HIT_RATE=$(ccache -s | grep "cache hit rate")
- MSG="${MSG}<br/>${HIT_RATE}"
+ # get some sccache stats after the compile
+ if [[ "$BUILD_REPORT_INCL_CACHE_STATS" == "ON" && -x "$(command -v sccache)" ]]; then
+ COMPILE_REQUESTS=$(sccache -s | grep "Compile requests \+ [0-9]\+$" | awk '{ print $NF }')
+ CACHE_HITS=$(sccache -s | grep "Cache hits \+ [0-9]\+$" | awk '{ print $NF }')
+ HIT_RATE=$(echo - | awk "{printf \"%.2f\n\", $CACHE_HITS / $COMPILE_REQUESTS * 100}")
+ MSG="${MSG}<br/>cache hit rate ${HIT_RATE} %"
fi
MSG="${MSG}<br/>parallel setting: $PARALLEL_LEVEL"
MSG="${MSG}<br/>parallel build time: $compile_total seconds"
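Review note: the hit-rate computation in this hunk divides `$CACHE_HITS` by `$COMPILE_REQUESTS` inside an awk program, so it would likely fail if sccache reported zero compile requests or if either `grep` matched nothing. A guarded variant could look like the sketch below. `hit_rate` is a hypothetical helper, not part of build.sh, and the sample numbers are invented.

```shell
# Hypothetical helper (not in the PR): print the cache hit rate as a
# percentage with two decimals, or "n/a" when there were no compile requests.
hit_rate() {
  local hits="$1" requests="$2"
  # Passing values with -v avoids splicing shell variables into the awk program.
  awk -v h="$hits" -v r="$requests" 'BEGIN {
    if (r > 0) printf "%.2f\n", h / r * 100
    else       print "n/a"
  }'
}

hit_rate 50 200   # prints 25.00
hit_rate 0 0      # prints n/a
```

Passing the counts via `-v` also means an empty value falls through to the `n/a` branch instead of producing an awk syntax error.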
@@ -230,6 +231,7 @@ if buildAll || hasArg libcudf; then
fi
echo "$MSG"
python ${REPODIR}/cpp/scripts/sort_ninja_log.py ${LIB_BUILD_DIR}/.ninja_log --fmt html --msg "$MSG" > ${LIB_BUILD_DIR}/ninja_log.html
+ cp ${LIB_BUILD_DIR}/.ninja_log ${LIB_BUILD_DIR}/ninja.log
fi

if [[ ${INSTALL_TARGET} != "" ]]; then
27 changes: 14 additions & 13 deletions ci/benchmark/build.sh
@@ -1,5 +1,5 @@
#!/bin/bash
- # Copyright (c) 2020, NVIDIA CORPORATION.
+ # Copyright (c) 2020-2022, NVIDIA CORPORATION.
#########################################
# cuDF GPU build and test script for CI #
#########################################
@@ -36,8 +36,8 @@ export GBENCH_BENCHMARKS_DIR="$WORKSPACE/cpp/build/gbenchmarks/"
# like `/tmp` is.
export LIBCUDF_KERNEL_CACHE_PATH="$HOME/.jitify-cache"

- # Dask & Distributed git tag
- export DASK_DISTRIBUTED_GIT_TAG='2022.01.0'
+ # Dask & Distributed option to install main(nightly) or `conda-forge` packages.
+ export INSTALL_DASK_MAIN=0

function remove_libcudf_kernel_cache_dir {
EXITCODE=$?
@@ -77,11 +77,16 @@ conda install "rmm=$MINOR_VERSION.*" "cudatoolkit=$CUDA_REL" \
# conda remove -f rapids-build-env rapids-notebook-env
# conda install "your-pkg=1.0.0"

- # Install the master version of dask, distributed, and streamz
- logger "pip install git+https://github.com/dask/distributed.git@$DASK_DISTRIBUTED_GIT_TAG --upgrade --no-deps"
- pip install "git+https://github.com/dask/distributed.git@$DASK_DISTRIBUTED_GIT_TAG" --upgrade --no-deps
- logger "pip install git+https://github.com/dask/dask.git@$DASK_DISTRIBUTED_GIT_TAG --upgrade --no-deps"
- pip install "git+https://github.com/dask/dask.git@$DASK_DISTRIBUTED_GIT_TAG" --upgrade --no-deps
+ # Install the conda-forge or nightly version of dask and distributed
+ if [[ "${INSTALL_DASK_MAIN}" == 1 ]]; then
+ gpuci_logger "gpuci_mamba_retry update dask"
+ gpuci_mamba_retry update dask
+ else
+ gpuci_logger "gpuci_mamba_retry install conda-forge::dask==2022.03.0 conda-forge::distributed==2022.03.0 conda-forge::dask-core==2022.03.0 --force-reinstall"
+ gpuci_mamba_retry install conda-forge::dask==2022.03.0 conda-forge::distributed==2022.03.0 conda-forge::dask-core==2022.03.0 --force-reinstall
+ fi
+
+ # Install the master version of streamz
logger "pip install git+https://github.com/python-streamz/streamz.git@master --upgrade --no-deps"
pip install "git+https://github.com/python-streamz/streamz.git@master" --upgrade --no-deps

@@ -98,11 +103,7 @@ conda list --show-channel-urls
################################################################################

logger "Build libcudf..."
- if [[ "${BUILD_MODE}" == "pull-request" ]]; then
- "$WORKSPACE/build.sh" clean libcudf cudf dask_cudf benchmarks tests --ptds
- else
- "$WORKSPACE/build.sh" clean libcudf cudf dask_cudf benchmarks tests -l --ptds
- fi
+ "$WORKSPACE/build.sh" clean libcudf cudf dask_cudf benchmarks tests --ptds

################################################################################
# BENCHMARK - Run and parse libcudf and cuDF benchmarks