[RELEASE] cudf v22.02 #10101

Merged: 231 commits, Feb 2, 2022
ad02545
DOC v22.02 Updates
ajschmidt8 Nov 4, 2021
a497d73
Merge branch-21.12 into branch-22.02
robertmaynard Nov 11, 2021
1ee86e7
Merge pull request #9664 from robertmaynard/branch-22.02-merge-21.12
ajschmidt8 Nov 11, 2021
6a3ef7d
Merge pull request #9670 from rapidsai/branch-21.12
GPUtester Nov 12, 2021
9ec8b30
Merge pull request #9671 from rapidsai/branch-21.12
GPUtester Nov 12, 2021
8311512
Merge pull request #9673 from rapidsai/branch-21.12
GPUtester Nov 12, 2021
4a277ca
Merge pull request #9678 from rapidsai/branch-21.12
GPUtester Nov 12, 2021
d64e274
Fix links in C++ Developer Guide. (#9675)
bdice Nov 13, 2021
31f92d7
Merge pull request #9680 from rapidsai/branch-21.12
GPUtester Nov 13, 2021
753db88
Merge pull request #9692 from rapidsai/branch-21.12
GPUtester Nov 15, 2021
7fc65d8
Update cudf JNI to 22.02.0-SNAPSHOT (#9681)
pxLi Nov 16, 2021
7e4a985
Some improvements to `parse_decimal` function and bindings for `is_fi…
razajafri Nov 16, 2021
c667518
Merge pull request #9698 from rapidsai/branch-21.12
GPUtester Nov 16, 2021
c3bcc8d
Fix `null` handling when `boolean` dtype is passed (#9691)
galipremsagar Nov 16, 2021
b14d883
Merge pull request #9699 from rapidsai/branch-21.12
GPUtester Nov 16, 2021
4363a55
Merge pull request #9700 from rapidsai/branch-21.12
GPUtester Nov 16, 2021
55c9701
Merge pull request #9702 from rapidsai/branch-21.12
GPUtester Nov 16, 2021
60380de
Merge pull request #9708 from rapidsai/branch-21.12
GPUtester Nov 16, 2021
e08ae9c
Implement Series.datetime.floor (#9571)
skirui-source Nov 17, 2021
4d13d81
Fixed build by adding more checks for int8, int16 (#9707)
razajafri Nov 17, 2021
3b38aa7
Merge pull request #9714 from rapidsai/branch-21.12
GPUtester Nov 17, 2021
9114104
Add parameters to control row group size in Parquet writer (#9677)
vuule Nov 17, 2021
17e6f5b
Simplify merge internals and reduce overhead (#9516)
vyasr Nov 17, 2021
32bacfa
Interchange dataframe protocol (#9071)
iskode Nov 17, 2021
d4ff518
Simplify write_csv by removing unnecessary writer/impl classes (#9089)
cwharris Nov 18, 2021
406429a
ceil/floor for `DatetimeIndex` (#9554)
mayankanand007 Nov 18, 2021
91fd74e
Support `min` and `max` reduction for structs (#9697)
ttnghia Nov 18, 2021
fc82b1d
Spell check fixes (#9682)
karthikeyann Nov 18, 2021
c1bfb26
Fix regex non-multiline EOL/$ matching strings ending with a new-line…
davidwendt Nov 19, 2021
967a333
Merge branch-21.12 into branch-22.02
bdice Nov 19, 2021
0f1f1e7
Merge pull request #9730 from bdice/branch-22.02-merge-21.12
ajschmidt8 Nov 19, 2021
7b9ea2b
Merge pull request #9737 from rapidsai/branch-21.12
GPUtester Nov 19, 2021
05dd541
Use List of Columns as Input for `drop_nulls`, `gather` and `drop_dup…
isVoid Nov 19, 2021
68beed9
Merge pull request #9741 from rapidsai/branch-21.12
GPUtester Nov 19, 2021
09a8a47
Use stop instead of stop_. (#9735)
bdice Nov 19, 2021
f0367c0
Use cuFile direct device reads/writes by default in cuIO (#9722)
vuule Nov 19, 2021
65af9a3
Improve cmake format script (#9723)
vyasr Nov 20, 2021
43a13c6
Skip cufile tests in JNI build script (#9744)
pxLi Nov 22, 2021
7fa15db
Fix doxygen for enum types in libcudf (#9724)
davidwendt Nov 22, 2021
cac53c5
Enable string to decimal 128 cast (#9742)
razajafri Nov 22, 2021
ebeb202
Fix out-of-bounds memory write in decimal128-to-string conversion (#9…
davidwendt Nov 22, 2021
85df759
Merge pull request #9751 from rapidsai/branch-21.12
GPUtester Nov 22, 2021
d1811b5
update cuda version in local build (#9736)
karthikeyann Nov 23, 2021
0f7c532
Merge pull request #9785 from rapidsai/branch-21.12
GPUtester Nov 29, 2021
83adc4b
Merge pull request #9786 from rapidsai/branch-21.12
GPUtester Nov 29, 2021
4629037
Merge pull request #9798 from rapidsai/branch-21.12
GPUtester Nov 30, 2021
0fa0cc4
Support `min` and `max` in inclusive scan for structs (#9725)
ttnghia Nov 30, 2021
27b7190
Merge pull request #9799 from rapidsai/branch-21.12
GPUtester Nov 30, 2021
dca8a0a
Fix dtype-argument bug in dask_cudf read_csv (#9796)
rjzamora Nov 30, 2021
1db05c9
Use Java classloader to find test resources (#9760)
jlowe Nov 30, 2021
1697f63
Run compute-sanitizer in nightly build (#9641)
karthikeyann Nov 30, 2021
69d5765
Update check for inf/nan strings in libcudf float conversion to ignor…
davidwendt Nov 30, 2021
00a8845
Refactor TableTest assertion methods to a separate utility class (#9762)
jlowe Nov 30, 2021
554ac81
Load native dependencies when Java ColumnView is loaded (#9800)
jlowe Nov 30, 2021
20d6723
Copy Java native dependencies directly into classpath (#9787)
jlowe Nov 30, 2021
991136c
Add Pearson correlation for sort groupby (python) (#9166)
skirui-source Nov 30, 2021
1eabcb7
Fix some doxygen warnings and add missing documentation (#9770)
karthikeyann Dec 1, 2021
1ceb8ab
Improve build time of libcudf iterator tests (#9788)
davidwendt Dec 1, 2021
11c3dfe
Remove unused masked udf cython/c++ code (#9792)
brandon-b-miller Dec 1, 2021
1904d1a
Fix overflow for min calculation in strings::from_timestamps (#9793)
revans2 Dec 1, 2021
836f800
Use CTAD with Thrust function objects (#9768)
codereport Dec 1, 2021
677e632
Avoid overflow for `fixed_point` `cudf::cast` and performance optimiz…
codereport Dec 1, 2021
7d8a8e5
Allow cast decimal128 to string and add tests (#9756)
razajafri Dec 1, 2021
5491cc7
Fix memory error due to lambda return type deduction limitation (#9778)
karthikeyann Dec 1, 2021
c10966c
Fix make_empty_scalar_like on list_type (#9759)
sperlingxx Dec 2, 2021
582cc6e
Add sample JNI API (#9728)
res-life Dec 2, 2021
1077dae
Fix caching in `Series.applymap` (#9821)
brandon-b-miller Dec 2, 2021
50acf07
Fix stream usage in `segmented_gather()` (#9679)
mythrocks Dec 2, 2021
b848dd5
Fix ORC writer crash with empty input columns (#9808)
vuule Dec 2, 2021
0c08543
Update cmake and conda to 22.02 (#9746)
devavret Dec 2, 2021
ce64e53
Add directory-partitioned data support to cudf.read_parquet (#9720)
rjzamora Dec 3, 2021
e82cc62
Fix join of MultiIndex to Index with one column and overlapping name.…
vyasr Dec 3, 2021
50e22ab
Merge pull request #9836 from rapidsai/branch-21.12
GPUtester Dec 3, 2021
62103c6
Added a few more tests for Decimal to String cast (#9818)
razajafri Dec 3, 2021
7ac8aac
Merge pull request #9840 from rapidsai/branch-21.12
GPUtester Dec 3, 2021
fdd9bb0
Add JNI for `cudf::drop_duplicates` (#9841)
ttnghia Dec 3, 2021
8002cbd
Allow runtime has_nulls parameter for row operators (#9623)
davidwendt Dec 6, 2021
3b93f5c
Use stream allocator adaptor for hash join table (#9704)
PointKernel Dec 6, 2021
8ceed73
Use vector factories for host-device copies. (#9806)
bdice Dec 6, 2021
ccab7ae
Fix build instructions for libcudf doxygen (#9837)
davidwendt Dec 6, 2021
8c82d6a
adding `series.transpose` (#9835)
mayankanand007 Dec 7, 2021
0ce9571
Remove deprecated methods from Java Table class (#9853)
jlowe Dec 7, 2021
a5633c2
Adding support for `Series.autocorr` (#9833)
mayankanand007 Dec 7, 2021
ba3aedb
Raise temporary error for `decimal128` types in parquet reader (#9804)
galipremsagar Dec 7, 2021
a72f19e
Enforce boolean `ascending` for dask-cudf `sort_values` (#9814)
charlesbluca Dec 7, 2021
ea3aff2
Add decimal128 support to Parquet reader and writer (#9765)
vuule Dec 8, 2021
2e95fb1
Pick smallest decimal type with required precision in ORC reader (#9775)
vuule Dec 8, 2021
ffc6241
Update to UCX-Py 0.24 (#9748)
pentschev Dec 8, 2021
4579d23
Add_suffix and add_prefix for DataFrames and Series (#9846)
mayankanand007 Dec 8, 2021
e6b0661
add templated benchmark with fixture (#9838)
karthikeyann Dec 8, 2021
024003c
Fix missing streams (#9767)
karthikeyann Dec 9, 2021
8e5b23c
Load libcufile.so with RTLD_NODELETE flag (#9872)
vuule Dec 9, 2021
05c209f
Merge pull request #9881 from rapidsai/branch-21.12
GPUtester Dec 9, 2021
c26779c
Fix an out-of-bounds read in validity copying in contiguous_split. (#…
nvdbaranec Dec 9, 2021
d7ce106
Support statically linking CUDA runtime for Java bindings (#9873)
jlowe Dec 9, 2021
9435945
Add one-level list encoding support in parquet reader (#9848)
PointKernel Dec 10, 2021
b545df4
add run_benchmarks target for running benchmarks with json output (#9…
karthikeyann Dec 10, 2021
45001f6
Support round operation on datetime64 datatypes (#9820)
mayankanand007 Dec 10, 2021
c012de5
Revert regex $/EOL end-of-string new-line special case handling (#9774)
davidwendt Dec 10, 2021
b359a0f
Refactor bit counting APIs, introduce valid/null count functions, and…
bdice Dec 10, 2021
e581734
Add zlib to cudfjni link when using static libcudf library dependency…
jlowe Dec 11, 2021
d23bcb4
Remove `IncludeCategories` from `.clang-format` (#9876)
codereport Dec 12, 2021
335862b
Fix fallback to sort aggregation for grouping only hash aggregate (#9…
abellina Dec 13, 2021
b3b299a
Break tie for `top` categorical columns in `Series.describe` (#9867)
isVoid Dec 14, 2021
2627153
Change default `dtype` of all nulls column from `float` to `object` (…
galipremsagar Dec 14, 2021
7a23f1a
Add utility to format ninja-log build times (#9631)
davidwendt Dec 14, 2021
61794aa
Fix a memcheck error in ORC writer (#9896)
vuule Dec 14, 2021
41f9956
Add partitioning support in parquet writer (#9810)
devavret Dec 14, 2021
fc2a32a
Introduce `nan_as_null` parameter for `cudf.Index` (#9893)
galipremsagar Dec 14, 2021
44fce8b
Fix cudf.Scalar string datetime construction (#9875)
brandon-b-miller Dec 15, 2021
3428f7f
Fix compilation of benchmark for parquet writer. (#9905)
bdice Dec 15, 2021
78d12bb
Update ucx-py version on release using rvc (#9897)
jjacobelli Dec 15, 2021
38631a6
Fix the java build after parquet partitioning support (#9908)
revans2 Dec 15, 2021
db9aef8
Add regex_flags parameter to strings replace_re functions (#9878)
davidwendt Dec 15, 2021
0c3f735
Add dictionary support to cudf::copy_if_else (#9887)
davidwendt Dec 15, 2021
967f339
Remove conda envs for CUDA 11.0 and 11.2. (#9910)
bdice Dec 15, 2021
0faf2af
Implement JNI for `cudf::scatter` APIs (#9903)
ttnghia Dec 15, 2021
56430b4
Use pandas `to_offset` to parse frequency string in `date_range` (#9843)
isVoid Dec 16, 2021
52d7acc
Replace `thrust/std::get` with structure bindings (#9915)
codereport Dec 16, 2021
b08b37d
Add missing imports tests (#9920)
jjacobelli Dec 16, 2021
b8f812a
Fix null handling for structs `min` and `arg_min` in groupby, groupby…
ttnghia Dec 16, 2021
3c0d508
Merge pull request #9923 from rapidsai/branch-21.12
GPUtester Dec 16, 2021
19190b4
JNI: Function to copy and set validity from bool column. (#9901)
mythrocks Dec 16, 2021
428a1b3
Add test for map column metadata handling in ORC writer (#9852)
vuule Dec 17, 2021
e6c6991
Use dynamic nullate for join hasher and equality comparator (#9902)
davidwendt Dec 17, 2021
8c5a85a
Fix see also links for IO APIs (#9895)
galipremsagar Dec 17, 2021
23cafcf
TimedeltaIndex constructor raises an AttributeError. (#9884)
skirui-source Dec 17, 2021
84073e8
update changelog
ajschmidt8 Dec 17, 2021
ce02856
Add decimal types to cuIO benchmarks (#9776)
vuule Dec 17, 2021
a4dc42d
Implement `lists::index_of()` to find positions in list rows (#9510)
mythrocks Dec 20, 2021
acd36ee
Merge branch 'branch-21.12' into branch-22.02-merge-21.12
bdice Dec 22, 2021
68384ea
Merge branch-21.12 into branch-22.02
bdice Dec 22, 2021
1e74c2a
Merge pull request #9947 from bdice/branch-22.02-merge-21.12
raydouglass Dec 22, 2021
04f4219
Use gpuci_mamba_retry to install local artifacts. (#9951)
bdice Dec 23, 2021
c99a37f
Remove deprecated method Series.hash_encode. (#9942)
bdice Dec 23, 2021
e432d01
Add `first` and `last` method to `IndexedFrame` (#9710)
isVoid Dec 24, 2021
bf7f7be
Fix cudf compilation instructions. (#9956)
esoha-nvidia Dec 24, 2021
67c925c
Fix cudf java build error. (#9958)
firestarman Dec 29, 2021
7233765
Remove various unused functions (#9922)
vyasr Jan 3, 2022
897a9ea
Refactoring ceil/round/floor code for datetime64 types (#9926)
mayankanand007 Jan 4, 2022
d69ea61
Remove deprecated method DataFrame.hash_columns. (#9943)
bdice Jan 4, 2022
cc4a2bd
Upgrade thrust version to 1.15 (#9912)
robertmaynard Jan 4, 2022
36fa5f3
Implement per-list sequence (#9839)
ttnghia Jan 4, 2022
b1ae789
Enable transpose for string columns in cudf python (#9937)
galipremsagar Jan 4, 2022
f7cc6a0
Rename aggregate_metadata in writer to fix name collision (#9938)
devavret Jan 5, 2022
6a6fbb3
Add jni for sequences (#9972)
wbo4958 Jan 5, 2022
3e893a6
Remove str.subword_tokenize (#9968)
VibhuJawa Jan 5, 2022
2112757
Replace cudf's concurrent_ordered_map with cuco::static_map in semi/a…
vyasr Jan 5, 2022
eba4f03
Add cudf::strings::extract_all API (#9909)
davidwendt Jan 5, 2022
33f7f0d
Fix regression HostColumnVectorCore requiring native libs (#9948)
jlowe Jan 5, 2022
b1de945
Remove deprecated method `one_hot_encoding` (#9977)
isVoid Jan 6, 2022
a61fc55
Minor cleanup of unused Python functions (#9974)
vyasr Jan 6, 2022
23603d1
custreamz oauth callback for kafka (librdkafka) (#9486)
jdye64 Jan 6, 2022
61199ea
Fix groupby shift/diff/fill after selecting from a `GroupBy` (#9984)
shwina Jan 6, 2022
7392f9f
use ninja in java ci build (#9933)
rongou Jan 6, 2022
120aa62
Fixed issue with percentile_approx where output tdigests could have u…
nvdbaranec Jan 6, 2022
de8c0b8
Resolve racecheck errors in ORC kernels (#9916)
vuule Jan 7, 2022
42a0e55
Clean up CUDA stream use in cuIO (#9991)
vuule Jan 7, 2022
7656277
Use default value for decimal precision in parquet writer when not sp…
devavret Jan 7, 2022
84c8cde
Use addressed-ordered first fit for the pinned memory pool (#9989)
rongou Jan 7, 2022
3192ace
Java bindings for JSON reader support (#9940)
wbo4958 Jan 7, 2022
0722e20
Consolidate and improve `reset_index` (#9750)
isVoid Jan 8, 2022
3b4f903
Fix the overflow problem of decimal rescale (#9966)
sperlingxx Jan 10, 2022
8ba8774
Avoid overflow for fixed_point round (#9809)
sperlingxx Jan 10, 2022
bb3844e
Use new efficient partitioned parquet writing in cuDF (#9971)
devavret Jan 10, 2022
b7b87fb
Add build-time publish step to cpu build script (#9927)
davidwendt Jan 10, 2022
dd390a2
Rewriting row/column conversions for Spark <-> cudf data conversions …
hyperbolic2346 Jan 10, 2022
0d5ec7f
Use gpuci_mamba_retry on Java CI. (#9983)
bdice Jan 10, 2022
e8c4e60
Add missing list filling header in meta.yaml (#10007)
devavret Jan 10, 2022
cee55fd
Optimize `groupby::scan` (#9754)
PointKernel Jan 10, 2022
496aa47
Refactor host device macros (#9797)
vyasr Jan 10, 2022
d3282cb
Fix null check when comparing rows of structs in `min` and `max` redu…
ttnghia Jan 11, 2022
951f630
Fix `conda` recipes for `custreamz` & `cudf_kafka` (#10003)
ajschmidt8 Jan 11, 2022
07fa888
Use `cuda::std::is_arithmetic` in `cudf::is_numeric` trait. (#9996)
bdice Jan 11, 2022
7ec4271
Remove `CUDA_DEVICE_CALLABLE` macro usage (#10015)
hyperbolic2346 Jan 11, 2022
cc25f3d
Match pandas scalar result types in reductions (#9717)
brandon-b-miller Jan 11, 2022
88e6a29
Wrap CI script shell variables in quotes to fix local testing. (#10018)
bdice Jan 11, 2022
25a7485
Fix regex doc describing hexadecimal escape characters (#10009)
davidwendt Jan 11, 2022
3216342
Raise in `query` if dtype is not supported (#9921)
brandon-b-miller Jan 11, 2022
813ac97
Use list of column inputs for `apply_boolean_mask` (#9832)
isVoid Jan 11, 2022
a43682e
cudftestutil no longer propagates compiler flags to external users (#…
robertmaynard Jan 12, 2022
093b0ad
Add strings tests to transpose_test.cpp (#9985)
davidwendt Jan 12, 2022
76f89db
Update JNI to use new arena mr constructor (#10027)
rongou Jan 12, 2022
b8c4816
Unpin `dask` and `distributed` in CI (#10028)
galipremsagar Jan 12, 2022
3176258
Return null count from inplace_bitmask_and. (#9904)
bdice Jan 12, 2022
4950a7a
Remove deprecated `method` parameter from `merge` and `join`. (#9944)
bdice Jan 13, 2022
fe71bab
Fix memory leaks in JNI native code. (#10029)
mythrocks Jan 13, 2022
d0c85e1
build.sh respects the `--build_metrics` and `--incl_cache_stats` flag…
robertmaynard Jan 13, 2022
dbe65f1
Fix null check when comparing structs in `arg_min` operation of reduc…
ttnghia Jan 13, 2022
c07fdab
Load balance optimization for contiguous_split (#9755)
nvdbaranec Jan 13, 2022
1eceaed
Add partitioning support to Parquet chunked writer (#10000)
devavret Jan 14, 2022
ca77542
Allow custom sort functions for dask-cudf `sort_values` (#9789)
charlesbluca Jan 14, 2022
ce31d7d
Fix octal pattern matching in regex string (#9993)
davidwendt Jan 14, 2022
b01c846
Allow CuPy 10 (#10048)
jakirkham Jan 14, 2022
12adb8a
Fix repr and concat of `StructColumn` (#10042)
galipremsagar Jan 14, 2022
8c8d6ef
Fix dataframe setitem with `ndarray` types (#10056)
galipremsagar Jan 14, 2022
e24fa8f
Run doctests. (#9815)
bdice Jan 15, 2022
7ff5f12
Support structs for `cudf::contains` with column/scalar input (#9929)
ttnghia Jan 15, 2022
e4a16ae
Implement mixed equality/conditional joins (#9917)
vyasr Jan 18, 2022
5ea3df6
Remove python constraints in cutreamz and cudf_kafka recipes (#10052)
jjacobelli Jan 18, 2022
45c20d1
`decimal128` Support for `to/from_arrow` (#9986)
codereport Jan 18, 2022
04b79ac
Remove implicit copy due to conversion from cudf::size_type and size_…
robertmaynard Jan 18, 2022
8d7330f
Add support for `decimal128` in cudf python (#9533)
galipremsagar Jan 18, 2022
4e4c3dd
Update `decimal` dtypes related docs entries (#10072)
galipremsagar Jan 18, 2022
512e161
Add check for negative stripe index in ORC reader (#10074)
vuule Jan 19, 2022
b90a6fd
fix gcc 11 compilation errors (#10067)
rongou Jan 19, 2022
e416188
Avoid index materialization when `DataFrame` is created with un-named…
galipremsagar Jan 19, 2022
3aecce2
Update Java tests to expect DECIMAL128 from Arrow (#10073)
jlowe Jan 19, 2022
8e88adc
Fix `columns` ordering issue in parquet reader (#10066)
galipremsagar Jan 19, 2022
f193d59
Include row group level stats when writing ORC files (#10041)
vuule Jan 19, 2022
e49084e
Java bindings for mixed left, inner, and full joins (#9941)
jlowe Jan 19, 2022
8fd7dd2
Move `drop_duplicates`, `drop_na`, `_gather`, `take` to IndexFrame an…
isVoid Jan 19, 2022
f041034
Replace custom CUDA bindings previously provided by RMM with official…
shwina Jan 20, 2022
6bbe2e8
Include <optional> in headers that use std::optional (#10044)
robertmaynard Jan 20, 2022
ab752d4
Simplify custreamz and cudf_kafka recipes files (#10065)
jjacobelli Jan 20, 2022
c00f42b
Spark Decimal128 hashing (#9919)
rwlee Jan 20, 2022
d5f1aed
Add in support for NULL_LOGICAL_AND and NULL_LOGICAL_OR binops (#10016)
revans2 Jan 20, 2022
690993c
Add `struct` generation support in datagenerator & fuzz tests (#9180)
galipremsagar Jan 20, 2022
2bd7320
Add `_from_column_like_self` factory (#10022)
isVoid Jan 20, 2022
e78f47a
Add `groupby.transform` (only support for aggregations) (#10005)
shwina Jan 20, 2022
13429ff
Fix matching regex word-boundary (\b) in strings replace (#9997)
davidwendt Jan 20, 2022
276bcf4
Add `clang-tidy` to libcudf (#9860)
codereport Jan 20, 2022
12a0f59
Remove libcudacxx patch needed for nvcc 11.4 (#10057)
robertmaynard Jan 20, 2022
09035d6
Use fsspec.parquet for improved read_parquet performance from remote …
rjzamora Jan 20, 2022
1b93126
Prepare upload scripts for Python 3.7 removal (#10092)
jjacobelli Jan 20, 2022
53a31d1
ORC writer API changes for granular statistics (#10058)
mythrocks Jan 20, 2022
5a4c5f3
Fix for appending decimal128 under list and struct types (#10105)
revans2 Jan 21, 2022
893f540
pin dask release version (#10108)
galipremsagar Jan 24, 2022
270772f
Correctly construct `data_column` variable when `drop_nan == False` i…
isVoid Jan 26, 2022
cfcb3ac
Always upload cudf packages (#10147)
raydouglass Jan 27, 2022
a7d88cd
update changelog
raydouglass Feb 2, 2022
27 changes: 27 additions & 0 deletions .clang-tidy
@@ -0,0 +1,27 @@
+---
+Checks:
+      'modernize-*,
+       -modernize-use-equals-default,
+       -modernize-concat-nested-namespaces,
+       -modernize-use-trailing-return-type'
+
+       # -modernize-use-equals-default         # auto-fix is broken (doesn't insert =default correctly)
+       # -modernize-concat-nested-namespaces   # auto-fix is broken (can delete code)
+       # -modernize-use-trailing-return-type   # just a preference
+
+WarningsAsErrors: ''
+HeaderFilterRegex: ''
+AnalyzeTemporaryDtors: false
+FormatStyle: none
+CheckOptions:
+  - key: modernize-loop-convert.MaxCopySize
+    value: '16'
+  - key: modernize-loop-convert.MinConfidence
+    value: reasonable
+  - key: modernize-pass-by-value.IncludeStyle
+    value: llvm
+  - key: modernize-replace-auto-ptr.IncludeStyle
+    value: llvm
+  - key: modernize-use-nullptr.NullMacros
+    value: 'NULL'
+...
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -72,7 +72,7 @@ repos:
         args: ['-fallback-style=none']
       - id: cmake-format
         name: cmake-format
-        entry: bash cpp/scripts/run-cmake-format.sh cmake-format
+        entry: ./cpp/scripts/run-cmake-format.sh cmake-format
         language: python
         types: [cmake]
         # Note that pre-commit autoupdate does not update the versions
@@ -81,7 +81,7 @@ repos:
           - cmake-format==0.6.11
       - id: cmake-lint
         name: cmake-lint
-        entry: bash cpp/scripts/run-cmake-format.sh cmake-lint
+        entry: ./cpp/scripts/run-cmake-format.sh cmake-lint
         language: python
         types: [cmake]
         # Note that pre-commit autoupdate does not update the versions
233 changes: 230 additions & 3 deletions CHANGELOG.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -86,7 +86,7 @@ git submodule update --init --remote --recursive
 ```bash
 # create the conda environment (assuming in base `cudf` directory)
 # note: RAPIDS currently doesn't support `channel_priority: strict`; use `channel_priority: flexible` instead
-conda env create --name cudf_dev --file conda/environments/cudf_dev_cuda11.0.yml
+conda env create --name cudf_dev --file conda/environments/cudf_dev_cuda11.5.yml
 # activate the environment
 conda activate cudf_dev
 ```
49 changes: 47 additions & 2 deletions build.sh
@@ -1,6 +1,6 @@
 #!/bin/bash

-# Copyright (c) 2019-2021, NVIDIA CORPORATION.
+# Copyright (c) 2019-2022, NVIDIA CORPORATION.

 # cuDF build script

@@ -17,7 +17,7 @@ ARGS=$*
 # script, and that this script resides in the repo dir!
 REPODIR=$(cd $(dirname $0); pwd)

-VALIDARGS="clean libcudf cudf dask_cudf benchmarks tests libcudf_kafka cudf_kafka custreamz -v -g -n -l --allgpuarch --disable_nvtx --show_depr_warn --ptds -h"
+VALIDARGS="clean libcudf cudf dask_cudf benchmarks tests libcudf_kafka cudf_kafka custreamz -v -g -n -l --allgpuarch --disable_nvtx --show_depr_warn --ptds -h --build_metrics --incl_cache_stats"
 HELP="$0 [clean] [libcudf] [cudf] [dask_cudf] [benchmarks] [tests] [libcudf_kafka] [cudf_kafka] [custreamz] [-v] [-g] [-n] [-h] [-l] [--cmake-args=\\\"<args>\\\"]
    clean                         - remove all existing build artifacts and configuration (start
                                    over)
@@ -37,6 +37,8 @@ HELP="$0 [clean] [libcudf] [cudf] [dask_cudf] [benchmarks] [tests] [libcudf_kafk
    --disable_nvtx                - disable inserting NVTX profiling ranges
    --show_depr_warn              - show cmake deprecation warnings
    --ptds                        - enable per-thread default stream
+   --build_metrics               - generate build metrics report for libcudf
+   --incl_cache_stats            - include cache statistics in build metrics report
    --cmake-args=\\\"<args>\\\"   - pass arbitrary list of CMake configuration options (escape all quotes in argument)
    -h | --h[elp]                 - print this text

@@ -61,6 +63,8 @@ BUILD_NVTX=ON
 BUILD_TESTS=OFF
 BUILD_DISABLE_DEPRECATION_WARNING=ON
 BUILD_PER_THREAD_DEFAULT_STREAM=OFF
+BUILD_REPORT_METRICS=OFF
+BUILD_REPORT_INCL_CACHE_STATS=OFF

 # Set defaults for vars that may not have been defined externally
 # FIXME: if INSTALL_PREFIX is not set, check PREFIX, then check
@@ -144,6 +148,14 @@ fi
 if hasArg --ptds; then
     BUILD_PER_THREAD_DEFAULT_STREAM=ON
 fi
+if hasArg --build_metrics; then
+    BUILD_REPORT_METRICS=ON
+fi
+
+if hasArg --incl_cache_stats; then
+    BUILD_REPORT_INCL_CACHE_STATS=ON
+fi
+

 # If clean given, run it prior to any other steps
 if hasArg clean; then
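
The two new flags follow the script's existing pattern: the flag is whitelisted in `VALIDARGS`, a default of `OFF` is set, and a `hasArg` check flips the variable to `ON`. The `hasArg` helper itself is not shown in this diff, so the version below is a hypothetical stand-in, and `parse_flags` is an illustrative wrapper rather than anything in `build.sh`.

```bash
#!/bin/bash
# Sketch of the flag-to-variable pattern. hasArg is a hypothetical stand-in
# (the real helper is outside this diff); parse_flags is illustrative only.

# Succeed if the exact word $1 appears in the space-separated list $ARGS.
hasArg() {
    [[ " ${ARGS} " == *" $1 "* ]]
}

parse_flags() {
    ARGS="$*"
    # Defaults as in the diff: both reports stay off unless requested.
    BUILD_REPORT_METRICS=OFF
    BUILD_REPORT_INCL_CACHE_STATS=OFF
    if hasArg --build_metrics; then
        BUILD_REPORT_METRICS=ON
    fi
    if hasArg --incl_cache_stats; then
        BUILD_REPORT_INCL_CACHE_STATS=ON
    fi
}

parse_flags libcudf --build_metrics
echo "metrics=${BUILD_REPORT_METRICS} cache_stats=${BUILD_REPORT_INCL_CACHE_STATS}"
```

Matching on whole words (note the padding spaces) keeps `--build_metrics` from being triggered by a longer flag that merely contains it.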
@@ -172,6 +184,15 @@ if buildAll || hasArg libcudf; then
         echo "Building for *ALL* supported GPU architectures..."
     fi

+    # get the current count before the compile starts
+    FILES_IN_CCACHE=""
+    if [[ "$BUILD_REPORT_INCL_CACHE_STATS" == "ON" && -x "$(command -v ccache)" ]]; then
+        FILES_IN_CCACHE=$(ccache -s | grep "files in cache")
+        echo "$FILES_IN_CCACHE"
+        # zero the ccache statistics
+        ccache -z
+    fi
+
     cmake -S $REPODIR/cpp -B ${LIB_BUILD_DIR} \
           -DCMAKE_INSTALL_PREFIX=${INSTALL_PREFIX} \
           ${CUDF_CMAKE_CUDA_ARCHITECTURES} \
@@ -185,7 +206,31 @@ if buildAll || hasArg libcudf; then

     cd ${LIB_BUILD_DIR}

+    compile_start=$(date +%s)
     cmake --build . -j${PARALLEL_LEVEL} ${VERBOSE_FLAG}
+    compile_end=$(date +%s)
+    compile_total=$(( compile_end - compile_start ))
+
+    # Record build times
+    if [[ "$BUILD_REPORT_METRICS" == "ON" && -f "${LIB_BUILD_DIR}/.ninja_log" ]]; then
+        echo "Formatting build metrics"
+        python ${REPODIR}/cpp/scripts/sort_ninja_log.py ${LIB_BUILD_DIR}/.ninja_log --fmt xml > ${LIB_BUILD_DIR}/ninja_log.xml
+        MSG="<p>"
+        # get some ccache stats after the compile
+        if [[ "$BUILD_REPORT_INCL_CACHE_STATS" == "ON" && -x "$(command -v ccache)" ]]; then
+            MSG="${MSG}<br/>$FILES_IN_CCACHE"
+            HIT_RATE=$(ccache -s | grep "cache hit rate")
+            MSG="${MSG}<br/>${HIT_RATE}"
+        fi
+        MSG="${MSG}<br/>parallel setting: $PARALLEL_LEVEL"
+        MSG="${MSG}<br/>parallel build time: $compile_total seconds"
+        if [[ -f "${LIB_BUILD_DIR}/libcudf.so" ]]; then
+            LIBCUDF_FS=$(ls -lh ${LIB_BUILD_DIR}/libcudf.so | awk '{print $5}')
+            MSG="${MSG}<br/>libcudf.so size: $LIBCUDF_FS"
+        fi
+        echo "$MSG"
+        python ${REPODIR}/cpp/scripts/sort_ninja_log.py ${LIB_BUILD_DIR}/.ninja_log --fmt html --msg "$MSG" > ${LIB_BUILD_DIR}/ninja_log.html
+    fi

     if [[ ${INSTALL_TARGET} != "" ]]; then
         cmake --build . -j${PARALLEL_LEVEL} --target install ${VERBOSE_FLAG}
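
One detail worth checking in the `[[ ... ]]` tests of this hunk: bash only treats `==` as a comparison operator when it is a separate word. Written without surrounding spaces, `"$VAR"=="ON"` is a single non-empty string, and `[[ string ]]` is true for any non-empty string. A minimal demonstration, with `flag_is_on_spaced` and `flag_is_on_unspaced` as illustrative names:

```bash
#!/bin/bash
# With spaces: a real string comparison.
flag_is_on_spaced() {
    [[ "$1" == "ON" ]]
}

# Without spaces: the whole expression is one word such as 'OFF==ON',
# which is non-empty, so the test succeeds regardless of the value.
flag_is_on_unspaced() {
    [[ "$1"=="ON" ]]
}

for value in ON OFF; do
    flag_is_on_spaced "$value" && s=true || s=false
    flag_is_on_unspaced "$value" && u=true || u=false
    echo "value=$value spaced=$s unspaced=$u"
done
```

In the metrics block this would mean the ccache statistics are appended whenever `ccache` is installed, whether or not `--incl_cache_stats` was passed.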
4 changes: 2 additions & 2 deletions ci/benchmark/build.sh
@@ -37,7 +37,7 @@ export GBENCH_BENCHMARKS_DIR="$WORKSPACE/cpp/build/gbenchmarks/"
 export LIBCUDF_KERNEL_CACHE_PATH="$HOME/.jitify-cache"

 # Dask & Distributed git tag
-export DASK_DISTRIBUTED_GIT_TAG='2021.11.2'
+export DASK_DISTRIBUTED_GIT_TAG='2022.01.0'

 function remove_libcudf_kernel_cache_dir {
     EXITCODE=$?
@@ -98,7 +98,7 @@ conda list --show-channel-urls
 ################################################################################

 logger "Build libcudf..."
-if [[ ${BUILD_MODE} == "pull-request" ]]; then
+if [[ "${BUILD_MODE}" == "pull-request" ]]; then
     "$WORKSPACE/build.sh" clean libcudf cudf dask_cudf benchmarks tests --ptds
 else
     "$WORKSPACE/build.sh" clean libcudf cudf dask_cudf benchmarks tests -l --ptds
8 changes: 8 additions & 0 deletions ci/cpu/build.sh
@@ -78,6 +78,14 @@ if [ "$BUILD_LIBCUDF" == '1' ]; then
   mkdir -p ${CONDA_BLD_DIR}/libcudf/work
   cp -r ${CONDA_BLD_DIR}/work/* ${CONDA_BLD_DIR}/libcudf/work

+  # Copy libcudf build metrics results
+  LIBCUDF_BUILD_DIR=$CONDA_BLD_DIR/libcudf/work/cpp/build
+  echo "Checking for build metrics log $LIBCUDF_BUILD_DIR/ninja_log.html"
+  if [[ -f "$LIBCUDF_BUILD_DIR/ninja_log.html" ]]; then
+      gpuci_logger "Copying build metrics results"
+      mkdir -p "$WORKSPACE/build-metrics"
+      cp "$LIBCUDF_BUILD_DIR/ninja_log.html" "$WORKSPACE/build-metrics/BuildMetrics.html"
+  fi

   gpuci_logger "Build conda pkg for libcudf_kafka"
   gpuci_conda_retry build --no-build-id --croot ${CONDA_BLD_DIR} conda/recipes/libcudf_kafka $CONDA_BUILD_ARGS
28 changes: 4 additions & 24 deletions ci/cpu/prebuild.sh
@@ -3,31 +3,11 @@
 # Copyright (c) 2020, NVIDIA CORPORATION.
 set -e

-DEFAULT_CUDA_VER="11.5"
-
-#Always upload cudf Python package
+#Always upload cudf packages
 export UPLOAD_CUDF=1
-
-#Upload libcudf once per CUDA
-if [[ "$PYTHON" == "3.7" ]]; then
-    export UPLOAD_LIBCUDF=1
-else
-    export UPLOAD_LIBCUDF=0
-fi
-
-# upload cudf_kafka for all versions of Python
-if [[ "$CUDA" == "${DEFAULT_CUDA_VER}" ]]; then
-    export UPLOAD_CUDF_KAFKA=1
-else
-    export UPLOAD_CUDF_KAFKA=0
-fi
-
-#We only want to upload libcudf_kafka once per python/CUDA combo
-if [[ "$PYTHON" == "3.7" ]] && [[ "$CUDA" == "${DEFAULT_CUDA_VER}" ]]; then
-    export UPLOAD_LIBCUDF_KAFKA=1
-else
-    export UPLOAD_LIBCUDF_KAFKA=0
-fi
+export UPLOAD_LIBCUDF=1
+export UPLOAD_CUDF_KAFKA=1
+export UPLOAD_LIBCUDF_KAFKA=1

 if [[ -z "$PROJECT_FLASH" || "$PROJECT_FLASH" == "0" ]]; then
     #If project flash is not activate, always build both
2 changes: 1 addition & 1 deletion ci/cpu/upload.sh
@@ -12,7 +12,7 @@ export GPUCI_RETRY_SLEEP=30
 export LABEL_OPTION=${LABEL_OPTION:-"--label main"}

 # Skip uploads unless BUILD_MODE == "branch"
-if [ ${BUILD_MODE} != "branch" ]; then
+if [ "${BUILD_MODE}" != "branch" ]; then
   echo "Skipping upload"
   return 0
 fi
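
Quoting `${BUILD_MODE}` matters for the single-bracket test: if the variable is unset or empty, the unquoted form expands to `[ != "branch" ]`, which `[` rejects as an error instead of evaluating the comparison, so the skip branch is never taken. A small sketch, where `should_skip_upload` and `unquoted_status` are hypothetical helper names:

```bash
#!/bin/bash
# Return 0 (skip) unless the build mode is exactly "branch".
# Quoting keeps the test well-formed even when the argument is empty.
should_skip_upload() {
    local build_mode="$1"
    [ "${build_mode}" != "branch" ]
}

# The unquoted form, for comparison: with an empty value the word vanishes
# and `[` sees only `!= branch`, which is an error rather than a comparison.
unquoted_status() {
    local build_mode="$1"
    [ ${build_mode} != "branch" ] 2>/dev/null
    echo $?
}

should_skip_upload ""       && echo "empty mode: skip upload"
should_skip_upload "branch" || echo "branch mode: upload"
echo "unquoted test with empty mode exits with status $(unquoted_status "")"
```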
49 changes: 41 additions & 8 deletions ci/gpu/build.sh
@@ -1,5 +1,5 @@
 #!/bin/bash
-# Copyright (c) 2018-2020, NVIDIA CORPORATION.
+# Copyright (c) 2018-2021, NVIDIA CORPORATION.
 ##############################################
 # cuDF GPU build and test script for CI      #
 ##############################################
@@ -31,7 +31,10 @@ export GIT_DESCRIBE_TAG=`git describe --tags`
 export MINOR_VERSION=`echo $GIT_DESCRIBE_TAG | grep -o -E '([0-9]+\.[0-9]+)'`

 # Dask & Distributed git tag
-export DASK_DISTRIBUTED_GIT_TAG='2021.11.2'
+export DASK_DISTRIBUTED_GIT_TAG='2022.01.0'
+
+# ucx-py version
+export UCX_PY_VERSION='0.24.*'

 ################################################################################
 # TRAP - Setup trap for removing jitify cache
@@ -83,10 +86,10 @@ gpuci_mamba_retry install -y \
                   "rapids-notebook-env=$MINOR_VERSION.*" \
                   "dask-cuda=${MINOR_VERSION}" \
                   "rmm=$MINOR_VERSION.*" \
-                  "ucx-py=0.23.*"
+                  "ucx-py=${UCX_PY_VERSION}"

 # https://docs.rapids.ai/maintainers/depmgmt/
-# gpuci_mamba_retry remove --force rapids-build-env rapids-notebook-env
+# gpuci_conda_retry remove --force rapids-build-env rapids-notebook-env
 # gpuci_mamba_retry install -y "your-pkg=1.0.0"


@@ -121,7 +124,7 @@ if [[ -z "$PROJECT_FLASH" || "$PROJECT_FLASH" == "0" ]]; then
     ################################################################################

     gpuci_logger "Build from source"
-    if [[ ${BUILD_MODE} == "pull-request" ]]; then
+    if [[ "${BUILD_MODE}" == "pull-request" ]]; then
         "$WORKSPACE/build.sh" clean libcudf cudf dask_cudf libcudf_kafka cudf_kafka benchmarks tests --ptds
     else
         "$WORKSPACE/build.sh" clean libcudf cudf dask_cudf libcudf_kafka cudf_kafka benchmarks tests -l --ptds
@@ -166,16 +169,46 @@ else
     gpuci_logger "Check GPU usage"
     nvidia-smi

-    gpuci_logger "GoogleTests"
     set -x
     cd $LIB_BUILD_DIR

+    gpuci_logger "GoogleTests"
+
     for gt in gtests/* ; do
         test_name=$(basename ${gt})
         echo "Running GoogleTest $test_name"
         ${gt} --gtest_output=xml:"$WORKSPACE/test-results/"
     done

+    # Copy libcudf build time results
+    echo "Checking for build time log $LIB_BUILD_DIR/ninja_log.xml"
+    if [[ -f "$LIB_BUILD_DIR/ninja_log.xml" ]]; then
+        gpuci_logger "Copying build time results"
+        cp "$LIB_BUILD_DIR/ninja_log.xml" "$WORKSPACE/test-results/buildtimes-junit.xml"
+    fi
+
+    ################################################################################
+    # MEMCHECK - Run compute-sanitizer on GoogleTest (only in nightly builds)
+    ################################################################################
+    if [[ "$BUILD_MODE" == "branch" && "$BUILD_TYPE" == "gpu" ]]; then
+        if [[ "$COMPUTE_SANITIZER_ENABLE" == "true" ]]; then
+            gpuci_logger "Memcheck on GoogleTests with rmm_mode=cuda"
+            export GTEST_CUDF_RMM_MODE=cuda
+            COMPUTE_SANITIZER_CMD="compute-sanitizer --tool memcheck"
+            mkdir -p "$WORKSPACE/test-results/"
+            for gt in gtests/*; do
+                test_name=$(basename ${gt})
+                if [[ "$test_name" == "ERROR_TEST" ]]; then
+                    continue
+                fi
+                echo "Running GoogleTest $test_name"
+                ${COMPUTE_SANITIZER_CMD} ${gt} | tee "$WORKSPACE/test-results/${test_name}.cs.log"
+            done
+            unset GTEST_CUDF_RMM_MODE
+            # test-results/*.cs.log are processed in gpuci
+        fi
+    fi
+
     CUDF_CONDA_FILE=`find ${CONDA_ARTIFACT_PATH} -name "libcudf-*.tar.bz2"`
     CUDF_CONDA_FILE=`basename "$CUDF_CONDA_FILE" .tar.bz2` #get filename without extension
     CUDF_CONDA_FILE=${CUDF_CONDA_FILE//-/=} #convert to conda install
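
The memcheck stage in this hunk runs every binary under `gtests/` through `compute-sanitizer`, skips `ERROR_TEST`, and tees each run into a per-test log. The loop can be sketched without a GPU by swapping the sanitizer command for a placeholder. `run_wrapped_tests`, its arguments, and the use of `bash` as the wrapper are assumptions for illustration, not part of the CI script.

```bash
#!/bin/bash
# run_wrapped_tests WRAPPER TEST_DIR LOG_DIR
# Runs each file in TEST_DIR under WRAPPER, skipping ERROR_TEST,
# and writes the output to LOG_DIR/<test_name>.cs.log.
run_wrapped_tests() {
    local wrapper="$1" test_dir="$2" log_dir="$3"
    local gt test_name
    mkdir -p "$log_dir"
    for gt in "$test_dir"/*; do
        test_name=$(basename "$gt")
        if [[ "$test_name" == "ERROR_TEST" ]]; then
            continue
        fi
        echo "Running GoogleTest $test_name"
        # Word-splitting of $wrapper is intentional so it can carry options.
        ${wrapper} "$gt" | tee "$log_dir/${test_name}.cs.log"
    done
}

# Demo with stand-in "tests" that are plain shell scripts.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/gtests"
for name in COLUMN_TEST ERROR_TEST; do
    printf '#!/bin/bash\necho "%s ran"\n' "$name" > "$demo_dir/gtests/$name"
done
run_wrapped_tests "bash" "$demo_dir/gtests" "$demo_dir/test-results"
ls "$demo_dir/test-results"
```

Because the output is piped into `tee`, the pipeline's exit status is that of `tee` unless `set -o pipefail` is in effect, so sanitizer failures have to be detected from the logs, which matches the script's note that the `.cs.log` files are processed later.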
@@ -184,12 +217,12 @@ else
     KAFKA_CONDA_FILE=${KAFKA_CONDA_FILE//-/=} #convert to conda install

     gpuci_logger "Installing $CUDF_CONDA_FILE & $KAFKA_CONDA_FILE"
-    conda install -c ${CONDA_ARTIFACT_PATH} "$CUDF_CONDA_FILE" "$KAFKA_CONDA_FILE"
+    gpuci_mamba_retry install -c ${CONDA_ARTIFACT_PATH} "$CUDF_CONDA_FILE" "$KAFKA_CONDA_FILE"

     install_dask

     gpuci_logger "Build python libs from source"
-    if [[ ${BUILD_MODE} == "pull-request" ]]; then
+    if [[ "${BUILD_MODE}" == "pull-request" ]]; then
         "$WORKSPACE/build.sh" cudf dask_cudf cudf_kafka --ptds
     else
         "$WORKSPACE/build.sh" cudf dask_cudf cudf_kafka -l --ptds
15 changes: 11 additions & 4 deletions ci/gpu/java.sh
@@ -30,6 +30,9 @@ export CONDA_ARTIFACT_PATH="$WORKSPACE/ci/artifacts/cudf/cpu/.conda-bld/"
 export GIT_DESCRIBE_TAG=`git describe --tags`
 export MINOR_VERSION=`echo $GIT_DESCRIBE_TAG | grep -o -E '([0-9]+\.[0-9]+)'`

+# ucx-py version
+export UCX_PY_VERSION='0.24.*'
+
 ################################################################################
 # TRAP - Setup trap for removing jitify cache
 ################################################################################
@@ -74,19 +77,23 @@ conda config --show-sources
 conda list --show-channel-urls

 gpuci_logger "Install dependencies"
-gpuci_conda_retry install -y \
+gpuci_mamba_retry install -y \
                   "cudatoolkit=$CUDA_REL" \
                   "rapids-build-env=$MINOR_VERSION.*" \
                   "rapids-notebook-env=$MINOR_VERSION.*" \
                   "dask-cuda=${MINOR_VERSION}" \
                   "rmm=$MINOR_VERSION.*" \
-                  "ucx-py=0.23.*" \
+                  "ucx-py=${UCX_PY_VERSION}" \
                   "openjdk=8.*" \
                   "maven"
+# "mamba install openjdk" adds an activation script to set JAVA_HOME but this is
+# not triggered on installation. Re-activating the conda environment will set
+# this environment variable so that CMake can find JNI.
+conda activate rapids

 # https://docs.rapids.ai/maintainers/depmgmt/
 # gpuci_conda_retry remove --force rapids-build-env rapids-notebook-env
-# gpuci_conda_retry install -y "your-pkg=1.0.0"
+# gpuci_mamba_retry install -y "your-pkg=1.0.0"


 gpuci_logger "Check compiler versions"
@@ -127,7 +134,7 @@ KAFKA_CONDA_FILE=`basename "$KAFKA_CONDA_FILE" .tar.bz2` #get filename without e
 KAFKA_CONDA_FILE=${KAFKA_CONDA_FILE//-/=} #convert to conda install

 gpuci_logger "Installing $CUDF_CONDA_FILE & $KAFKA_CONDA_FILE"
-conda install -c ${CONDA_ARTIFACT_PATH} "$CUDF_CONDA_FILE" "$KAFKA_CONDA_FILE"
+gpuci_mamba_retry install -c ${CONDA_ARTIFACT_PATH} "$CUDF_CONDA_FILE" "$KAFKA_CONDA_FILE"

 install_dask

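
The `KAFKA_CONDA_FILE` assignments in this hunk turn a built package filename into a `name=version=build` spec that the install command accepts: `basename` strips the directory and the `.tar.bz2` suffix, and `${var//-/=}` replaces every `-` with `=`. A sketch with a made-up filename (`to_conda_spec` and the example path are hypothetical):

```bash
#!/bin/bash
# Convert a conda package path into an install spec.
to_conda_spec() {
    local file
    file=$(basename "$1" .tar.bz2)   # drop directory and extension
    echo "${file//-/=}"              # name-version-build -> name=version=build
}

to_conda_spec "/tmp/conda-bld/linux-64/libcudf-22.02.00-cuda11.5_g1234abc_0.tar.bz2"
# prints: libcudf=22.02.00=cuda11.5_g1234abc_0
```

The substitution is global, so it only yields a valid spec when the package name itself contains no hyphen, which holds for `libcudf` and `libcudf_kafka`.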