[RELEASE] cudf v21.06 #8418

Merged
merged 327 commits into from
Jun 9, 2021

raydouglass

❄️ Code freeze for branch-21.06 and v21.06 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-21.06 until release (merging of this PR).

What is the purpose of this PR?

  • Update documentation
  • Allow testing for the new release
  • Enable a means to merge branch-21.06 into main for the release

dillon-cullinan and others added 30 commits April 16, 2021 14:37
Replaces instances of `rmm::device_vector` with `rmm::device_uvector` in gather detail functions and in gather tests.

Also adds a utility factory to create a device_uvector containing all zeros, `cudf::detail::make_zero_device_uvector_async()` (also sync() version).

Contributes to #7287

This speeds up small gathers, especially gathers that result in a lot of random accesses in multiple columns (`coalesce_o` benchmarks below).

```
(rapids) rapids@compose:~/cudf/cpp/build/release$ _deps/benchmark-src/tools/compare.py benchmarks ~/cudf/cpp/build/gather_vector.json ./gather_uvector.json 
Comparing /home/mharris/rapids/cudf/cpp/build/gather_vector.json to ./gather_uvector.json
Benchmark                                                         Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------
Gather/double_coalesce_x/1024/1/manual_time                    -0.2107         -0.1394         39220         30955         59030         50802
Gather/double_coalesce_x/2048/1/manual_time                    -0.2020         -0.1347         39390         31434         59189         51214
Gather/double_coalesce_x/4096/1/manual_time                    -0.3780         -0.2978         50106         31168         72432         50863
Gather/double_coalesce_x/8192/1/manual_time                    -0.2221         -0.1525         40875         31795         60573         51336
Gather/double_coalesce_x/16384/1/manual_time                   -0.2173         -0.1498         41056         32134         60631         51547
Gather/double_coalesce_x/32768/1/manual_time                   -0.5170         -0.4413         67973         32830         93277         52116
Gather/double_coalesce_x/65536/1/manual_time                   -0.2144         -0.1567         43154         33901         62330         52562
Gather/double_coalesce_x/131072/1/manual_time                  -0.2190         -0.1610         45139         35252         63802         53532
Gather/double_coalesce_x/262144/1/manual_time                  -0.1796         -0.1336         49142         40317         66700         57792
Gather/double_coalesce_x/524288/1/manual_time                  -0.1448         -0.1153         66965         57268         83863         74192
Gather/double_coalesce_x/1048576/1/manual_time                 -0.1144         -0.0940         84745         75048        104572         94738
Gather/double_coalesce_x/2097152/1/manual_time                 -0.0756         -0.0637        129918        120090        149744        140208
Gather/double_coalesce_x/4194304/1/manual_time                 -0.0482         -0.0442        211133        200956        231364        221128
Gather/double_coalesce_x/8388608/1/manual_time                 -0.0274         -0.0257        381666        371191        401358        391041
Gather/double_coalesce_x/16777216/1/manual_time                -0.0172         -0.0169        715312        703014        735392        722967
Gather/double_coalesce_x/33554432/1/manual_time                -0.0103         -0.0103       1385742       1371471       1405822       1391363
Gather/double_coalesce_x/67108864/1/manual_time                -0.0104         -0.0105       2742975       2714326       2763156       2734156
Gather/double_coalesce_x/1024/2/manual_time                    -0.1733         -0.1273         49524         40940         69564         60707
Gather/double_coalesce_x/2048/2/manual_time                    -0.1771         -0.1301         49582         40801         69514         60468
Gather/double_coalesce_x/4096/2/manual_time                    -0.2218         -0.1731         52703         41012         73221         60547
Gather/double_coalesce_x/8192/2/manual_time                    -0.1816         -0.1355         51293         41978         70950         61333
Gather/double_coalesce_x/16384/2/manual_time                   -0.2193         -0.1753         54360         42438         74558         61491
Gather/double_coalesce_x/32768/2/manual_time                   -0.2082         -0.1667         54656         43278         74339         61950
Gather/double_coalesce_x/65536/2/manual_time                   -0.2184         -0.1824         57593         45015         76367         62434
Gather/double_coalesce_x/131072/2/manual_time                  -0.1687         -0.1318         57243         47587         75106         65204
Gather/double_coalesce_x/262144/2/manual_time                  -0.1428         -0.1188         65898         56488         82826         72984
Gather/double_coalesce_x/524288/2/manual_time                  -0.0977         -0.0840         94131         84931        114061        104483
Gather/double_coalesce_x/1048576/2/manual_time                 -0.0765         -0.0668        130704        120700        150830        140753
Gather/double_coalesce_x/2097152/2/manual_time                 -0.1651         -0.1693        241447        201587        266641        221508
Gather/double_coalesce_x/4194304/2/manual_time                 -0.0370         -0.0366        367010        353439        387585        373393
Gather/double_coalesce_x/8388608/2/manual_time                 -0.0463         -0.0488        698669        666290        721442        686251
Gather/double_coalesce_x/16777216/2/manual_time                -0.0190         -0.0200       1310748       1285850       1331922       1305220
Gather/double_coalesce_x/33554432/2/manual_time                -0.0103         -0.0116       2551725       2525356       2574809       2545046
Gather/double_coalesce_x/67108864/2/manual_time                -0.0077         -0.0076       5052399       5013498       5071455       5032820
Gather/double_coalesce_x/1024/4/manual_time                    -0.1258         -0.0971         65293         57082         85152         76886
Gather/double_coalesce_x/2048/4/manual_time                    -0.1314         -0.1025         66836         58056         86580         77704
Gather/double_coalesce_x/4096/4/manual_time                    -0.1375         -0.1084         68618         59185         88189         78628
Gather/double_coalesce_x/8192/4/manual_time                    -0.1244         -0.0985         69209         60602         88567         79841
Gather/double_coalesce_x/16384/4/manual_time                   -0.1274         -0.1023         69878         60973         88767         79685
Gather/double_coalesce_x/32768/4/manual_time                   -0.1300         -0.1074         71739         62413         89585         79963
Gather/double_coalesce_x/65536/4/manual_time                   -0.1190         -0.0964         74570         65697         91701         82866
Gather/double_coalesce_x/131072/4/manual_time                  -0.1246         -0.1042         81384         71245         98254         88015
Gather/double_coalesce_x/262144/4/manual_time                  -0.0938         -0.0797         99592         90251        119523        109995
Gather/double_coalesce_x/524288/4/manual_time                  -0.0692         -0.0606        150809        140368        170768        160422
Gather/double_coalesce_x/1048576/4/manual_time                 -0.0464         -0.0416        222369        212056        242091        232029
Gather/double_coalesce_x/2097152/4/manual_time                 -0.0309         -0.0294        375601        363979        395799        384156
Gather/double_coalesce_x/4194304/4/manual_time                 -0.0225         -0.0219        677860        662615        697954        682665
Gather/double_coalesce_x/8388608/4/manual_time                 -0.0120         -0.0116       1283815       1268472       1303842       1288669
Gather/double_coalesce_x/16777216/4/manual_time                -0.0066         -0.0065       2485080       2468648       2504552       2488272
Gather/double_coalesce_x/33554432/4/manual_time                -0.0065         -0.0066       4906254       4874451       4926452       4893939
Gather/double_coalesce_x/67108864/4/manual_time                -0.0051         -0.0052       9738184       9688070       9758293       9707412
Gather/double_coalesce_x/1024/8/manual_time                    -0.0923         -0.0780         99750         90543        119460        110148
Gather/double_coalesce_x/2048/8/manual_time                    -0.0943         -0.0812        104464         94616        124136        114057
Gather/double_coalesce_x/4096/8/manual_time                    -0.0826         -0.0704        104342         95729        123625        114924
Gather/double_coalesce_x/8192/8/manual_time                    -0.0863         -0.0745        106355         97181        125424        116081
Gather/double_coalesce_x/16384/8/manual_time                   -0.0909         -0.0833        109639         99675        127917        117260
Gather/double_coalesce_x/32768/8/manual_time                   -0.1033         -0.0904        111931        100369        129199        117524
Gather/double_coalesce_x/65536/8/manual_time                   -0.0790         -0.0718        117680        108383        134615        124948
Gather/double_coalesce_x/131072/8/manual_time                  -0.0698         -0.0648        128666        119686        149236        139564
Gather/double_coalesce_x/262144/8/manual_time                  -0.0627         -0.0584        176862        165765        196982        185487
Gather/double_coalesce_x/524288/8/manual_time                  -0.0423         -0.0409        260787        249757        281275        269766
Gather/double_coalesce_x/1048576/8/manual_time                 -0.0621         -0.0643        429172        402535        451741        422675
Gather/double_coalesce_x/2097152/8/manual_time                 -0.0184         -0.0183        710338        697286        730524        717175
Gather/double_coalesce_x/4194304/8/manual_time                 -0.0121         -0.0122       1314264       1298322       1334627       1318292
Gather/double_coalesce_x/8388608/8/manual_time                 -0.0077         -0.0080       2526130       2506596       2546852       2526362
Gather/double_coalesce_x/16777216/8/manual_time                -0.0068         -0.0071       4936576       4902969       4957924       4922492
Gather/double_coalesce_x/33554432/8/manual_time                -0.0060         -0.0061       9764109       9705650       9784089       9724860
Gather/double_coalesce_x/67108864/8/manual_time                +0.0024         +0.0040      19377452      19423850      19395926      19472639
Gather/double_coalesce_o/1024/1/manual_time                    -0.2392         -0.1605         40288         30653         60137         50487
Gather/double_coalesce_o/2048/1/manual_time                    -0.2229         -0.1488         40499         31474         60259         51293
Gather/double_coalesce_o/4096/1/manual_time                    -0.2244         -0.1510         40255         31220         59939         50889
Gather/double_coalesce_o/8192/1/manual_time                    -0.2386         -0.1638         41493         31591         61076         51074
Gather/double_coalesce_o/16384/1/manual_time                   -0.2302         -0.1595         41620         32038         61060         51322
Gather/double_coalesce_o/32768/1/manual_time                   -0.2312         -0.1580         42084         32356         61113         51459
Gather/double_coalesce_o/65536/1/manual_time                   -0.2309         -0.1669         43309         33310         62104         51738
Gather/double_coalesce_o/131072/1/manual_time                  -0.2152         -0.1568         45496         35707         63771         53769
Gather/double_coalesce_o/262144/1/manual_time                  -0.1761         -0.1350         50637         41719         67984         58803
Gather/double_coalesce_o/524288/1/manual_time                  -0.1268         -0.1018         77047         67276         93533         84013
Gather/double_coalesce_o/1048576/1/manual_time                 -0.0656         -0.0607        139799        130622        157435        147879
Gather/double_coalesce_o/2097152/1/manual_time                 -0.0339         -0.0315        310673        300142        327832        317499
Gather/double_coalesce_o/4194304/1/manual_time                 -0.0130         -0.0126        745909        736213        763176        753529
Gather/double_coalesce_o/8388608/1/manual_time                 -0.0107         -0.0124       1692938       1674891       1712953       1691766
Gather/double_coalesce_o/16777216/1/manual_time                -0.0065         -0.0065       3573639       3550361       3590792       3567319
Gather/double_coalesce_o/33554432/1/manual_time                -0.0181         -0.0182       7436123       7301471       7453930       7318261
Gather/double_coalesce_o/67108864/1/manual_time                -0.0044         -0.0043      14884920      14819239      14902115      14837304
Gather/double_coalesce_o/1024/2/manual_time                    -0.2005         -0.1460         49442         39531         69426         59287
Gather/double_coalesce_o/2048/2/manual_time                    -0.2277         -0.1716         51763         39974         72190         59805
Gather/double_coalesce_o/4096/2/manual_time                    -0.2166         -0.1606         51292         40183         71093         59676
Gather/double_coalesce_o/8192/2/manual_time                    -0.2113         -0.1591         51906         40937         71570         60185
Gather/double_coalesce_o/16384/2/manual_time                   -0.2197         -0.1690         52719         41139         72230         60020
Gather/double_coalesce_o/32768/2/manual_time                   -0.2017         -0.1526         52711         42078         71603         60679
Gather/double_coalesce_o/65536/2/manual_time                   -0.1927         -0.1507         54591         44073         72393         61487
Gather/double_coalesce_o/131072/2/manual_time                  -0.1870         -0.1466         59596         48452         77229         65910
Gather/double_coalesce_o/262144/2/manual_time                  -0.1340         -0.1114         69884         60522         86743         77084
Gather/double_coalesce_o/524288/2/manual_time                  -0.0926         -0.0804        108312         98280        126445        116280
Gather/double_coalesce_o/1048576/2/manual_time                 -0.1730         -0.1738        272091        225021        293411        242413
Gather/double_coalesce_o/2097152/2/manual_time                 -0.1032         -0.1047        620392        556378        640418        573389
Gather/double_coalesce_o/4194304/2/manual_time                 -0.0101         -0.0100       1433137       1418717       1450168       1435703
Gather/double_coalesce_o/8388608/2/manual_time                 -0.0109         -0.0099       3305723       3269542       3319432       3286452
Gather/double_coalesce_o/16777216/2/manual_time                -0.1273         -0.1264       7984650       6968108       7994776       6984542
Gather/double_coalesce_o/33554432/2/manual_time                -0.0020         -0.0021      14398179      14368669      14415640      14384911
Gather/double_coalesce_o/67108864/2/manual_time                -0.0010         -0.0008      29210287      29181238      29225105      29200345
Gather/double_coalesce_o/1024/4/manual_time                    -0.1157         -0.0896         65460         57884         85237         77598
Gather/double_coalesce_o/2048/4/manual_time                    -0.1334         -0.1051         67124         58169         86838         77711
Gather/double_coalesce_o/4096/4/manual_time                    -0.1322         -0.1035         68776         59687         88221         79089
Gather/double_coalesce_o/8192/4/manual_time                    -0.1390         -0.1142         70076         60334         89617         79381
Gather/double_coalesce_o/16384/4/manual_time                   -0.2317         -0.2069         79393         60994        100311         79553
Gather/double_coalesce_o/32768/4/manual_time                   -0.1678         -0.1471         75173         62562         93681         79902
Gather/double_coalesce_o/65536/4/manual_time                   -0.2437         -0.2228         86798         65643        106518         82784
Gather/double_coalesce_o/131072/4/manual_time                  -0.1594         -0.1438         89554         75282        107471         92020
Gather/double_coalesce_o/262144/4/manual_time                  -0.1328         -0.1207        110766         96061        131076        115259
Gather/double_coalesce_o/524288/4/manual_time                  -0.1644         -0.1578        193707        161855        213742        180009
Gather/double_coalesce_o/1048576/4/manual_time                 -0.1602         -0.1590        494013        414850        513599        431942
Gather/double_coalesce_o/2097152/4/manual_time                 -0.2039         -0.2083       1341067       1067671       1370419       1084992
Gather/double_coalesce_o/4194304/4/manual_time                 -0.1348         -0.1357       3219329       2785257       3241492       2801606
Gather/double_coalesce_o/8388608/4/manual_time                 -0.1156         -0.1168       7312386       6467063       7341849       6484533
Gather/double_coalesce_o/16777216/4/manual_time                -0.1382         -0.1402      16057935      13838182      16093164      13837500
Gather/double_coalesce_o/33554432/4/manual_time                -0.1464         -0.1484      33459659      28562634      33518865      28545498
Gather/double_coalesce_o/67108864/4/manual_time                -0.1525         -0.1536      68465952      58022936      68487360      57968251
Gather/double_coalesce_o/1024/8/manual_time                    -0.1390         -0.1210        105500         90841        125697        110486
Gather/double_coalesce_o/2048/8/manual_time                    -0.3868         -0.3634        153916         94385        178698        113762
Gather/double_coalesce_o/4096/8/manual_time                    -0.3547         -0.3322        147806         95380        171262        114369
Gather/double_coalesce_o/8192/8/manual_time                    -0.3625         -0.3416        151157         96369        174958        115193
Gather/double_coalesce_o/16384/8/manual_time                   -0.3517         -0.3346        150529         97588        172980        115105
Gather/double_coalesce_o/32768/8/manual_time                   -0.3548         -0.3371        155101        100069        176852        117230
Gather/double_coalesce_o/65536/8/manual_time                   -0.3685         -0.3515        168954        106693        190043        123249
Gather/double_coalesce_o/131072/8/manual_time                  -0.3300         -0.3157        187773        125801        212423        145360
Gather/double_coalesce_o/262144/8/manual_time                  -0.2923         -0.2822        247248        174988        270806        194371
Gather/double_coalesce_o/524288/8/manual_time                  -0.2347         -0.2317        374475        286568        396719        304800
Gather/double_coalesce_o/1048576/8/manual_time                 -0.1865         -0.1882        986328        802362       1010052        819931
Gather/double_coalesce_o/2097152/8/manual_time                 -0.0397         -0.0401       2187582       2100684       2206334       2117788
Gather/double_coalesce_o/4194304/8/manual_time                 -0.0209         -0.0213       5659923       5541733       5679230       5558246
Gather/double_coalesce_o/8388608/8/manual_time                 -0.0246         -0.0248      13221763      12896366      13239906      12912104
Gather/double_coalesce_o/16777216/8/manual_time                -0.0525         -0.0526      29191054      27657719      29210728      27672969
Gather/double_coalesce_o/33554432/8/manual_time                -0.0815         -0.0816      62161763      57097037      62180671      57109769
Gather/double_coalesce_o/67108864/8/manual_time                -0.0266         -0.0270     119143801     115977739     119214918     115998559
```

Authors:
  - Mark Harris (https://github.com/harrism)
  - Devavret Makkar (https://github.com/devavret)
  - Ashwin Srinath (https://github.com/shwina)

Approvers:
  - Jake Hemstad (https://github.com/jrhemstad)
  - Devavret Makkar (https://github.com/devavret)
  - Nghia Truong (https://github.com/ttnghia)
  - Vukasin Milovanovic (https://github.com/vuule)
  - MithunR (https://github.com/mythrocks)

URL: #7758
There are a few classes that need `obj.foo` to behave like `obj['foo']`. They were implementing this independently, but getting it just right can be tricky, so this centralizes that logic into a single mixin.
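
For illustration, a minimal Python sketch of what such a mixin could look like. This is not the code added in this PR: the names `AttrFromItemMixin` and `Record` are hypothetical, and the choice to never forward underscore-prefixed names is an assumption about one of the "tricky" parts.

```python
class AttrFromItemMixin:
    """Hypothetical mixin: make obj.foo fall back to obj['foo']."""

    def __getattr__(self, name):
        # __getattr__ runs only after normal attribute lookup has failed.
        # Private and dunder names are never forwarded: copy and pickle probe
        # for them on half-built objects, and forwarding could recurse.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            # Callers such as hasattr() expect AttributeError, not KeyError.
            raise AttributeError(
                f"{type(self).__name__!r} object has no attribute {name!r}"
            ) from None


class Record(AttrFromItemMixin):
    """Hypothetical container that only implements item access."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    def __getitem__(self, key):
        return self._fields[key]


rec = Record(foo=1, bar=2)
print(rec.foo, rec["foo"])      # both go through __getitem__
print(hasattr(rec, "missing"))  # False, because AttributeError is raised
```

Keeping the translation from `KeyError` to `AttributeError` in one place is what lets each class implement only `__getitem__`.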

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)
  - Michael Wang (https://github.com/isVoid)

URL: #7845
Closes #7879. Adds the ability to coerce an `int` or `Decimal` to a different `Decimal64Dtype` where possible, and begins to plumb `pa.scalar` into some useful places within `cudf.Scalar`.
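
As a rough illustration of what "where possible" could mean, here is a sketch using only the standard-library `decimal` module. It does not use cuDF, and the helper `coerce_to_decimal` is hypothetical. It assumes that a coercion is allowed only when the value fits the target precision and scale without rounding, which may differ from the rule cuDF applies.

```python
from decimal import Decimal


def coerce_to_decimal(value, precision, scale):
    """Hypothetical helper: fit an int or Decimal into (precision, scale)."""
    original = Decimal(value)
    quantum = Decimal(1).scaleb(-scale)  # 10 ** -scale, e.g. 0.001 for scale 3
    coerced = original.quantize(quantum)
    # Assumption: refuse any coercion that would drop fractional digits.
    if coerced != original:
        raise ValueError(f"{value!r} needs more than {scale} fractional digits")
    # Assumption: precision counts every digit of the unscaled integer.
    if len(coerced.as_tuple().digits) > precision:
        raise ValueError(f"{value!r} needs more than {precision} total digits")
    return coerced


print(coerce_to_decimal(Decimal("1.23"), precision=5, scale=3))
print(coerce_to_decimal(7, precision=4, scale=2))
```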

Authors:
  - https://github.com/brandon-b-miller

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Keith Kraus (https://github.com/kkraus14)
  - Paul Taylor (https://github.com/trxcllnt)

URL: #7899
While debugging a GDS issue, I found that all the other JNI functions make this call before doing anything else. Adding the call here does not actually fix the GDS problem, but it seems a prudent thing to do.

Authors:
  - Rong Ou (https://github.com/rongou)

Approvers:
  - Jason Lowe (https://github.com/jlowe)

URL: #7983
Removes `device_vector` in favour of either `device_uvector` or `device_buffer` as appropriate in parquet reader and writer.

Contributes to #7287 
Depends on #7758

Authors:
  - Devavret Makkar (https://github.com/devavret)
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Vukasin Milovanovic (https://github.com/vuule)
  - MithunR (https://github.com/mythrocks)
  - Mike Wilson (https://github.com/hyperbolic2346)

URL: #7853
Update CPM with a [fix for `FETCHCONTENT_BASE_DIR`](cpm-cmake/CPM.cmake#244).

Authors:
  - Paul Taylor (https://github.com/trxcllnt)

Approvers:
  - Keith Kraus (https://github.com/kkraus14)

URL: #7982
Changes the `_global_set` union operation happening in `_is_supported()` to

```python
_global_set = _global_set.union(set(arg[col]))
```

This is needed because `set.union()` returns a new set and does not modify the original set in place. Before this PR, passing something like `{"a": ["unsupported_agg"]}` into `_is_supported()` would always return `True`.
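
The effect can be reproduced in plain Python. The sketch below is not the dask-cudf code: `SUPPORTED` and both helper functions are hypothetical, and it assumes the final check is a subset test, which is what makes an empty set pass.

```python
SUPPORTED = {"sum", "count", "max"}  # hypothetical list of supported aggregations


def is_supported_buggy(arg):
    seen = set()
    for col in arg:
        # union() returns a new set; the result is discarded, so seen stays empty
        seen.union(set(arg[col]))
    return seen.issubset(SUPPORTED)  # an empty set is a subset of anything


def is_supported_fixed(arg):
    seen = set()
    for col in arg:
        # rebind the name to the set returned by union()
        seen = seen.union(set(arg[col]))
    return seen.issubset(SUPPORTED)


print(is_supported_buggy({"a": ["unsupported_agg"]}))  # True
print(is_supported_fixed({"a": ["unsupported_agg"]}))  # False
```

An in-place alternative would be `seen.update(...)` or `seen |= ...`.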

cc @rjzamora

Authors:
  - Charles Blackmon-Luca (https://github.com/charlesbluca)

Approvers:
  - Richard (Rick) Zamora (https://github.com/rjzamora)
  - Keith Kraus (https://github.com/kkraus14)

URL: #7959
Fixes #5682.

- Structure `nvstrdesc_s` was replaced with `thrust::pair<const char*, size_type>;`.
- `nvstrdesc_s` related logical functions such as `nvstr_is_lesser`, `nvstr_is_greater` etc. were removed.
- Include directives for headers included by source files residing in the same directory were made relative as per the developer guide.
- `make_column` function related to `column_buffer` was moved from a header file to an implementation file.

Authors:
  - Kumar Aatish (https://github.com/kaatish)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - https://github.com/nvdbaranec
  - Devavret Makkar (https://github.com/devavret)
  - Keith Kraus (https://github.com/kkraus14)

URL: #7841
This reverts commit 3327f7b.

We have to revert this because the dependent project is broken and my system is in a broken state.

Authors:
  - Raza Jafri (https://github.com/razajafri)

Approvers:
  - Rong Ou (https://github.com/rongou)
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)
  - Robert (Bobby) Evans (https://github.com/revans2)

URL: #7987
This PR removes the `_char_width` member from the libcudf `cudf::string_view` class. This member was being used to record the character width of the bytes in the string but only if all the characters have the same width. This occurs when the string only contains ASCII encoded data which is all single-byte UTF-8 characters. The same optimization can be inferred when the existing `_length` and `_bytes` are equal. 
This change reduces the memory footprint of this class from 24 bytes to 16 bytes and therefore matches the size of `thrust::pair<char*,size_type>`. Using this class in a vector would thereby reduce the memory requirements for that vector by 1/3.
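
The inference can be checked in Python, where `len()` of a `str` counts characters and `len()` of its UTF-8 encoding counts bytes. The helper name below is hypothetical and the snippet only illustrates the reasoning, not the libcudf implementation.

```python
def all_single_byte(text):
    """True when every character occupies exactly one byte in UTF-8."""
    length = len(text)                    # number of characters, like _length
    num_bytes = len(text.encode("utf-8"))  # number of bytes, like _bytes
    return length == num_bytes


print(all_single_byte("hello"))  # True: ASCII only
print(all_single_byte("héllo"))  # False: 'é' takes two bytes

# Size reduction quoted above: 24 bytes down to 16 bytes per element.
print((24 - 16) / 24)
```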

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)
  - Mike Wilson (https://github.com/hyperbolic2346)

URL: #7914
This PR updates the CUDA version used in the build scripts.

Authors:
  - AJ Schmidt (https://github.com/ajschmidt8)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)

URL: #7984
Resolves issue with ORC reader.

There were two issues:

- A check on the number of streams that need to be accessed was missing.
- The position used to calculate the buffer length was wrong, so a non-zero value was assigned to a stream whose length is zero.

Authors:
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)

Approvers:
  - Ayush Dattagupta (https://github.com/ayushdg)
  - Paul Taylor (https://github.com/trxcllnt)
  - Devavret Makkar (https://github.com/devavret)

URL: #7988
This PR relaxes pandas version pinning which was introduced in `0.19`.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Paul Taylor (https://github.com/trxcllnt)
  - Ray Douglass (https://github.com/raydouglass)
  - Keith Kraus (https://github.com/kkraus14)

URL: #7992
This PR adds support for `lower_bound` and `upper_bound` binary searches on struct columns. This closes #7690.

In addition to adding binary search for structs, I refactored `tests/search/search_test.cpp` by extracting the dictionary search tests from it. Basic search tests, dictionary search tests, and the new struct search tests now live in separate source files, which makes them easier to navigate and maintain.
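
For readers unfamiliar with the two bounds, here is a small Python sketch using the standard `bisect` module, with tuples standing in for struct rows. It assumes structs compare field by field (lexicographically) and ignores nulls and sort-order options, so it is only an analogy for the libcudf behavior.

```python
from bisect import bisect_left, bisect_right

# Sorted "struct column": each tuple is one row with two fields.
haystack = [(1, "a"), (1, "b"), (2, "a"), (2, "a"), (3, "c")]
needles = [(2, "a"), (1, "c"), (0, "z")]

# lower_bound: first position where the needle could be inserted.
lower = [bisect_left(haystack, needle) for needle in needles]
# upper_bound: last position where the needle could be inserted.
upper = [bisect_right(haystack, needle) for needle in needles]

print(lower)  # [2, 2, 0]
print(upper)  # [4, 2, 0]
```

The two bounds differ only for needles that are present in the haystack, such as `(2, "a")` here.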

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Mike Wilson (https://github.com/hyperbolic2346)
  - David Wendt (https://github.com/davidwendt)
  - Keith Kraus (https://github.com/kkraus14)

URL: #7865
…` is non-numeric dtype (#7897)

Pandas interprets `idx` in the expression `sr[idx]` as an absolute position in the series `sr` when `idx`'s `dtype` is different from that of `sr`'s `Index`.

In pandas, a series with a string index can therefore be indexed with either a label or an integer position:

```
>>> import pandas as pd
>>> x = pd.Series([1,2,3], index=pd.Index(["a", "b", "c"]))
>>> x["b"]
2
>>> x[1]
2

```

cuDF, by contrast, treats `idx` as a value to look up in `sr`'s Index, which can lead to different behavior when the index has a non-integral dtype:

```
>>> import cudf
>>> x = cudf.Series([1,2,3], index=cudf.Index(["a", "b", "c"]))
>>> x["b"]
2
>>> x[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/site-packages/cudf/core/series.py", line 921, in __getitem__
    return self.loc[arg]
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/site-packages/cudf/core/indexing.py", line 120, in __getitem__
    raise KeyError(arg)
KeyError: 1

```

This PR fixes the mismatched behavior in cuDF by deferring to `iloc` when a Series has a non-numerical Index and the indexer `idx` is an integer-like value: `int`, cudf `Scalar`, or a NumPy integer (`np.int8`, `np.uint32`, `np.int64`, ...).
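
The dispatch rule can be sketched without cuDF or pandas. The class `LabelledSeries` below is a hypothetical stand-in, and its checks for "integer-like" and "numeric index" are simplified assumptions rather than the logic in `cudf/core/series.py`.

```python
import numbers


class LabelledSeries:
    """Hypothetical stand-in for a Series with a label index."""

    def __init__(self, values, index):
        self.values = list(values)
        self.index = list(index)

    def _index_is_numeric(self):
        return all(isinstance(label, numbers.Number) for label in self.index)

    def __getitem__(self, key):
        is_int_like = isinstance(key, numbers.Integral) and not isinstance(key, bool)
        if is_int_like and not self._index_is_numeric():
            # Integer key on a non-numeric index: positional access, like iloc.
            return self.values[key]
        try:
            # Otherwise look the key up as a label, like loc.
            return self.values[self.index.index(key)]
        except ValueError:
            raise KeyError(key) from None


x = LabelledSeries([1, 2, 3], index=["a", "b", "c"])
print(x["b"])  # 2, label lookup
print(x[1])    # 2, positional fallback
```

With a numeric index the integer is still treated as a label, so `LabelledSeries([10, 20, 30], index=[5, 6, 7])[1]` raises `KeyError` in this sketch.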

Fixes: #7622
Replaces: #7775

Authors:
  - Sheilah Kirui (https://github.com/skirui-source)

Approvers:
  - Michael Wang (https://github.com/isVoid)
  - Keith Kraus (https://github.com/kkraus14)

URL: #7897
)

Refactor combine.cu to split out `join_strings()` and enable `concatenate()` to use `make_strings_children` utility.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Jake Hemstad (https://github.com/jrhemstad)

URL: #7937
[gpuCI] Forward-merge branch-0.19 to branch-0.20 [skip ci]
This PR adds a JNI API named `contiguousSplitGroups`, along with its unit tests. It splits a table into its groups after a `groupby` operation instead of executing an aggregation on each group.

This API will be used by some Spark operators ( e.g. Python UDFs ) to process the data group by group.

Other changes:

- Renames `AggregateOperation` to `GroupByOperation`, which is a better fit since it is returned from a `groupby` call.
- Adds some fields to `GroupByOptions` that the native `groupby` can use to potentially achieve better performance.
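
Conceptually, splitting by group rather than aggregating looks like the Python sketch below. It says nothing about the actual JNI signature or return type: `split_groups` and the tuple-based rows are hypothetical, and sorting by key first is an assumption about how the groups become contiguous.

```python
from itertools import groupby
from operator import itemgetter


def split_groups(rows, key_index):
    """Hypothetical helper: return one list of rows per distinct key."""
    get_key = itemgetter(key_index)
    # A stable sort makes rows with equal keys contiguous and keeps their order.
    ordered = sorted(rows, key=get_key)
    return [list(group) for _, group in groupby(ordered, key=get_key)]


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
for group in split_groups(rows, key_index=0):
    print(group)  # each group can now be handed to a UDF on its own
```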

Signed-off-by: Firestarman <[email protected]>

Authors:
  - Liangcai Li (https://github.com/firestarman)

Approvers:
  - Robert (Bobby) Evans (https://github.com/revans2)
  - Jason Lowe (https://github.com/jlowe)

URL: #7954
…#7843)

The null values in the position column didn't match up to expectations exactly. It can't be directly copied from the exploded column as the exploded column may contain null values that shouldn't be null in the position column.

Fixes #7787

Authors:
  - Mike Wilson (https://github.com/hyperbolic2346)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Jason Lowe (https://github.com/jlowe)
  - Nghia Truong (https://github.com/ttnghia)
  - Jake Hemstad (https://github.com/jrhemstad)

URL: #7843
…7772)

Introduces `make_optional_iterator` for nullable columns and scalars, as the first step in fixing issues brought up in #6952 and #7573.

The iterator produces `thrust::optional<T>` to better represent nullable column elements and scalars.

`make_optional_iterator` supports three different `contains_null` modes:

- `YES` means that the column supports nulls and has null values, so the optional might not contain a value.
- `NO` means that the column has no null values, so the optional will always have a value.
- `DYNAMIC` defers the nullability decision to runtime: the user states, when constructing the iterator, whether the column has nulls.
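
The three modes can be mimicked in Python, with `None` playing the role of an empty `thrust::optional<T>`. Everything below is a hypothetical analogy: in libcudf the mode is part of the C++ iterator's construction, whereas `optional_iter`, `ContainsNull`, and the `has_nulls` argument are names made up for this sketch.

```python
from enum import Enum


class ContainsNull(Enum):
    YES = "yes"
    NO = "no"
    DYNAMIC = "dynamic"


def optional_iter(values, validity, mode, has_nulls=None):
    """Yield each value, or None where the element is null."""
    if mode is ContainsNull.DYNAMIC:
        if has_nulls is None:
            raise ValueError("DYNAMIC mode needs has_nulls at construction")
        check_validity = has_nulls
    else:
        check_validity = mode is ContainsNull.YES
    # When check_validity is False the validity mask is never read.
    return (
        None if (check_validity and not validity[i]) else value
        for i, value in enumerate(values)
    )


values = [1, 2, 3]
validity = [True, False, True]
print(list(optional_iter(values, validity, ContainsNull.YES)))  # [1, None, 3]
print(list(optional_iter(values, None, ContainsNull.NO)))       # [1, 2, 3]
print(list(optional_iter(values, validity, ContainsNull.DYNAMIC, has_nulls=True)))
```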

Authors:
  - Robert Maynard (https://github.com/robertmaynard)

Approvers:
  - Jake Hemstad (https://github.com/jrhemstad)
  - Paul Taylor (https://github.com/trxcllnt)
  - David Wendt (https://github.com/davidwendt)

URL: #7772
Adds support for Decimal/fixed-point columns in the ORC reader, along with test cases. All decimal columns are read as Decimal64 columns; if the precision is greater than 18, the read fails loudly. This PR also removes a couple of options that are no longer useful now that Decimal is supported.
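
The limit of 18 follows from the storage width, assuming Decimal64 keeps the unscaled value in a signed 64-bit integer. The short check below is plain Python arithmetic; `fits_in_decimal64` is a hypothetical helper name.

```python
INT64_MAX = 2**63 - 1  # 9223372036854775807


def fits_in_decimal64(precision):
    """True if every value with this many digits fits in a signed 64-bit int."""
    largest_unscaled = 10**precision - 1  # e.g. 999 for precision 3
    return largest_unscaled <= INT64_MAX


print(fits_in_decimal64(18))  # True
print(fits_in_decimal64(19))  # False
```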

#7126

Authors:
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)

Approvers:
  - Devavret Makkar (https://github.com/devavret)
  - Vukasin Milovanovic (https://github.com/vuule)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #7970
`cudf::strings::replace_nulls()` is a public API that was replaced by `cudf::replace_nulls()`.
The strings version should not be used, since the base one handles any column type.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Conor Hoekstra (https://github.com/codereport)
  - Nghia Truong (https://github.com/ttnghia)
  - Christopher Harris (https://github.com/cwharris)
  - Mike Wilson (https://github.com/hyperbolic2346)

URL: #7965
closes #4882

Added groupby.product support in both hash and sort groupby.
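
As a rough illustration of what a product aggregation computes under the two strategies named above, here is a pure-Python sketch. It is not the libcudf implementation, and both function names are hypothetical; the point is only that the hash-based and sort-based approaches should agree on the result.

```python
from itertools import groupby
from math import prod
from operator import itemgetter


def hash_groupby_product(keys, values):
    """Accumulate a running product per key in a hash map."""
    out = {}
    for key, value in zip(keys, values):
        out[key] = out.get(key, 1) * value
    return out


def sort_groupby_product(keys, values):
    """Sort rows by key, then reduce each contiguous run of equal keys."""
    pairs = sorted(zip(keys, values), key=itemgetter(0))
    return {
        key: prod(value for _, value in run)
        for key, run in groupby(pairs, key=itemgetter(0))
    }
```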

Authors:
  - Karthikeyan (https://github.com/karthikeyann)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Jake Hemstad (https://github.com/jrhemstad)
  - https://github.com/brandon-b-miller

URL: #7763
`DOCKER_IMAGE` is out of date since #7953 was merged. This fixes that.

Authors:
  - Conor Hoekstra (https://github.com/codereport)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #8013
Closes #6836 

Overall reduces test execution time and improves coverage:

- Replace around 2.5K test cases with multiple tests that only vary related options.
- Correctly verify the output with multicharacter `line_terminator` option (not supported by readers).
- Add a `seed` call before the random generator is used in one of the tests.
- Simplify a few tests by removing irrelevant comparisons.
- Use buffer output instead of file in affected tests (could be applied to many more).

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - https://github.com/brandon-b-miller

URL: #7851
This enables the quantile method for columns of type `decimal`.

Authors:
  - https://github.com/ChrisJar

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #7927
closes #7628 

This PR adds support for setting a column in the dataframe when the provided column name is new. The specified rows can be a single row label, a collection of row labels, or slices. The value to set can be a column-like object or a scalar. E.g. you can now do this:

```
>>> import cudf
>>> x = cudf.DataFrame()
>>> x.loc[:, "a"] = [1, 2, 3] # set a new column with list
>>> x
   a
0  1
1  2
2  3
>>> x.loc[[1, 2], "b"] = ["abc", "cba"] # set part of the new column with list
>>> x
   a     b
0  1  <NA>
1  2   abc
2  3   cba
>>> x.loc[:, "c"] = 5 # set the new column to the scalar
>>> x
   a     b  c
0  1  <NA>  5
1  2   abc  5
2  3   cba  5
```

Authors:
  - Michael Wang (https://github.com/isVoid)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #8012
Closes #8011 

Dask-cuDF currently reads a single stripe to infer metadata in `read_orc`.  When the first path corresponds to an empty file, there is no stripe "0" to read.  This PR includes a simple fix (and test coverage).
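
One way to guard against the empty-file case is to look for the first path that actually contains a stripe before sampling metadata. The sketch below only illustrates that idea and is not necessarily the fix made in this PR: `first_path_with_stripes` is a hypothetical helper, and `count_stripes` is an assumed caller-supplied function that returns the number of stripes in a file.

```python
from typing import Callable, Optional, Sequence


def first_path_with_stripes(
    paths: Sequence[str],
    count_stripes: Callable[[str], int],
) -> Optional[str]:
    """Return the first path holding at least one stripe, or None if all are empty."""
    for path in paths:
        if count_stripes(path) > 0:
            return path
    return None
```

If every file is empty, the caller has to fall back to schema-only metadata, since there is no stripe to read at all.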

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
  - Keith Kraus (https://github.com/kkraus14)

URL: #8021
…/writer (#7805)

Issue #7287

Replaces `device_vector` with `device_uvector` and `device_span`. Removed the `device_vector` data members.

Performance impact:

- Writer: None
- Reader: ~up to 10% slower, will look into this.~ None

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Devavret Makkar (https://github.com/devavret)
  - Kumar Aatish (https://github.com/kaatish)
  - Mark Harris (https://github.com/harrism)

URL: #7805
nvdbaranec and others added 6 commits May 27, 2021 01:10
Fixes:  #8323

Also fixes a recently introduced bug in the test column equality checker. The code was previously relying on accesses to device memory being transparently handled by `thrust::device_vector`.

Authors:
  - https://github.com/nvdbaranec

Approvers:
  - Mike Wilson (https://github.com/hyperbolic2346)
  - Devavret Makkar (https://github.com/devavret)
  - Nghia Truong (https://github.com/ttnghia)

URL: #8350
Fixes: #8200 
This PR adds support for merging categorical data by implementing `union_categoricals_dispatch` in `dask-cudf`. This PR depends on upstream Dask changes: dask/dask#7699

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Keith Kraus (https://github.com/kkraus14)
  - Vibhu Jawa (https://github.com/VibhuJawa)
  - Ashwin Srinath (https://github.com/shwina)

URL: #8332
This fixes an issue where the precision assigned to the result of a decimal binary operation may exceed the maximum precision.

Closes #8291
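
The overflow described above can be sketched for addition. The snippet assumes a common SQL-style rule for the result precision (integer digits plus scale plus one carry digit) and a maximum of 18 digits for a 64-bit decimal. Neither the formula nor the helper name `add_result_precision` is taken from the cuDF source, so treat it as an illustration of clamping, not as the actual fix.

```python
MAX_DECIMAL64_PRECISION = 18  # assumption: digit limit for a 64-bit decimal


def add_result_precision(p1, s1, p2, s2, max_precision=MAX_DECIMAL64_PRECISION):
    """Return (precision, scale) for lhs + rhs, clamped to max_precision.

    Hypothetical helper; p* are precisions and s* are scales of the operands.
    """
    scale = max(s1, s2)
    integer_digits = max(p1 - s1, p2 - s2)
    unclamped = integer_digits + scale + 1  # +1 for a possible carry digit
    return min(unclamped, max_precision), scale
```

Without the final `min`, adding two precision-18 operands would ask for precision 19, which is the kind of out-of-range value the fix is meant to prevent.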

Authors:
  - https://github.com/ChrisJar

Approvers:
  - Michael Wang (https://github.com/isVoid)
  - Christopher Harris (https://github.com/cwharris)
  - Nghia Truong (https://github.com/ttnghia)

URL: #8194
Fixes the rolling-window part of #7611.

All the rolling window functions return empty results when the input aggregation column is empty, just as they should. But the column types are incorrectly set to match the input type. While this is alright for `[MIN(), MAX(), LEAD(), LAG()]`, it is incorrect for some aggregations:

| Aggregation  | Input Types          | Output Type                       |
|--------------|----------------------|-----------------------------------|
| COUNT_VALID  | All types            | INT32                             |
| COUNT_ALL    | All types            | INT32                             |
| ROW_NUMBER   | All types            | INT32                             |
| SUM          | Numerics (e.g. INT8) | 64-bit promoted type (e.g. INT64) |
| SUM          | Chrono               | Same as input type                |
| SUM          | All else             | Unsupported                       |
| MEAN         | Numerics             | FLOAT64                           |
| MEAN         | Chrono               | FLOAT64                           |
| MEAN         | All else             | Unsupported                       |
| COLLECT_LIST | All types T          | LIST with child of type T         |

This mapping is congruent with `cudf::target_type_t` from `<cudf/detail/aggregation/aggregation.hpp>`.
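
The mapping above can be written out as a small lookup, shown here as a Python sketch. `rolling_output_type` and the `kind` argument are hypothetical and not part of cuDF. How "64-bit promoted" extends beyond the INT8 to INT64 example (here, replacing the trailing bit width with 64) is also an assumption.

```python
import re

SAME_AS_INPUT = {"MIN", "MAX", "LEAD", "LAG"}
ALWAYS_INT32 = {"COUNT_VALID", "COUNT_ALL", "ROW_NUMBER"}


def rolling_output_type(aggregation: str, input_type: str, kind: str) -> str:
    """Return the output type name for an aggregation (hypothetical helper).

    kind is one of "numeric", "chrono", or "other".
    """
    if aggregation in SAME_AS_INPUT:
        return input_type
    if aggregation in ALWAYS_INT32:
        return "INT32"
    if aggregation == "COLLECT_LIST":
        return f"LIST<{input_type}>"
    if aggregation == "SUM":
        if kind == "numeric":
            # Assumption: promote by swapping the trailing bit width for 64.
            return re.sub(r"\d+$", "64", input_type)
        if kind == "chrono":
            return input_type
    if aggregation == "MEAN" and kind in ("numeric", "chrono"):
        return "FLOAT64"
    raise ValueError(f"{aggregation} is unsupported for {input_type}")
```

For example, `rolling_output_type("SUM", "INT8", "numeric")` gives `"INT64"`, matching the SUM row for numerics.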

This commit corrects the type of the output column that results from an empty input. It adds test for all the combinations listed above.

Note: This is dependent on #8158, and should be merged after that is committed.

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - https://github.com/nvdbaranec
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #8274
Allow Dask + Distributed 2021.05.1 to be installed.

~~This isn't released yet, but this tees things up so we are ready to go once it comes out.~~

Associated integration PR ( rapidsai/integration#284 )

Authors:
  - https://github.com/jakirkham

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Benjamin Zaitlen (https://github.com/quasiben)
  - Christopher Harris (https://github.com/cwharris)
  - Jordan Jacobelli (https://github.com/Ethyling)

URL: #8392
@raydouglass raydouglass requested review from a team as code owners June 1, 2021 16:16
@github-actions github-actions bot added CMake CMake build issue conda Java Affects Java cuDF API. Python Affects Python cuDF API. libcudf Affects libcudf (C++/CUDA) code. labels Jun 1, 2021

codecov bot commented Jun 1, 2021

Codecov Report

Merging #8418 (854176b) into main (599f62d) will increase coverage by 0.53%.
The diff coverage is n/a.

❗ Current head 854176b differs from pull request most recent head 85f04d0. Consider uploading reports for the commit 85f04d0 to get more accurate results

@@            Coverage Diff             @@
##             main    #8418      +/-   ##
==========================================
+ Coverage   82.30%   82.83%   +0.53%     
==========================================
  Files         101      109       +8     
  Lines       17053    17896     +843     
==========================================
+ Hits        14035    14824     +789     
- Misses       3018     3072      +54     

| Impacted Files | Coverage Δ | |
|---|---|---|
| python/cudf/cudf/utils/dtypes.py | 82.87% <0.00%> (-7.02%) | ⬇️ |
| python/cudf/cudf/core/column/struct.py | 95.08% <0.00%> (-4.92%) | ⬇️ |
| python/cudf/cudf/utils/gpu_utils.py | 53.65% <0.00%> (-4.88%) | ⬇️ |
| python/cudf/cudf/core/column/decimal.py | 89.43% <0.00%> (-4.42%) | ⬇️ |
| python/cudf/cudf/core/tools/datetimes.py | 80.42% <0.00%> (-4.02%) | ⬇️ |
| python/dask_cudf/dask_cudf/backends.py | 87.01% <0.00%> (-2.62%) | ⬇️ |
| python/cudf/cudf/core/groupby/groupby.py | 91.04% <0.00%> (-2.41%) | ⬇️ |
| python/cudf/cudf/core/column/datetime.py | 87.77% <0.00%> (-1.43%) | ⬇️ |
| python/cudf/cudf/core/column/numerical.py | 94.09% <0.00%> (-0.93%) | ⬇️ |
| python/dask_cudf/dask_cudf/groupby.py | 91.28% <0.00%> (-0.88%) | ⬇️ |
| ... and 63 more | | |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ab3b3f6...85f04d0. Read the comment docs.

msadang and others added 2 commits June 1, 2021 15:14
@raydouglass raydouglass merged commit f9d5e2e into main Jun 9, 2021
Labels
CMake CMake build issue Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API.
Development

Successfully merging this pull request may close these issues.