[WIP] Allow merging index column with data column using keyword "on" #6453

Closed

skirui-source
Contributor

No description provided.

@skirui-source skirui-source requested a review from a team as a code owner October 7, 2020 01:45
@GPUtester
Collaborator

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@codecov

codecov bot commented Oct 7, 2020

Codecov Report

❗ No coverage uploaded for pull request base (branch-0.19@6c9114e).
The diff coverage is n/a.

@@              Coverage Diff               @@
##             branch-0.19    #6453   +/-   ##
==============================================
  Coverage               ?   82.95%           
==============================================
  Files                  ?       95           
  Lines                  ?    14919           
  Branches               ?        0           
==============================================
  Hits                   ?    12376           
  Misses                 ?     2543           
  Partials               ?        0           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6c9114e...f2c9f57.

@kkraus14 kkraus14 added 2 - In Progress Currently a work in progress Python Affects Python cuDF API. labels Oct 9, 2020
@kkraus14 kkraus14 changed the base branch from branch-0.16 to branch-0.19 February 2, 2021 22:08
harrism and others added 7 commits February 3, 2021 04:25
Adds a new developer guide for libcudf. This is based on the
existing libcudf++ transition guide.

Fixes rapidsai#5273 

TODO

- [x] Description of `dictionary_column_wrapper` and `fixed_point_column_wrapper`
- [x] Benchmarking Section (put in a new file, Benchmarking.md)?
- [x] Better discussion of nested types
- [x] Introductory section on data types
- [x] Consider splitting into multiple documents: DEVELOPER_GUIDE.md, TESTING.md, BENCHMARKING.md?
- [x] Placeholder for cuIO?
- [x] Add section on code and documentation style and formatting

Authors:
  - Mark Harris (@harrism)
  - Jake Hemstad (@jrhemstad)

Approvers:
  - @nvdbaranec
  - Conor Hoekstra (@codereport)
  - Jake Hemstad (@jrhemstad)
  - David (@davidwendt)

URL: rapidsai#6977
[gpuCI] Auto-merge branch-0.18 to branch-0.19 [skip ci]
NumPy 1.20 is [typed](https://numpy.org/devdocs/release/1.20.0-notes.html#numpy-is-now-typed), which exposed a few typing errors in cuDF that this PR addresses.

Authors:
  - Ashwin Srinath (@shwina)

Approvers:
  - Keith Kraus (@kkraus14)
  - GALI PREM SAGAR (@galipremsagar)
  - AJ Schmidt (@ajschmidt8)

URL: rapidsai#7279
[gpuCI] Auto-merge branch-0.18 to branch-0.19 [skip ci]
This PR prepares the changelog to be automatically updated during releases.

Authors:
  - AJ Schmidt (@ajschmidt8)

Approvers:
  - Keith Kraus (@kkraus14)

URL: rapidsai#7272
abellina and others added 19 commits March 4, 2021 16:44
Add synchronization in `cleanImpl` and `close` in various places where race conditions could exist, and also within the `MemoryCleaner`, to address some concurrent modification issues we've seen in tests while shutting down, i.e. while invoking the cleaner (see NVIDIA/spark-rapids#1797).

Authors:
  - Alessandro Bellina (@abellina)

Approvers:
  - Robert (Bobby) Evans (@revans2)
  - Jason Lowe (@jlowe)

URL: rapidsai#7474
Final step, closes rapidsai#5133

Authors:
  - Devavret Makkar (@devavret)

Approvers:
  - Nghia Truong (@ttnghia)
  - Vukasin Milovanovic (@vuule)

URL: rapidsai#7510
Reference rapidsai#7285 

This PR adds Cython wrappers for `cudf::strings::to_fixed_point`, `cudf::strings::from_fixed_point`, and `cudf::strings::is_fixed_point` libcudf functions.
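
As a rough illustration of what these three conversions do, here is a minimal pure-Python sketch. It does not call cuDF or libcudf. It assumes the libcudf fixed-point convention that a stored integer `v` with scale `s` represents `v * 10**s`, and it assumes digits beyond the scale are truncated. The function names mirror the libcudf ones only for readability.

```python
import re
from decimal import Decimal

# Assumed accepted syntax: optional sign, digits, optional fractional part.
_FIXED_POINT = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")


def is_fixed_point(text):
    """Return True if `text` looks like a plain decimal number."""
    return _FIXED_POINT.fullmatch(text) is not None


def to_fixed_point(text, scale):
    """Parse `text` into the unscaled integer for the given scale.

    The value represented is unscaled * 10**scale; extra digits are
    truncated toward zero (an assumption of this sketch).
    """
    return int(Decimal(text).scaleb(-scale))


def from_fixed_point(unscaled, scale):
    """Format an unscaled integer with the given scale as a string."""
    return format(Decimal(unscaled).scaleb(scale), "f")


unscaled = to_fixed_point("1.23", -2)       # 123
text = from_fixed_point(unscaled, -2)       # "1.23"
```

Working on strings through `Decimal` avoids the rounding error that a detour through binary floating point would introduce.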

Authors:
  - David (@davidwendt)

Approvers:
  - GALI PREM SAGAR (@galipremsagar)
  - Ashwin Srinath (@shwina)
  - Conor Hoekstra (@codereport)

URL: rapidsai#7429
This PR supports skipping nulls for the `collect` aggregation in the JVM by creating a new class, `CollectAggregation`, that accepts a `NullPolicy` argument indicating whether to include nulls.

Skipping nulls is already supported by the `collect` aggregation with rolling in native code (rapidsai#7264), so this PR just exposes the feature in the JVM.

This PR also introduces `NullPolicy` and updates the related aggregates.
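
To make the include/exclude behaviour concrete, here is a small pure-Python sketch of a grouped `collect` with a null policy. It is not the Java API: the `NullPolicy` enum, its member names, and the `collect` function below are hypothetical stand-ins, and `None` plays the role of a null.

```python
from enum import Enum


class NullPolicy(Enum):
    """Hypothetical stand-in for a null-handling policy."""
    INCLUDE = "include"
    EXCLUDE = "exclude"


def collect(keys, values, null_policy=NullPolicy.INCLUDE):
    """Gather the values of each key into a list, optionally dropping nulls."""
    groups = {}
    for key, value in zip(keys, values):
        bucket = groups.setdefault(key, [])  # the group exists even if all null
        if value is None and null_policy is NullPolicy.EXCLUDE:
            continue
        bucket.append(value)
    return groups


keys = ["a", "a", "b"]
values = [1, None, None]
with_nulls = collect(keys, values, NullPolicy.INCLUDE)
without_nulls = collect(keys, values, NullPolicy.EXCLUDE)
```

In this sketch a group whose values are all null still appears in the result, with an empty list, when nulls are excluded.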

Signed-off-by: firestarman <[email protected]>

Authors:
  - Liangcai Li (@firestarman)

Approvers:
  - Robert (Bobby) Evans (@revans2)
  - MithunR (@mythrocks)

URL: rapidsai#7457
…ake (rapidsai#7518)

Rename `ARROW_STATIC_LIB` because it conflicts with CMake variable in Arrow's `FindArrow.cmake`.

Here's the new way to statically link Arrow with libcudf:
```
cmake -D CUDF_USE_ARROW_STATIC=ON ..
```

Authors:
  - Paul Taylor (@trxcllnt)

Approvers:
  - Keith Kraus (@kkraus14)

URL: rapidsai#7518
Addresses rapidsai#7347

Authors:
  - Kumar Aatish (@kaatish)

Approvers:
  - David (@davidwendt)
  - Devavret Makkar (@devavret)
  - Vukasin Milovanovic (@vuule)

URL: rapidsai#7439
This updates the Java build scripts and documentation to use the new CUDF_USE_ARROW_STATIC flag after the rename from ARROW_STATIC_LIB in rapidsai#7518.

Authors:
  - Jason Lowe (@jlowe)

Approvers:
  - Alessandro Bellina (@abellina)
  - Keith Kraus (@kkraus14)

URL: rapidsai#7526
This changes the root directory of the build folder for conda. Instead of generating a random build folder name, it will create a consistent build folder name at the `croot` location. This folder name is unique in CI, as every build has a unique `${WORKSPACE}` that is used.

Lots of workarounds added to properly work with Project Flash. Several `mv` commands are added to put build artifacts in a folder Project Flash expects them to be in.

Authors:
  - Dillon Cullinan (@dillon-cullinan)

Approvers:
  - AJ Schmidt (@ajschmidt8)

URL: rapidsai#7508
This PR adds a couple of very specialized methods that help us cast columns inside nested types.

Authors:
  - Raza Jafri (@razajafri)

Approvers:
  - Robert (Bobby) Evans (@revans2)
  - Jason Lowe (@jlowe)
  - MithunR (@mythrocks)

URL: rapidsai#7417
Refactors the bitmask merging functionality to support any binary function, allowing for `bitwise_or` support in addition to the existing `bitwise_and` support. Includes changes to the Java API and JNI to access the `bitwise_or` functionality.
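
The idea of parameterizing the merge by a binary function can be sketched in plain Python as follows. This is only an illustration, not the libcudf implementation: the `merge_bitmasks` name is hypothetical, and each mask is modelled as a list of integer words of equal length.

```python
from functools import reduce
from operator import and_, or_


def merge_bitmasks(masks, op):
    """Combine equal-length bitmasks word by word with a binary operator."""
    if not masks:
        raise ValueError("at least one mask is required")
    length = len(masks[0])
    if any(len(mask) != length for mask in masks):
        raise ValueError("all masks must have the same number of words")
    # zip(*masks) yields the i-th word of every mask; reduce folds them with op.
    return [reduce(op, words) for words in zip(*masks)]


a = [0b1100, 0b1010]
b = [0b1010, 0b0110]
anded = merge_bitmasks([a, b], and_)  # [0b1000, 0b0010]
ored = merge_bitmasks([a, b], or_)    # [0b1110, 0b1110]
```

Passing the operator as an argument keeps one merge routine for both cases instead of duplicating the loop per operator.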

Authors:
  - @rwlee

Approvers:
  - Jason Lowe (@jlowe)
  - Jake Hemstad (@jrhemstad)
  - Christopher Harris (@cwharris)

URL: rapidsai#7406
Closes rapidsai#7320 

This PR adds an additional preprocessing step in documentation generation. It traverses the doctree generated by Sphinx and replaces unresolved type shorthands with proper target references, while keeping the shortened name as the display text.

A further preprocessing step ignores internal types, such as `cudf.core.column.string.StringColumn`, that appear in APIs facing both internal and external users.

`cupy` API reference is added to intersphinx.

Minor changes:
- Fixes a small doc bug in `frame.copy`

Authors:
  - Michael Wang (@isVoid)

Approvers:
  - Ashwin Srinath (@shwina)

URL: rapidsai#7416
`dask` and `distributed` are changing their default branch names from `master` to `main`. This will break our dev environments and CI, so this PR updates the required files.

`distributed` has already merged the PR that makes the change, and `dask` will probably do the same very soon, so a single PR that updates both seems to be the best approach.

Authors:
  - Dante Gama Dessavre (@dantegd)

Approvers:
  - Keith Kraus (@kkraus14)
  - AJ Schmidt (@ajschmidt8)

URL: rapidsai#7532
Reference rapidsai#5698
This creates a gbenchmark for the `cudf::strings::extract` function. The benchmark measures various numbers of rows as well as string lengths. It also has measurements for small, medium, and large numbers of regex instructions. The extract performance is affected by the number of instructions in the regex pattern.

Authors:
  - David (@davidwendt)

Approvers:
  - Keith Kraus (@kkraus14)
  - Karthikeyan (@karthikeyann)
  - Mark Harris (@harrism)

URL: rapidsai#7522
This PR reduces the number of calls to `inclusive_scan` and `exclusive_scan` by using a `null_replace_accessor` that allows non-nullable columns. This reduces the compile time and size of `scan.cu` by half. This PR also includes a scan gbenchmark that shows no change in performance from the original implementation.
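
The accessor idea can be sketched in plain Python: a single wrapper substitutes the operator's identity for null rows and passes non-nullable columns through unchanged, so one scan code path serves both cases. This is an illustrative sketch, not the libcudf code; the function names and the `valid` mask argument are assumptions.

```python
from itertools import accumulate
from operator import add


def null_replaced(values, valid, identity):
    """Yield each value, substituting `identity` where the row is null.

    `valid=None` models a non-nullable column: values pass through as-is.
    """
    if valid is None:
        yield from values
        return
    for value, is_valid in zip(values, valid):
        yield value if is_valid else identity


def inclusive_scan(values, valid=None, op=add, identity=0):
    """Running reduction that includes the current element."""
    return list(accumulate(null_replaced(values, valid, identity), op))


def exclusive_scan(values, valid=None, op=add, identity=0):
    """Running reduction that excludes the current element."""
    inclusive = inclusive_scan(values, valid, op, identity)
    return [identity] + inclusive[:-1] if inclusive else []


plain = inclusive_scan([1, 2, 3, 4])                                  # [1, 3, 6, 10]
nullable = inclusive_scan([1, None, 3, 4], [True, False, True, True]) # [1, 1, 4, 8]
shifted = exclusive_scan([1, None, 3, 4], [True, False, True, True])  # [0, 1, 1, 4]
```

Because the null check lives in the wrapper, the scan itself does not need separate nullable and non-nullable variants.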

Authors:
  - David (@davidwendt)

Approvers:
  - Paul Taylor (@trxcllnt)
  - Jake Hemstad (@jrhemstad)

URL: rapidsai#7516
…i#7535)

A few master --> main renames were missed in the recent dask branch rename; this PR fixes them.

Authors:
  - Keith Kraus (@kkraus14)

Approvers:
  - AJ Schmidt (@ajschmidt8)
  - GALI PREM SAGAR (@galipremsagar)

URL: rapidsai#7535
Fix for issue caused by stale PR issue from rapidsai#7406

Authors:
  - @rwlee
  - Keith Kraus (@kkraus14)

Approvers:
  - Keith Kraus (@kkraus14)
  - Mike Wilson (@hyperbolic2346)
  - GALI PREM SAGAR (@galipremsagar)
  - Jake Hemstad (@jrhemstad)
  - Vukasin Milovanovic (@vuule)
  - Paul Taylor (@trxcllnt)

URL: rapidsai#7533
)

Presume that a project is using `cudf` via CPM like the following, and the machine doesn't have cudf installed, but does have rmm.
```
CPMAddPackage(NAME  cudf
        VERSION         "0.19.0"
        GIT_REPOSITORY  https://github.com/rapidsai/cudf.git
        GIT_TAG         branch-0.19
        GIT_SHALLOW     TRUE
        SOURCE_SUBDIR   cpp
        OPTIONS         "BUILD_TESTS OFF"
                        "BUILD_BENCHMARKS OFF"
                        "ARROW_STATIC_LIB ON"
                        "JITIFY_USE_CACHE ON"
                        "CUDA_STATIC_RUNTIME ON"
                        "DISABLE_DEPRECATION_WARNING ON"
                        "AUTO_DETECT_CUDA_ARCHITECTURES ON"
    )

add_library(cudf_example cudf_example.cu)
target_link_libraries(cudf_example PRIVATE cudf::cudf)

add_library(rmm_example rmm_example.cu)
target_link_libraries(rmm_example PRIVATE rmm::rmm)
```

While CPM will fail to find `cudf`, it will find the local install of `rmm` and use it. This poses a problem because CMake imported targets have a different default visibility than 'real' targets. This means that while `cudf::cudf` can see and resolve `rmm::rmm`, the `rmm_example` target won't be able to.

This change makes it possible for users of cudf via CPM to directly access the `rmm::rmm` target.

Authors:
  - Robert Maynard (@robertmaynard)

Approvers:
  - Keith Kraus (@kkraus14)

URL: rapidsai#7524
@github-actions github-actions bot added CMake CMake build issue conda Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. labels Mar 10, 2021
@skirui-source skirui-source marked this pull request as draft March 10, 2021 03:40
@skirui-source skirui-source deleted the mergeindexondata branch March 11, 2021 05:58
@skirui-source
Contributor Author

skirui-source commented Mar 26, 2021

replaced by PR 7569

@skirui-source skirui-source self-assigned this Mar 26, 2021
Labels
2 - In Progress: Currently a work in progress
CMake: CMake build issue
Java: Affects Java cuDF API.
libcudf: Affects libcudf (C++/CUDA) code.
Python: Affects Python cuDF API.