Commit

Merge pull request #2 from rapidsai/branch-0.9
rgsl888prabhu authored Aug 5, 2019
2 parents 89a2f68 + 511b75c commit 7c1496a
Showing 226 changed files with 11,467 additions and 3,839 deletions.
5 changes: 3 additions & 2 deletions .gitignore
@@ -21,9 +21,10 @@ cudf.egg-info/
python/build
python/*/build
python/cudf/*/bindings/**/*.cpp
python/cudf/*/bindings/**/*.h
python/cudf/*/bindings/.nfs*
python/*.ipynb
python/.ipynb_checkpoints
python/cudf/*.ipynb
python/cudf/.ipynb_checkpoints
.Python
env/
develop-eggs/
4 changes: 0 additions & 4 deletions .gitmodules
@@ -2,10 +2,6 @@
path = thirdparty/cub
url = https://github.com/rapidsai/thirdparty-cub.git
branch = cudf-cub
[submodule "thirdparty/dlpack"]
path = thirdparty/dlpack
url = https://github.com/rapidsai/dlpack.git
branch=cudf
[submodule "thirdparty/jitify"]
path = thirdparty/jitify
url = https://github.com/rapidsai/jitify.git
37 changes: 34 additions & 3 deletions CHANGELOG.md
@@ -4,9 +4,11 @@

- PR #2111 IO Readers: Support memory buffer, file-like object, and URL inputs
- PR #2012 Add `reindex()` to DataFrame and Series
- PR #2097 Add GPU-accelerated AVRO reader
- PR #2098 Align DataFrame and Series indices before executing binary ops
- PR #2160 Merge `dask-cudf` codebase into `cudf` repo
- PR #2149 CSV Reader: Add `hex` dtype for explicit hexadecimal parsing
- PR #2156 Add `upper_bound()` and `lower_bound()` for libcudf tables and `searchsorted()` for cuDF Series
- PR #2158 CSV Reader: Support single, non-list/dict argument for `dtype`
- PR #2177 CSV Reader: Add `parse_dates` parameter for explicit date inference
- PR #2171 Add CodeCov integration, fix doc version, make --skip-tests work when invoking with source
@@ -18,9 +20,18 @@
- PR #2105 Add google benchmark for hash-based join
- PR #2293 Improve `compute_join_output_size` performance
- PR #2316 Unique, nunique, and value_counts for datetime columns
- PR #2337 Add Java support for slicing a ColumnVector
- PR #2049 Implemented merge functionality
- PR #2368 Full cudf+dask Parquet Support
- PR #2380 New cudf::is_sorted checks whether cudf::table is sorted
- PR #2356 Java column vector standard deviation support
- PR #2221 MultiIndex Full Indexing - Support iloc and wildcards for loc
- PR #2429 Java column vector: added support for getting length of strings in a ColumnVector
- PR #2415 Revamp `value_counts` to use groupby count series of any type
- PR #2446 Add __array_function__ for index
- PR #2437 ORC reader: Add 'use_np_dtypes' option
- PR #2382 Add CategoricalAccessor add, remove, rename, and ordering methods
- PR #2449 Java column vector: added support for getting byte count of strings in a ColumnVector

## Improvements

@@ -54,15 +65,24 @@
- PR #2309 Java readers: remove redundant copy of result pointers
- PR #2307 Add `black` and `isort` to style checker script
- PR #2345 Restore removal of old groupby implementation
- PR #2342 Improve `astype()` to operate all ways
- PR #2329 using libcudf cudf::copy for column deep copy
- PR #2344 Add docs on how code formatting works for contributors
- PR #2353 Bump Arrow and Dask versions
- PR #2377 Replace `standard_python_slice` with just `slice.indices()`
- PR #2373 cudf.DataFrame enhancements & Series.values support
- PR #2392 Remove dlpack submodule; make cuDF's Cython API externally accessible
- PR #2430 Updated Java bindings to use the new unary API
- PR #2406 Moved all existing `table` related files to a `legacy/` directory
- PR #2350 Performance related changes to get_dummies
- PR #2420 Remove `cudautils.astype` and replace with `typecast.apply_cast`
- PR #2458 Fix handling of thirdparty packages in `isort` config

## Bug Fixes

- PR #2086 Fixed quantile api behavior mismatch in series & dataframe
- PR #2128 Add offset param to host buffer readers in java API.
- PR #2145 Work around binops validity checks for java
- PR #2145 Work around binops validity checks for java
- PR #2146 Work around unary_math validity checks for java
- PR #2151 Fixes bug in cudf::copy_range where null_count was invalid
- PR #2139 matching to pandas describe behavior & fixing nan values issue
@@ -80,10 +100,12 @@
- PR #2244 Fix ORC RLEv2 delta mode decoding with nonzero residual delta width
- PR #2297 Work around `var/std` unsupported only at debug build
- PR #2302 Fixed java serialization corner case
- PR #2355 Handle float16 in binary operations
- PR #2311 Fix copy behaviour for GenericIndex
- PR #2349 Fix issues with String filter in java API
- PR #2323 Fix groupby on categoricals
- PR #2328 Ensure order is preserved in CategoricalAccessor._set_categories
- PR #2202 Fix issue with unary ops mishandling empty input
- PR #2326 Fix for bug in DLPack when reading multiple columns
- PR #2324 Fix cudf Docker build
- PR #2325 Fix ORC RLEv2 patched base mode decoding with nonzero patch width
@@ -94,6 +116,13 @@
- PR #2364 Fix quantile api and other trivial issues around it
- PR #2361 Fixed issue with `codes` of CategoricalIndex
- PR #2357 Fixed inconsistent type of index created with from_pandas vs direct construction
- PR #2389 Fixed Rolling __getattr__ and __getitem__ for offset based windows
- PR #2402 Fixed bug in valid mask computation in cudf::copy_if (apply_boolean_mask)
- PR #2401 Fix to a scalar datetime(of type Days) issue
- PR #2386 Correctly allocate output valids in groupby
- PR #2411 Fixed failures on binary op on single element string column
- PR #2422 Fix Pandas logical binary operation incompatibilities
- PR #2447 Fix CodeCov posting build statuses temporarily


# cuDF 0.8.0 (27 June 2019)
@@ -128,8 +157,9 @@
- PR #1995 Add Java API
- PR #1998 Add google benchmark to cudf
- PR #1845 Add cudf::drop_duplicates, DataFrame.drop_duplicates
- PR #1652 Added `Series.where()` feature
- PR #2074 Java Aggregates, logical ops, and better RMM support
- PR #1652 Added `Series.where()` feature
- PR #2074 Java Aggregates, logical ops, and better RMM support
- PR #2140 Add a `cudf::transform` function

## Improvements

@@ -544,6 +574,7 @@
- PR #1183 Bump Arrow version to 0.12.1
- PR #1208 Default to CXX11_ABI=ON
- PR #1252 Fix NVStrings dependencies for cuda 9.2 and 10.0
- PR #2037 Optimize the existing `gather` and `scatter` routines in `libcudf`

## Bug Fixes

2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -322,7 +322,7 @@ flag. Below is a list of the available arguments and their purpose:
| `NUMBA_VERSION` | newest | >=0.40.0 | set numba version |
| `NUMPY_VERSION` | newest | >=1.14.3 | set numpy version |
| `PANDAS_VERSION` | newest | >=0.23.4 | set pandas version |
| `PYARROW_VERSION` | 0.12.1 | Not supported | set pyarrow version |
| `PYARROW_VERSION` | 0.14.1 | Not supported | set pyarrow version |
| `CMAKE_VERSION` | newest | >=3.12 | set cmake version |
| `CYTHON_VERSION` | 0.29 | Not supported | set Cython version |
| `PYTHON_VERSION` | 3.6 | 3.7 | set python version |
13 changes: 4 additions & 9 deletions ci/gpu/build.sh
@@ -43,7 +43,10 @@ nvidia-smi
logger "Activate conda env..."
source activate gdf
conda install "rmm=$MINOR_VERSION.*" "nvstrings=$MINOR_VERSION.*" "cudatoolkit=$CUDA_REL" \
"dask>=2.0" "distributed>=2.0" "numpy>=1.16"
"dask>=2.1.0" "distributed>=2.1.0" "numpy>=1.16" "double-conversion" \
"rapidjson" "flatbuffers" "boost-cpp" "fsspec>=0.3.3" "dlpack" \
"feather-format" "cupy>=6.0.0" "arrow-cpp=0.14.1" "pyarrow=0.14.1" \
"fastavro>=0.22.0" "pandas>=0.24.2,<0.25"

# Install the master version of dask and distributed
logger "pip install git+https://github.com/dask/distributed.git --upgrade --no-deps"
@@ -78,14 +81,6 @@ else
cd $WORKSPACE/cpp/build
GTEST_OUTPUT="xml:${WORKSPACE}/test-results/" make -j${PARALLEL_LEVEL} test

# Install the master version of distributed for serialization testing
logger "pip install git+https://github.com/dask/distributed.git"
pip install "git+https://github.com/dask/distributed.git"

# Temporarily install feather and cupy for testing
logger "conda install feather-format"
conda install "feather-format" "cupy>=6.0.0"

# set environment variable for numpy 1.16
# will be enabled for later versions by default
np_ver=$(python -c "import numpy; print('.'.join(numpy.__version__.split('.')[:-1]))")
5 changes: 5 additions & 0 deletions codecov.yml
@@ -0,0 +1,5 @@
#Configuration File for CodeCov
coverage:
  status:
    project: off
    patch: off
18 changes: 12 additions & 6 deletions conda/environments/cudf_dev_cuda10.0.yml
@@ -10,11 +10,11 @@ dependencies:
- rmm=0.9.*
- cmake>=3.12
- python>=3.6,<3.8
- numba>=0.41
- pandas>=0.23.4
- pyarrow=0.12.1
- numba>=0.41,<0.45
- pandas>=0.24.2,<0.25
- pyarrow=0.14.1
- fastavro>=0.22.0
- notebook>=0.5.0
- boost
- cython>=0.29,<0.30
- pytest
- sphinx
@@ -36,8 +36,14 @@ dependencies:
- black
- isort
- pre_commit
- dask>=2.0
- distributed>=2.0
- dask>=2.1.0
- distributed>=2.1.0
- dlpack
- arrow-cpp=0.14.1
- boost-cpp
- double-conversion
- rapidjson
- flatbuffers
- pip:
- sphinx-markdown-tables
- git+https://github.com/dask/dask.git
18 changes: 12 additions & 6 deletions conda/environments/cudf_dev_cuda9.2.yml
@@ -10,11 +10,11 @@ dependencies:
- rmm=0.9.*
- cmake>=3.12
- python>=3.6,<3.8
- numba>=0.41
- pandas>=0.23.4
- pyarrow=0.12.1
- numba>=0.41,<0.45
- pandas>=0.24.2,<0.25
- pyarrow=0.14.1
- fastavro>=0.22.0
- notebook>=0.5.0
- boost
- cython>=0.29,<0.30
- pytest
- sphinx
@@ -36,8 +36,14 @@ dependencies:
- black
- isort
- pre_commit
- dask>=2.0
- distributed>=2.0
- dask>=2.1.0
- distributed>=2.1.0
- dlpack
- arrow-cpp=0.14.1
- boost-cpp
- double-conversion
- rapidjson
- flatbuffers
- pip:
- sphinx-markdown-tables
- git+https://github.com/dask/dask.git
13 changes: 9 additions & 4 deletions conda/recipes/cudf/meta.yaml
@@ -21,16 +21,21 @@ requirements:
- python
- cython >=0.29,<0.30
- setuptools
- numba >=0.41
- numba >=0.41,<0.45
- dlpack
- pyarrow 0.14.1.*
- libcudf {{ version }}
- dlpack
run:
- python
- pandas >=0.23.4
- numba >=0.41
- pyarrow 0.12.1.*
- pandas>=0.24.2,<0.25
- numba >=0.41,<0.45
- pyarrow 0.14.1.*
- fastavro >=0.22.0
- rmm {{ minor_version }}.*
- nvstrings {{ minor_version }}.*
- cython >=0.29,<0.30
- dlpack

test:
commands:
8 changes: 4 additions & 4 deletions conda/recipes/dask-cudf/meta.yaml
@@ -20,13 +20,13 @@ requirements:
host:
- python
- cudf {{ version }}
- dask >=2.0.0
- distributed >=2.0.0
- dask >=2.1.0
- distributed >=2.1.0
run:
- python
- cudf {{ version }}
- dask >=2.0.0
- distributed >=2.0.0
- dask >=2.1.0
- distributed >=2.1.0
test:
imports:
- dask_cudf
10 changes: 9 additions & 1 deletion conda/recipes/libcudf/meta.yaml
@@ -30,8 +30,16 @@ requirements:
- librmm {{ minor_version }}.*
- libnvstrings {{ minor_version }}.*
- cudatoolkit {{ cuda_version }}.*
- arrow-cpp 0.14.1.*
- double-conversion
- rapidjson
- flatbuffers
- boost-cpp
- dlpack
run:
- {{ pin_compatible('cudatoolkit', max_pin='x.x') }}
- arrow-cpp 0.14.1.*
- dlpack

test:
commands:
@@ -45,7 +53,7 @@ test:
- test -f $PREFIX/include/cudf/stream_compaction.hpp
- test -f $PREFIX/include/cudf/copying.hpp
- test -f $PREFIX/include/cudf/io_functions.hpp
- test -f $PREFIX/include/cudf/table.hpp
- test -f $PREFIX/include/cudf/legacy/table.hpp
- test -f $PREFIX/include/cudf/cudf.h
- test -f $PREFIX/include/cudf/io_readers.hpp
- test -f $PREFIX/include/cudf/types.h