From 66d457225145c0ab3df587db36ebe206990defc4 Mon Sep 17 00:00:00 2001
From: rjzamora <rzamora217@gmail.com>
Date: Tue, 3 Nov 2020 20:38:09 -0800
Subject: [PATCH 1/2] remove dask-cudf documentation and add README page

---
 docs/cudf/source/dask-cudf.md | 78 -----------------------------------
 docs/cudf/source/index.rst    |  1 -
 python/dask_cudf/README.md    | 12 ++++++
 3 files changed, 12 insertions(+), 79 deletions(-)
 delete mode 100644 docs/cudf/source/dask-cudf.md
 create mode 100644 python/dask_cudf/README.md
diff --git a/docs/cudf/source/dask-cudf.md b/docs/cudf/source/dask-cudf.md
deleted file mode 100644
index 92ef4eb1c46..00000000000
--- a/docs/cudf/source/dask-cudf.md
+++ /dev/null
@@ -1,78 +0,0 @@
-Multi-GPU with Dask-cuDF
-========================
-
-cuDF is a single-GPU library. For Multi-GPU cuDF solutions we use [Dask](https://dask.org/) and the [dask-cudf package](https://github.com/rapidsai/cudf/tree/main/python/dask_cudf), which is able to scale cuDF across multiple GPUs on a single machine, or multiple GPUs across many machines in a cluster.
-
-[Dask DataFrame](http://docs.dask.org/en/latest/dataframe.html) was originally designed to scale Pandas, orchestrating many Pandas DataFrames spread across many CPUs into a cohesive parallel DataFrame. Because cuDF currently implements only a subset of Pandas’s API, not all Dask DataFrame operations work with cuDF.
-
-The following is tested and expected to work:
-
-What works
-----------
-
--  Data ingestion
-    -  ``dask_cudf.read_csv``
-    -  Use standard Dask ingestion with Pandas, then convert to cuDF (For
-      Parquet and other formats this is often decently fast)
--  Linear operations
-    -  Element-wise operations:  ``df.x + df.y``, ``df ** 2``
-    -  Assignment: ``df['z'] = df.x + df.y``
-    -  Row-wise selections:  ``df[df.x > 0]``
-    -  Loc:  ``df.loc['2001-01-01': '2005-02-02']``
-    -  Date time/string accessors:  ``df.timestamp.dt.dayofweek``
-    -  ... and most similar operations in this category that are already implemented in cuDF
--  Reductions
-    -  Like ``sum``, ``mean``, ``max``, ``count``, and so on on ``Series`` objects
-    -  <strike>Support for reductions on full dataframes</strike>
-    -  <strike>``std``</strike>
-    -  Custom reductions with [dask.dataframe.reduction](http://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.Series.reduction)
--  Groupby aggregations
-    -  On single columns: ``df.groupby('x').y.max()``
-    -  With custom aggregations:
-    -  <strike>groupby standard deviation</strike>
-    -  <strike>grouping on multiple columns</strike>
-    -  <strike>groupby agg for multiple outputs</strike>
--  Joins:
-    -  On full unsorted columns: ``left.merge(right, on='id')`` (expensive)
-    -  On sorted indexes: ``left.merge(right, left_index=True, right_index=True)`` (fast)
-    -  On large and small dataframes: ``left.merge(cudf_df, on='id')`` (fast)
--  <strike>Rolling operations</strike>
--  Converting to and from other forms
-    -  Dask + Pandas to Dask + cuDF ``df.map_partitions(cudf.from_pandas)``
-    -  Dask + cuDF to Dask + Pandas ``df.map_partitions(lambda df: df.to_pandas())``
-    -  cuDF to Dask + cuDF: ``dask.dataframe.from_pandas(df, npartitions=20)``
-    -  Dask + cuDF to cuDF: ``df.compute()``
-
-Additionally all generic Dask operations, like ``compute``, ``persist``,
-``visualize`` and so on work regardless.
-
-
-Developing the API
-------------------
-
-Above we mention the following:
-
-> and most similar operations in this category that are already implemented in cuDF
-
-This is because it is difficult to create a comprehensive list of operations in
-the cuDF and Pandas libraries.  The API is large enough to be difficult to track
-effectively.  For any operation that operates row-wise like ``fillna`` or
-``query`` things will likely, but not certainly work.  If operations don't work
-it is often due to a slight inconsistency between Pandas and cuDF that is
-generally easy to fix.  We encourage users to look at the [cuDF issue
-tracker](https://github.com/rapidsai/cudf/issues) to see if their issue has
-already been reported and, if not,
-[raise a new issue](https://github.com/rapidsai/cudf/issues/new).
-
-
-Navigating the API
-------------------
-
-This project reuses the
-[Dask DataFrame](https://docs.dask.org/en/latest/dataframe.html) project, which
-was originally designed for Pandas, with the newer library cuDF.  Because we use
-the same Dask classes for both projects there are often methods that are
-implemented for Pandas, but not yet for cuDF.  As a result users looking at the
-full Dask DataFrame API can be misleading, and often lead to frustration when
-operations that are advertised in the Dask API do not work as expected with
-cuDF.  We apologize for this in advance.
diff --git a/docs/cudf/source/index.rst b/docs/cudf/source/index.rst
index 8a100487374..db4134800be 100644
--- a/docs/cudf/source/index.rst
+++ b/docs/cudf/source/index.rst
@@ -8,7 +8,6 @@ Welcome to cuDF's documentation!
    api.rst
    10min.ipynb
    basics.rst
-   dask-cudf.md
    10min-cudf-cupy.ipynb
    guide-to-udfs.ipynb
    internals.md
diff --git a/python/dask_cudf/README.md b/python/dask_cudf/README.md
new file mode 100644
index 00000000000..02f9ab0c000
--- /dev/null
+++ b/python/dask_cudf/README.md
@@ -0,0 +1,12 @@
+# Dask-cuDF
+
+A [cuDF](https://github.com/rapidsai/cudf)-backed [Dask-DataFrame](https://docs.dask.org/en/latest/dataframe.html) API for out-of-core and multi-GPU ETL.
+
+## Brief Introduction
+
+Dask is a task-based library for parallel scheduling and execution. In addition to its central scheduling machinery, the library also includes the [Dask-DataFrame](https://docs.dask.org/en/latest/dataframe.html) module, which is a scalable version of the [Pandas](https://pandas.pydata.org/) DataFrame/Series API.  Dask-cuDF builds upon Dask-DataFrame to provide a convenient API for the decomposition and processing of large cuDF DataFrame and/or Series objects.
+
+### Documentation Links
+
+- [10 Minutes to cuDF and Dask-cuDF](https://docs.rapids.ai/api/cudf/stable/10min.html)
+- [Dask-CUDA](https://github.com/rapidsai/dask-cuda) (for multi-GPU Scaling)
\ No newline at end of file

From e9cea4aad7508e4fc60cfbcb99803046d996bacc Mon Sep 17 00:00:00 2001
From: rjzamora <rzamora217@gmail.com>
Date: Tue, 3 Nov 2020 21:00:53 -0800
Subject: [PATCH 2/2] changelog

---
 CHANGELOG.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 45ffed754f3..5b10a10908d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -30,6 +30,7 @@
 - PR #6564 Load JNI library dependencies with a thread pool
 - PR #6573 Create `cudf::detail::byte_cast` for `cudf::byte_cast`
 - PR #6597 Use thread-local to track CUDA device in JNI
+- PR #6665 Update stale dask-cudf documentation 
 
 ## Bug Fixes