Revise 10 minutes notebook. #10738
Conversation
Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks. Powered by ReviewNB.
Codecov Report

```
@@             Coverage Diff              @@
##           branch-22.06   #10738    +/-   ##
================================================
+ Coverage        86.28%   86.32%    +0.03%
================================================
  Files              144      144
  Lines            22654    22656        +2
================================================
+ Hits             19548    19558       +10
+ Misses            3106     3098        -8
```

Continue to review the full report at Codecov.
"+-----------------------------------------------------------------------------+\n" | ||
"Tue Apr 26 10:47:09 2022 \r\n", | ||
"+-----------------------------------------------------------------------------+\r\n", | ||
"| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |\r\n", |
@shwina I'm not sure these `nvidia-smi` outputs are showing what we intend. It's called twice, before and after the Dask DataFrame is persisted, and there's a comment beforehand indicating that the memory usage should change. However, I don't see a difference in memory usage before and after the `persist()` call.

"Because Dask is lazy, the computation has not yet occurred. We can see that there are twenty tasks in the task graph and we've used about 800 MB of memory. We can force computation by using persist. By forcing execution, the result is now explicitly in memory and our task graph only contains one task per partition (the baseline)."

Is this a bug or a change in behavior? Should we revise that notice and/or remove the `nvidia-smi` output entirely so that the notebook's results are less system-dependent?
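For context, a minimal sketch of the pattern under discussion; the cluster setup, the example data, and the way `nvidia-smi` is invoked here are illustrative assumptions, not the notebook's actual code:

```python
import subprocess

import cudf
import dask_cudf
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Under a distributed scheduler, persist() is asynchronous.
client = Client(LocalCUDACluster())

# Made-up example data; the notebook builds its own DataFrame.
ddf = dask_cudf.from_cudf(
    cudf.DataFrame({"a": range(1_000_000)}), npartitions=20
)

# First measurement: before persisting.
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)

ddf = ddf.persist()  # submits the lazy task graph for execution

# Second measurement: the output quoted above comes from a cell like this.
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)
```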
The issue here is that `persist()` returns immediately; it takes a moment for the DataFrame to materialize. If you wait a bit after the call to `persist()` and before the second `nvidia-smi`, the increase in memory is obvious. Unfortunately, this doesn't lend itself well to automated notebook execution -- maybe we insert a `time.sleep()` with an explanation?
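Continuing the sketch above (reusing `ddf` from it), that suggestion would look roughly like this; the sleep duration is a guess, not a value from the PR:

```python
import subprocess
import time

ddf = ddf.persist()  # returns immediately under the distributed scheduler
time.sleep(10)       # assumed duration: give the workers time to materialize

# The second measurement should now show the increased memory usage.
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)
```

A `dask.distributed.wait(ddf)` call would block deterministically instead, but a sleep lets the notebook also explain the delay to readers.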
Wow, that's a little surprising; I definitely wouldn't have considered that possibility. I will take a look at this tomorrow and probably add a sleep command as you suggest. (Note that every sleep command increases the time to build the docs, so I reduced the final sleep "wait" at the end of this notebook to less than 60 seconds.)
Yeah, I realize it's not ideal...
I finally got a `sleep` command added here. (Found #10829 in the process, which was a blocker.) The memory usage grows after the `.persist()` call.

@shwina @mmccarty This is ready for further review. Thanks! cc: @galipremsagar
Looks good. Thanks!
@gpucibot merge
Follow-up from #10685 to fix deprecation warnings in the 10 minute notebook.

Fixes: #10613

Changes:
- `Series.applymap` ➡️ `Series.apply` (see the sketch after this list)
- Removed `Series.append`. This has also been removed from the Pandas 10 minute notebook because the feature is deprecated.
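For illustration, a minimal sketch of the `Series.applymap` ➡️ `Series.apply` migration; the data is made up, not from the notebook:

```python
import cudf

s = cudf.Series([1, 2, 3])

# Before (deprecated spelling, removed from the notebook):
# s.applymap(lambda x: x + 1)

# After: Series.apply compiles the UDF and applies it elementwise.
print(s.apply(lambda x: x + 1))
```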