Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganize cuDF Python docs #10691

Merged
merged 38 commits into from
May 4, 2022
Merged

Conversation

shwina
Copy link
Contributor

@shwina shwina commented Apr 20, 2022

This PR is composed of two high-level changes:

  • Replaces the use of ReStructuredText with MyST Markdown. I used rst2myst for this and it worked pretty well. The rationale for this change is simple: we use myst-nb to render notebooks into documentation, and for consistency, it's nice to use myst-parser to parse the rest of our docs too. As a matter of opinion, I think Markdown is simpler and more familiar to most developers.

  • Reorganizes the docs (see below):

Prior to this PR, the cuDF documentation was divided into 3 sections:

  • A user guide
  • A "Basics" section
  • API reference

The distinction between the first two sections was never clear. I've gone ahead and merged those into a single section named "User Guide". This is also more consistent with Pandas.

This PR also makes a couple of other changes:

  • Renamed the "Basics" page under the previous "Basics" section to "Data Types", as that reflects its contents more accurately. I also modified the content here a bit.
  • Renamed the "10 minutes to CuPy and CuDF" notebook to "Interoperability between CuPy and CuDF" as that more accurately describes what that page is about.

Compare the TOC from this PR (below) with our currently published docs.

Screen Shot 2022-04-20 at 1 13 04 PM

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@github-actions github-actions bot added the conda label Apr 20, 2022
@codecov
Copy link

codecov bot commented Apr 20, 2022

Codecov Report

Merging #10691 (1857972) into branch-22.06 (84f88ce) will increase coverage by 0.02%.
The diff coverage is 100.00%.

❗ Current head 1857972 differs from pull request most recent head e68b86e. Consider uploading reports for the commit e68b86e to get more accurate results

@@               Coverage Diff                @@
##           branch-22.06   #10691      +/-   ##
================================================
+ Coverage         86.40%   86.43%   +0.02%     
================================================
  Files               143      143              
  Lines             22444    22448       +4     
================================================
+ Hits              19393    19402       +9     
+ Misses             3051     3046       -5     
Impacted Files Coverage Δ
python/cudf/cudf/_fuzz_testing/utils.py 0.00% <ø> (ø)
python/cudf/cudf/comm/gpuarrow.py 79.76% <ø> (ø)
python/cudf/cudf/core/column/string.py 89.21% <ø> (+0.12%) ⬆️
python/cudf/cudf/core/frame.py 93.41% <ø> (ø)
python/cudf/cudf/core/series.py 95.16% <ø> (ø)
python/custreamz/custreamz/kafka.py 29.16% <ø> (ø)
python/cudf/cudf/core/column/interval.py 83.67% <100.00%> (ø)
python/cudf/cudf/core/column/struct.py 96.42% <100.00%> (ø)
python/cudf/cudf/core/dataframe.py 93.74% <100.00%> (+0.04%) ⬆️
python/cudf/cudf/core/reshape.py 90.11% <100.00%> (+0.11%) ⬆️
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 84f88ce...e68b86e. Read the comment docs.

@github-actions github-actions bot removed the conda label Apr 26, 2022
@shwina shwina added doc Documentation non-breaking Non-breaking change labels Apr 26, 2022
@shwina shwina marked this pull request as ready for review April 26, 2022 14:58
@@ -2,15 +2,16 @@
"cells": [
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to be an issue with the way ReviewNB diffs things. There's no ==================================== in the source. Ditto in many places below.


Reply via ReviewNB

@@ -2,46 +2,31 @@
"cells": [
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the hardcoded table of contents below as Sphinx/myst-nb already generate one for us.


Reply via ReviewNB

@shwina shwina requested a review from galipremsagar April 26, 2022 15:06
Copy link
Contributor

@galipremsagar galipremsagar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks great to me Ashwin. Except for a few rendering issues that I mentioned below.

docs/cudf/source/user_guide/data-types.md Show resolved Hide resolved
docs/cudf/source/user_guide/data-types.md Outdated Show resolved Hide resolved
docs/cudf/source/user_guide/data-types.md Outdated Show resolved Hide resolved
docs/cudf/source/user_guide/groupby.md Outdated Show resolved Hide resolved
.. table::
:class: io-supported-types-table special-table
:widths: 15 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10

+-----------------------+--------+--------+--------+--------+---------+--------+--------+--------+--------+-------------------+--------+--------+---------+---------+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how GitHub thinks you only moved this and 2 more files while you also made similar file movements in this PR 🤣

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah what the heck is that about...

Copy link
Contributor

@galipremsagar galipremsagar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM @shwina and all the issues are now fixed. Thank you!

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice job @shwina. I haven't had a chance to render the docs locally, but I thought I would take a peek. I left a handful of comments, but only one (on index.md) is probably blocking. Feel free to skip/ignore any comments or deal with them in future PRs. I recognize this PR is huge so I don't want to block it. Hope the feedback is helpful! :)

docs/cudf/source/user_guide/dask-cudf.md Outdated Show resolved Hide resolved
also provide special data types for working with decimals, list-like,
and dictionary-like data.

All data types in cuDF are [nullable](/user_guide/missing-data).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use the MyST equivalent of Sphinx's :ref: rather than an absolute path. The absolute path won't work in rendered HTML unless it's being served at the root. That's prone to breakage. This piece of the documentation might help: https://myst-parser.readthedocs.io/en/latest/syntax/syntax.html#targets-and-cross-referencing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used a relative path instead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A relative path is an improvement but not quite as good as ref syntax. If the page is moved or renamed, the ref will still pick it up as long as it's in the same Sphinx project. refs with missing referents will throw warnings, and refs are supported by non-HTML Sphinx outputs. An absolute/relative path won't be checked for correctness afaik.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With markdown syntax [nullable](mossing-data), I get:

cudf/docs/cudf/source/user_guide/data-types.md:8: WARNING: 'myst' reference target not found: mossing-data

The markdown syntax translates into Sphinx's any role, which seems pretty flexible in what it can do.

Copy link
Contributor

@bdice bdice May 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to know! If it's translated to the any role, that's awesome. I thought it would be interpreted as a link, rather than a reference (any will match ref or any other role).

(also "mossing data" 😆 )

## NumPy data types

We use NumPy data types for integer, floating, datetime, timedelta,
and string data types. Thus, just like in NumPy,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it weird that this always uses two spaces between sentences? Personally I prefer one space between sentences.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just Emacs'Emacs' auto-fill at work. Personally, I think Semantic Linefeeds are the best convention for documentation source, but it's hard to incorporate them in anything but personal projects :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vim’s “gq” does the same thing with two spaces. I would be open to rewriting this with semantic lines (or the easier to enforce “one sentence per line”). I have had no issues in practice with using that style on other projects. This word wrap is pretty harsh right now - it’s wrapping at less than 80 chars (74?), so I think it is unlikely that this word wrapped formatting will survive more than one or two revisions anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a discussion to have offline/as part of the dev guide?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not oppose making that change in this PR, since all the files are being rewritten anyway. If you don't want to make that decision in this PR, this would be a good topic for tomorrow's cuDF Python meeting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for semantic linefeeds

`np.dtype("float32")`, `np.float32`, and `"float32"` are all acceptable
ways to specify the `float32` data type:

```python
Copy link
Contributor

@bdice bdice May 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discovered recently that code blocks using >>> are (depending on the renderer) supposed to be rendered as pycon: Python console, rather than python.

That applies to GitHub Flavored Markdown via Linguist and also content highlighted with Pygments. It appears that MyST uses Pygments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL! I can confirm that it works, but are there any advantages to using pycon over python (either ways, I'd prefer to change this in a follow up)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, things like keywords in output/tracebacks aren't syntax highlighted by mistake. That's really cool!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! It was definitely a "TIL" thing for me too. I think a lot of projects just use python and don't know the difference.

docs/cudf/source/user_guide/data-types.md Outdated Show resolved Hide resolved
docs/cudf/source/user_guide/internals.md Outdated Show resolved Hide resolved
docs/cudf/source/user_guide/internals.md Outdated Show resolved Hide resolved
docs/cudf/source/user_guide/internals.md Outdated Show resolved Hide resolved
docs/cudf/source/index.rst Show resolved Hide resolved
docs/cudf/source/user_guide/dask-cudf.md Outdated Show resolved Hide resolved
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @shwina for your dedication to this!

@shwina
Copy link
Contributor Author

shwina commented May 4, 2022

Rerun tests

@shwina
Copy link
Contributor Author

shwina commented May 4, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit dd68db3 into rapidsai:branch-22.06 May 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Documentation non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants