Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a library_design.md file documenting the core Python data structures and their relationship #10817

Merged
merged 13 commits into from
May 23, 2022

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented May 10, 2022

This PR adds an library_design.md file discussing cuDF's internal architecture, including its core data structures, their purpose, and how they related to pandas and libcudf objects. The document is not short, but it aims to avoid being too long by focusing mainly on the layout between classes and how they interact. I do not discuss implementation details for specific functionality (e.g. the Merge or GroupBy classes), nor do I go into detail on the layout of files on disk. The emphasis is on understanding the different principal components and how they fit together.

This PR contributes to #6481. Subsequent PRs will focus on other aspects of a developer guide, such as a more information on how to contribute, write tests, benchmark, and write documentation.

@vyasr vyasr added doc Documentation Python Affects Python cuDF API. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels May 10, 2022
@vyasr vyasr added this to the CuDF Python Refactoring milestone May 10, 2022
@vyasr vyasr self-assigned this May 10, 2022
@vyasr vyasr added 3 - Ready for Review Ready for review by team and removed improvement Improvement / enhancement to an existing function labels May 10, 2022
@github-actions github-actions bot removed the Python Affects Python cuDF API. label May 10, 2022
@vyasr vyasr requested review from shwina and galipremsagar May 10, 2022 00:25
@codecov
Copy link

codecov bot commented May 10, 2022

Codecov Report

Merging #10817 (b613e00) into branch-22.06 (6352b4e) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@               Coverage Diff                @@
##           branch-22.06   #10817      +/-   ##
================================================
+ Coverage         86.29%   86.32%   +0.02%     
================================================
  Files               144      144              
  Lines             22656    22668      +12     
================================================
+ Hits              19552    19569      +17     
+ Misses             3104     3099       -5     
Impacted Files Coverage Δ
python/cudf/cudf/core/_base_index.py 85.71% <ø> (ø)
python/cudf/cudf/core/frame.py 93.41% <ø> (ø)
python/cudf/cudf/core/groupby/groupby.py 91.79% <ø> (+0.22%) ⬆️
python/cudf/cudf/core/index.py 92.06% <ø> (ø)
python/cudf/cudf/core/indexed_frame.py 91.70% <ø> (ø)
python/cudf/cudf/core/series.py 95.17% <ø> (ø)
python/cudf/cudf/core/column/string.py 88.78% <100.00%> (+0.12%) ⬆️
python/dask_cudf/dask_cudf/groupby.py 97.42% <100.00%> (+<0.01%) ⬆️
python/dask_cudf/dask_cudf/tests/test_groupby.py 100.00% <100.00%> (ø)
python/cudf/cudf/utils/ioutils.py 79.47% <0.00%> (-0.13%) ⬇️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ac77940...b613e00. Read the comment docs.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice job @vyasr. I have a few suggestions attached. Let me know if you want another round of review, but I wouldn't spend more than 1 more round on this. (Our collective time is better spent on porting the rest of the proposed developer docs into Markdown than "perfecting" this document.)

docs/cudf/source/developer_guide/ARCHITECTURE.md Outdated Show resolved Hide resolved
docs/cudf/source/developer_guide/ARCHITECTURE.md Outdated Show resolved Hide resolved
docs/cudf/source/developer_guide/ARCHITECTURE.md Outdated Show resolved Hide resolved
docs/cudf/source/developer_guide/ARCHITECTURE.md Outdated Show resolved Hide resolved
docs/cudf/source/developer_guide/ARCHITECTURE.md Outdated Show resolved Hide resolved
docs/cudf/source/developer_guide/ARCHITECTURE.md Outdated Show resolved Hide resolved
docs/cudf/source/developer_guide/ARCHITECTURE.md Outdated Show resolved Hide resolved
docs/cudf/source/developer_guide/ARCHITECTURE.md Outdated Show resolved Hide resolved
docs/cudf/source/developer_guide/ARCHITECTURE.md Outdated Show resolved Hide resolved
docs/cudf/source/index.rst Outdated Show resolved Hide resolved
@vyasr vyasr changed the title Feature/architecture docs Add an ARCHITECTURE.md file documenting the core Python data structures and their relationship May 12, 2022
@vyasr vyasr requested a review from bdice May 12, 2022 02:01
@vyasr
Copy link
Contributor Author

vyasr commented May 12, 2022

rerun tests

Copy link
Contributor

@shwina shwina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome work. I love the approach of decomposing the library into three layers and explaining each one! Have a few requests mostly around making the document more approachable to first-time/casual readers.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the changes you made. @shwina raises some good questions, but this is green-light from my side.

docs/cudf/source/developer_guide/ARCHITECTURE.md Outdated Show resolved Hide resolved
@vyasr vyasr requested a review from shwina May 18, 2022 22:00
@vyasr
Copy link
Contributor Author

vyasr commented May 18, 2022

@shwina this is ready for another round of review. Let me know if you want me to render and show you any part of the docs.

@vyasr vyasr changed the title Add an ARCHITECTURE.md file documenting the core Python data structures and their relationship Add a library_design.md file documenting the core Python data structures and their relationship May 19, 2022
@shwina
Copy link
Contributor

shwina commented May 20, 2022

This is looking good! I'm approving with a few small suggestions.

@shwina
Copy link
Contributor

shwina commented May 23, 2022

@gpucibot merge

@vyasr
Copy link
Contributor Author

vyasr commented May 23, 2022

rerun tests

@rapids-bot rapids-bot bot merged commit a9220bd into rapidsai:branch-22.06 May 23, 2022
@vyasr vyasr deleted the feature/architecture_docs branch June 14, 2022 00:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team doc Documentation non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants