Google Summer of Code 2021 Project #107

Merged
merged 8 commits into from
Aug 23, 2021

freyam
Contributor

@freyam freyam commented Aug 10, 2021

I will be writing about my work this summer with @GenevieveBuckley and @martindurant on the different representations of Dask computations.

This has been part of the annual Google Summer of Code program where students get the opportunity to work with mentors on large-scale projects.

The blog post will list all the merged work and explain what the new features mean for users 🚀

@GenevieveBuckley
Collaborator

GenevieveBuckley commented Aug 10, 2021

From our slack discussion...

Here's the structure I think we should have for the dask-blog post

Section 1: Visualizing high level graphs

Section 2: HTML representation
(maybe link to previous blogposts/twitter threads that talk about HTML reprs in Dask)

All of these (except the bugfix) will need nice before and after screenshots. Putting those together would be a fantastic start (feel free to make a new folder inside the dask-blog/images directory so they're all grouped in one place)

@freyam freyam marked this pull request as ready for review August 15, 2021 11:32
Co-authored-by: Genevieve Buckley <[email protected]>
@GenevieveBuckley
Collaborator

We talked earlier about the difference in audience/purpose between your Medium blogpost and this one.

  1. Medium blogpost = here's all the stuff I worked on
  2. Dask blogpost = here's an overview of some new features for Dask users

This draft reads much more like (1) than (2), with a lot of first-person sentences ("I worked...", "I changed...", "I tweaked..."). We'll probably want to adjust it to better suit the second audience.

@freyam
Contributor Author

freyam commented Aug 16, 2021

I agree. I will edit accordingly.

@GenevieveBuckley
Collaborator

General suggestions:

  • Shorter headings (shift the PR links/titles into the text below)
  • More descriptive alt-text for images. Ideally each one should be a full sentence that makes sense without other supporting information (i.e. we can't expect a reader to know what "PR 5178" is about, so we need to say it)

BTW, I'm happy to write or re-write text content, and will probably do some of this before we publish the final piece.

@freyam
Contributor Author

freyam commented Aug 19, 2021

Hiii Genevieve, I read the draft you just pushed to the branch. It's amazing 💯!

I had also made some adjustments and tweaks of my own yesterday. They don't amount to much, and next to yours they feel quite unprofessional.

I think we should go with yours. I already have the images and some extra text ready, and will commit them once I get to my laptop 😀

@GenevieveBuckley
Collaborator

@freyam - there are still some important to-do items listed here, mostly involving adding the rest of the demonstration examples.

@jacobtomlinson - you might like to take a brief look over some of this (most relevant to your interests is the second section on HTML representations). No worries if you're busy though.

@freyam
Contributor Author

freyam commented Aug 19, 2021

Updated ✔️

Member

@martindurant martindurant left a comment

Just some small thoughts.

Review comments on _posts/2021-08-23-gsoc-2021-project.md: 7 threads, all resolved (5 marked outdated).
Member

@jacobtomlinson jacobtomlinson left a comment

Looks great to me

@freyam freyam changed the title from "[WIP] Google Summer of Code 2021 Project" to "Google Summer of Code 2021 Project" Aug 20, 2021
- Dataframe shuffles are particularly expensive operations. You can [read more about this here](https://docs.dask.org/en/latest/dataframe-best-practices.html#avoid-full-data-shuffling).
- Reading and writing data to/from storage/network services is often high-latency and therefore a bottleneck.
- Blockwise layers are generally efficient for computation.
- All layers are materialized during computation.
Collaborator

I don't know if we should write more about materialized layers here. I can't think of a good way to say:

  • ideally we won't see many materialized layers before compute() is called
  • but we might see some and that's ok
  • but you might also accidentally materialize layers without meaning to, perhaps by counting the number of tasks or looking at the HTML repr (which in turn counts the number of tasks)
  • and fixing that is a job for dask developers, not dask users

I think on balance this might be more confusing than helpful. If anyone has ideas or thoughts around this I'd be interested to hear them.
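
One way to picture the "accidental materialization" point is a toy sketch in plain Python. This is not Dask's implementation: `LazyLayer` and `html_repr` are hypothetical names made up for illustration, and the only assumption carried over from the discussion above is that counting tasks forces a layer to build its full task dictionary.

```python
class LazyLayer:
    """Toy stand-in for a graph layer whose tasks are generated on demand.

    Hypothetical class for illustration only; not part of Dask.
    """

    def __init__(self, name, n_tasks):
        self.name = name
        self._n_tasks = n_tasks
        self._tasks = None  # nothing has been built yet

    @property
    def is_materialized(self):
        return self._tasks is not None

    def materialize(self):
        # Build the full task dictionary the first time it is needed.
        if self._tasks is None:
            self._tasks = {(self.name, i): ("noop", i) for i in range(self._n_tasks)}
        return self._tasks

    def __len__(self):
        # Counting tasks forces the whole task dictionary to exist.
        return len(self.materialize())


def html_repr(layers):
    """Toy summary that reports a task count per layer, and so materializes each one."""
    rows = [f"<li>{layer.name}: {len(layer)} tasks</li>" for layer in layers]
    return "<ul>" + "".join(rows) + "</ul>"


layers = [LazyLayer("read", 4), LazyLayer("add", 4)]
before = [layer.is_materialized for layer in layers]
summary = html_repr(layers)  # looks harmless, but calls len() on every layer
after = [layer.is_materialized for layer in layers]
print(before, after)
```

In this sketch nobody calls `materialize()` directly; simply rendering the summary is enough to flip every layer from lazy to materialized, which is the kind of side effect described in the third bullet.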

@GenevieveBuckley
Collaborator

Thank you @martindurant and @jacobtomlinson
I plan to merge this next week (most likely Australian Tuesday / US Monday)

If either of you have thoughts about this point https://github.com/dask/dask-blog/pull/107/files#r692726228 before then, let me know.

@GenevieveBuckley GenevieveBuckley merged commit f4e0375 into dask:gh-pages Aug 23, 2021
@freyam freyam deleted the gsoc branch August 24, 2021 04:30
@freyam
Contributor Author

freyam commented Aug 24, 2021

💛
