Release 2021.11.0 #197

Closed
jrbourbeau opened this issue Nov 2, 2021 · 16 comments

Comments

@jrbourbeau
Member

jrbourbeau commented Nov 2, 2021

As part of our normal release cadence, if there are no known blockers, I'd like to release dask and distributed 2021.11.0.

Looking at some recent issues / PRs, it would be good to get a patch out which fixes dask/distributed#5472.

cc @jakirkham @jsignell @quasiben

EDIT: I forgot to mention there has been one reported regression from the 2021.10.0 release (xref dask/dask#8292) which it would also be good to get a patch out for

@jakirkham
Member

Thanks James! 😀 Will mention internally

> EDIT: I forgot to mention there has been one reported regression from the 2021.10.0 release (xref dask/dask#8292) which it would also be good to get a patch out for

Do we know how to fix this issue? Last I checked the cause wasn’t well understood. Has that changed?

@jrbourbeau
Member Author

> Do we know how to fix this issue? Last I checked the cause wasn’t well understood. Has that changed?

My guess is that, as @gjoseph92 mentioned in dask/dask#8174 (comment), there's some subtle issue in our high-level graph code, but to my knowledge nobody has been able to investigate yet. To be clear, I don't think this should block releasing; I was just being hopeful about a patch is all : )

Also @jcrist has a fix for dask/distributed#5472 over in dask/distributed#5488

Any issues on the RAPIDS side, or are we okay to release as usual tomorrow?

@jakirkham
Member

Oops forgot to raise this 🤦‍♂️ Have mentioned it now. Will let you know if we’ve heard anything back by the morning (US Pacific)

@chrisroat

What is the policy on regressions? Is there any worry that a subtle bug is causing more issues than realized?

I currently put my time into trying to become a scheduler aficionado (since I spend my time killing deadlocked workers). I can also start learning the high-level graph code if this particular regression is low priority, since for me it affects a graph at the heart of our pipeline.

@gjoseph92

We also have a new deadlock in distributed: dask/distributed#5480 (though it's almost certainly been around for a while already). Both myself and another user in the wild have triggered this through normal use. dask/distributed#5457 is a partial fix, but idk if we'll get it in by tomorrow? cc @fjetter

Since I don't think it's a recent regression (it possibly dates from the worker state machine refactor), I don't know if we want to block this release for it.

@jrbourbeau
Member Author

> What is the policy on regressions? Is there any worry that a subtle bug is causing more issues than realized?

That's a great question. We don't have a hard policy on regressions. Historically we've tried our best to fix regressions as they're reported or estimate how impactful the regression is based on user feedback (this is really hard to do). For this case, one option would be to just revert dask/dask#8174 until we're able to get to the bottom of the graph validation issue you raised (xref dask/dask#8292).

@gjoseph92

FYI, on the topic of regressions... microsoft/LightGBM#4771

@jsignell
Member

jsignell commented Nov 5, 2021

I just ran into a regression with reading from parquet dask/dask#8349

@jakirkham
Member

> Will let you know if we’ve heard anything back by the morning (US Pacific)

The only thing I've heard about is PR ( dask/distributed#5380 ), which is now in. So no blockers from us.

@jrbourbeau
Member Author

jrbourbeau commented Nov 5, 2021

Thanks @gjoseph92 @jsignell for surfacing those regressions

I was in an unrelated meeting with @fjetter and @gjoseph92 where this release came up and I wanted to surface the result of that conversation. It seems like there are a few known regressions from the 2021.10.0 release. The deadlock issue is, in part, related to the large worker state refactor (there's a partial fix for this in the works, but it won't be ready for today). The parquet regression is certainly valid, though, as @jsignell points out in dask/dask#8349 (comment), it is somewhat of an edge case.

Looking at the commits to dask and distributed since the last release, nothing stands out as particularly controversial (there's definitely nothing like the big worker state machine refactor in the last release), but there is a fix for dask/distributed#5472, which several users reported running into and would be good to fix. Altogether, my sense is that we should still release today. We won't be any worse off from a deadlock perspective than what's already released, and we'll fix a high-ish profile issue (dask/distributed#5472). We should still invest in fixing dask/dask#8349 and dask/dask#8292, but again I don't think we'll be any worse off by releasing today.

Thoughts?

@jrbourbeau
Member Author

Planning to carry on with releasing in a bit if no further comments

@jakirkham
Member

SGTM. Thanks for meeting with people and surfacing that info, James 😄

@rjzamora
Member

rjzamora commented Nov 5, 2021

> I just ran into a regression with reading from parquet dask/dask#8349

dask#8351 should close this

@jakirkham
Member

Rick's PR is now in

@jakirkham
Member

> FYI, on the topic of regressions... microsoft/LightGBM#4771

This traces back to issue ( dask/distributed#5497 ). Thanks for tracking that down Gabe 😄

@jrbourbeau
Member Author

> Rick's PR is now in

Sorry for the delay, I was in meetings. Will start pushing out the release now...
