Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plans for v3.4.20 release #14232

Closed
25 tasks done
ahrtr opened this issue Jul 19, 2022 · 34 comments
Closed
25 tasks done

Plans for v3.4.20 release #14232

ahrtr opened this issue Jul 19, 2022 · 34 comments

Comments

@ahrtr
Copy link
Member

ahrtr commented Jul 19, 2022

We intentionally skipped back-porting some bug fixes in 3.4.19 in order to minimize the impact. Since 3.4.19 is already released and validated to be working well, so we can consider to release 3.4.20 to backport some PRs.

Pipelines/test

There are still some flaky test failures from time to time. As previous release, any fix to stabilize the pipeline is welcome.

Issues/PRs that should be included in 3.4.20

Issues/PRs not considered in 3.4.20

Please feel free to chime in if you think any PR/issues need to be investigated or included in 3.4.20. Please also feel free to let us know if anyone has any concerns or comment. cc @ptabor @serathius @spzala @hexfusion @mitake @lavacat @endocrimes @dims @liggitt @neolit123

@chaochn47
Copy link
Member

Nice to backport: #12335

@ahrtr
Copy link
Member Author

ahrtr commented Jul 19, 2022

Nice to backport: #12335

Thanks for the feedback. I just had a quick review on the PR, and it's a minor change and should be safe to backport. Please feel free to deliver a PR for this.

@vivekpatani
Copy link
Contributor

@ahrtr #14241

@tangcong
Copy link
Contributor

@ahrtr #14169

@ahrtr
Copy link
Member Author

ahrtr commented Jul 20, 2022

@ahrtr #14169

Thanks for the feedback. I agree the PR can improve the performance/throughput, but it also added a new flag --max-concurrent-streams which usually we shouldn't backport. We need to get consensus on this one. Please kindly provide your opinion (thumb up or down?). thx.

Please read #14169 (comment) if you want to quickly understand the context of the PR.

Personally, I incline to backporting it to 3.4.

@tangcong
Copy link
Contributor

thanks for the feedback. it is actually a bug, pr #8238 only solves a part of the problem (etcd does not enable tls), we have encountered a lot of user feedback (users using APISIX, accessing etcd cluster users through HTTP 1.1x protocol). @ahrtr

@ahrtr
Copy link
Member Author

ahrtr commented Jul 20, 2022

I added 14169 into the list, please let me know if anyone have any concerns. @serathius @spzala @ptabor

@ahrtr
Copy link
Member Author

ahrtr commented Jul 20, 2022

I pinged the original author of PR 12846, but got no response so far.
Is anyone interested in backporting it to 3.4? thx

@vivekpatani
Copy link
Contributor

@ahrtr #14246 - have backported the requested PR. I maybe lacking understanding, but from the looks of it, this function isn't called anywhere (even in the original PR), am I doing this correctly? Thanks. @ptabor @spzala @serathius

vivekpatani added a commit to vivekpatani/etcd that referenced this issue Jul 20, 2022
@ahrtr
Copy link
Member Author

ahrtr commented Jul 20, 2022

@ahrtr #14246 - have backported the requested PR. I maybe lacking understanding, but from the looks of it, this function isn't called anywhere (even in the original PR), am I doing this correctly? Thanks. @ptabor @spzala @serathius

Thanks @vivekpatani for the PR. Are you talking about RemoveMatchFile? It's supposed to be only called on etcd startup.

vivekpatani added a commit to vivekpatani/etcd that referenced this issue Jul 20, 2022
vivekpatani added a commit to vivekpatani/etcd that referenced this issue Jul 20, 2022
@vivekpatani
Copy link
Contributor

@ahrtr I think I had some misunderstanding, the PR is now fixed/mergeable, please feel free to review when you have a minute, thank you!

@chaochn47
Copy link
Member

#13435? @ahrtr This PR is safe to backport in my opinion.

@ahrtr
Copy link
Member Author

ahrtr commented Jul 21, 2022

#13435? @ahrtr This PR is safe to backport in my opinion.

Agreed. Please feel free to deliver a PR, thx.

@mborsz
Copy link
Contributor

mborsz commented Jul 21, 2022

Can we also consider backporting PR #12871?
It seems be be quite safe to backport change and it will both reduce memory consumption and reduce work in latency sensitive applierV3backend.Apply.

@ahrtr
Copy link
Member Author

ahrtr commented Jul 21, 2022

Can we also consider backporting PR #12871? It seems be be quite safe to backport change and it will both reduce memory consumption and reduce work in latency sensitive applierV3backend.Apply.

It should have already been backported in 12888

@serathius
Copy link
Member

@ahrtr #12762 is fix for #12680, in the end the solution was done in one, but three PRs:

As all of them change how pending requests are handled during leader election I would prefer to merge all of them to avoid this logic to diverge between branches. Partial backport means that we could end up not really fixing it.

@ramses
Copy link

ramses commented Jul 21, 2022

Hi @ahrtr ,

I have the backport for #13435 ready. However, I currently do not have permission to push a branch to etcd-io/etcd.git.

@endocrimes
Copy link
Contributor

@ramses you'll need to make a "fork" on GitHub, then push to that, and open the PR across repos. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request

@ramses
Copy link

ramses commented Jul 21, 2022

Thanks @endocrimes , realized that immediately after :-)

The PR for #13435: #14254

ramses pushed a commit to ramses/etcd that referenced this issue Jul 21, 2022
This is a backport of etcd-io#13435 and is
part of the work for 3.4.20
etcd-io#14232.

The original change had a second commit that modifies a changelog file.
The 3.4 branch does not include any changelog file, so that part was not
cherry-picked.

Local Testing:

- `make build`
- `make test`

Both succeed.
@ahrtr
Copy link
Member Author

ahrtr commented Jul 21, 2022

@ahrtr #12762 is fix for #12680, in the end the solution was done in one, but three PRs:

* [raft: postpone MsgReadIndex until first commit in the term #12762](https://github.com/etcd-io/etcd/pull/12762)

* [etcdserver: resend ReadIndex request on empty apply request #12795](https://github.com/etcd-io/etcd/pull/12795)

* [Read index retry #12780](https://github.com/etcd-io/etcd/pull/12780)

As all of them change how pending requests are handled during leader election I would prefer to merge all of them to avoid this logic to diverge between branches. Partial backport means that we could end up not really fixing it.

Thanks @serathius for the reminder. I will take care of all of them myself later.

ramses pushed a commit to ramses/etcd that referenced this issue Jul 21, 2022
This is a backport of etcd-io#13435 and is
part of the work for 3.4.20
etcd-io#14232.

The original change had a second commit that modifies a changelog file.
The 3.4 branch does not include any changelog file, so that part was not
cherry-picked.

Local Testing:

- `make build`
- `make test`

Both succeed.

Signed-off-by: Ramsés Morales <[email protected]>
vivekpatani added a commit to vivekpatani/etcd that referenced this issue Jul 21, 2022
@ahrtr
Copy link
Member Author

ahrtr commented Jul 22, 2022

@ahrtr
Copy link
Member Author

ahrtr commented Jul 25, 2022

Although there are still some issues which need further investigation, we have already finished backporting all existing PRs.

@endocrimes Do you have bandwidth to run Jepsen test on release-3.4?

@endocrimes
Copy link
Contributor

@ahrtr sure 👍

@ahrtr
Copy link
Member Author

ahrtr commented Jul 28, 2022

I may release 3.4.20 sometime next week when all remaining tasks(see below) are done,

  1. The failures discovered in 3.4 pipeline are fixed. Only [3.4] panic: runtime error: invalid memory address or nil pointer dereference #14256 (@ramses @SimFG) is a blocker to me.
  2. All tests, including Jepsen test (@endocrimes ) and manual test in my local environment (@ahrtr ), are done. We need to think about how to enhance the release process in future, especially how to qualify our test process/utilities.

@serathius @spzala @ptabor @hexfusion Please kindly let me know if you have any comments or concerns.

@spzala
Copy link
Member

spzala commented Jul 28, 2022

@ahrtr thank you! No concerns, and sounds great.

@ahrtr
Copy link
Member Author

ahrtr commented Aug 1, 2022

@ahrtr sure 👍

@endocrimes any update on the Jepsen test on release-3.4?

@ahrtr
Copy link
Member Author

ahrtr commented Aug 2, 2022

#14256 isn't a blocker any more, please see my comment #14256 (comment)

@endocrimes
Copy link
Contributor

@ahrtr running them now :)

@vivekpatani
Copy link
Contributor

Hi @ahrtr @lavacat @serathius found one last one that may need to be back ported? #13824 Please let me know if that is the case. Thanks.

@ahrtr
Copy link
Member Author

ahrtr commented Aug 3, 2022

Hi @ahrtr @lavacat @serathius found one last one that may need to be back ported? #13824 Please let me know if that is the case. Thanks.

Thanks @vivekpatani .

etcd 3.5.4 (and main) is using DialContext, while etcd 3.4.19 (and release-3.4 latest code) is using Dial. So release-3.4 doesn't have the issue #13810.

I agree DialContext is better than Dial because Dial has already been deprecated. But Let's just keep it as it's, because we should only cherry pick bug fixes into 3.4.

@ahrtr
Copy link
Member Author

ahrtr commented Aug 5, 2022

Talked to @endocrimes , no any new issue found during the Jepsen test. Will release 3.4.20 soon.

@ahrtr
Copy link
Member Author

ahrtr commented Aug 5, 2022

3.4.20 is just released, https://github.com/etcd-io/etcd/releases/tag/v3.4.20 .

Thanks everyone in the CC list. It's really the result of great team work!

Thanks again !!!

@ahrtr ahrtr closed this as completed Aug 5, 2022
@vivekpatani
Copy link
Contributor

vivekpatani commented Aug 9, 2022

Thanks for doing this.

etcd 3.5.4 (and main) is using DialContext, while etcd 3.4.19...

etcd-3.5 latest release is not using DialContext or am I missing something?

@ahrtr

@ahrtr
Copy link
Member Author

ahrtr commented Aug 9, 2022

@vivekpatani The reason is the PR #13824 has been cherry picked to release-3.5 yet. Please feel free to deliver a PR for that. thx

In release-3.5, DialContext is used in the transport, but Dial is used in the ValidateSecureEndpoints.

In release-3.4, both places are using Dial.

tjungblu pushed a commit to tjungblu/etcd that referenced this issue Sep 8, 2022
tjungblu pushed a commit to tjungblu/etcd that referenced this issue Sep 8, 2022
This is a backport of etcd-io#13435 and is
part of the work for 3.4.20
etcd-io#14232.

The original change had a second commit that modifies a changelog file.
The 3.4 branch does not include any changelog file, so that part was not
cherry-picked.

Local Testing:

- `make build`
- `make test`

Both succeed.

Signed-off-by: Ramsés Morales <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

9 participants