Reconsider xDS API lifecycle clock #10852

Closed
htuch opened this issue Apr 20, 2020 · 8 comments · Fixed by #10958
Labels
api/v4 Major version release @ end of Q3 2020 design proposal Needs design doc/proposal before implementation no stalebot Disables stalebot from closing an issue

Comments

@htuch
Member

htuch commented Apr 20, 2020

The move to the v3 APIs has exposed some pain points that control plane authors are facing. It's not cheap to switch major version, in particular when lacking some of the generic API migration tooling that we have developed inside of Envoy.

The existing API lifecycle is described at https://github.com/envoyproxy/envoy/blob/master/api/API_VERSIONING.md. We cut a new major version every year, turn down an old one each year and any major version lives at most 2 years.

This means that any technical debt in API and associated code can live up to 2 years. If we change the clock cycle to every 2 years for a new major version, we might end up having technical debt live 3 or 4 years. This seems pretty significant for a fast moving project like Envoy.

@mattklein123 has proposed that we postpone cutting a new version until we have enough technical debt built up. Arguably we could keep the existing lifecycle with that approach, but instead of cutting a new API exactly every year, we have >= 1 year between major versions and have Envoy maintainers vote on whether to cut a new major version at every quarter after the year mark. This gives us flexibility in cutting new major versions, potentially longer periods before inflicting cost on control plane authors, stability guarantees to other xDS clients, and a predictable deprecation cycle, while leaving technical debt management under the control of maintainers.
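
As a rough illustration of the proposed clock, the sketch below lists the dates on which maintainers would vote. The helper names are hypothetical, and it assumes the first vote falls exactly on the one-year mark with later votes every three months; neither detail is fixed by the proposal.

```python
from datetime import date
from typing import List


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to 28 so it stays valid."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, min(d.day, 28))


def vote_dates(last_major_cut: date, count: int = 4) -> List[date]:
    """Return the first `count` dates on which a new major version could be voted on.

    Assumption: the first vote is at the one-year mark, then one per quarter.
    """
    return [add_months(last_major_cut, 12 + 3 * i) for i in range(count)]


def may_cut_major(last_major_cut: date, today: date, vote_passed: bool) -> bool:
    """A new major version needs at least a year of distance and a passing vote."""
    return today >= add_months(last_major_cut, 12) and vote_passed
```

With this model, a failed vote simply defers the decision to the next quarter, so the gap between major versions can grow beyond one year but never shrink below it.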

This issue can track discussion; I will add it to the upcoming community call agenda.

CC @mattklein123 @envoyproxy/api-shepherds @alyssawilk @markdroth @dfawley

@htuch htuch added api/v4 Major version release @ end of Q3 2020 design proposal Needs design doc/proposal before implementation labels Apr 20, 2020
@mattklein123 mattklein123 added the no stalebot Disables stalebot from closing an issue label Apr 20, 2020
@dfawley
Member

dfawley commented Apr 20, 2020

+1 to the sentiment. Major version bumps in APIs are costly for the entire ecosystem. If there is a capability to use "experimental" tags to allow new features to be exempted from backward compatibility guarantees, then you should almost never need to release a new major version.

You call Envoy "fast moving", yet it has been GA for 3.5 years and is used by many major companies for critical services. At what point will it be considered "mature" and plan for no major version bumps?

@markdroth
Contributor

I'm also in favor of not imposing the overhead of a major version bump without compelling justification. It's become clear that this change is extremely expensive.

I agree with @dfawley that in principle, we should never need to bump the major version number to add new features, because new features can instead be triggered by client capabilities. So it seems like the only reason we would ever have to bump the major version would be to eliminate deprecated fields, so that management servers can stop supporting old fields. And for that, we could just wait until enough deprecated fields have built up that it makes sense to get rid of a whole bunch of them at a time.
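
To illustrate the client-capabilities idea, here is a minimal sketch of a management server that only sends a new field to clients that announce support for it. In xDS a client can advertise capabilities through the `client_features` list on its `Node` message; everything else below (the feature string, the resource shape, and the `build_resource` helper) is hypothetical and is not actual Envoy or gRPC code.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical feature name; a real client would announce its own strings.
NEW_FEATURE = "example.hypothetical_new_feature"


def build_resource(client_features: FrozenSet[str]) -> Dict[str, Any]:
    """Build a resource for one client, gating the new field on its capabilities."""
    # Fields every client of this major version understands.
    resource: Dict[str, Any] = {"name": "example_cluster", "connect_timeout_s": 5}
    # Only clients that announced the feature receive the new field, so older
    # clients keep working without a major version bump.
    if NEW_FEATURE in client_features:
        resource["new_field"] = {"enabled": True}
    return resource
```

Under this approach the major version only has to change when old fields are removed, not when new ones are added.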

@ejona86 may also have thoughts here.

@mattklein123
Member

> I'm also in favor of not imposing the overhead of a major version bump without compelling justification. It's become clear that this change is extremely expensive.

Agreed, though I think at this point we should agree as a community to push through with v2 -> v3 since we don't know what we don't know until we do it once, and arguably the longer we wait to try this the more painful it will get.

> And for that, we could just wait until enough deprecated fields have built up that it makes sense to get rid of a whole bunch of them at a time.

+1 this is my thinking as well.

@markdroth
Contributor

I'm not convinced that it actually makes sense to go through with the v2 -> v3 migration right now. I agree that we would learn something from the exercise, but until we actually need to bump the version for some reason, it seems like this is an awful lot of work just to gain theoretical knowledge. It seems better to wait until we actually need to make the change, because then it will be much easier to devote the resources to make it happen.

@mattklein123
Member

> I'm not convinced that it actually makes sense to go through with the v2 -> v3 migration right now.

If this is an opinion held by many I think we need to urgently discuss this as a lot of plans would need to change and be communicated.

@mattklein123
Member

mattklein123 commented Apr 21, 2020

(Also note that if we abandon the v3 force upgrade we need to actually go back and backfill recent API changes that have been v3 only, so again this needs urgent attention if there is going to be a change here in the POR)

@htuch
Member Author

htuch commented Apr 21, 2020

@markdroth are you talking about gRPC or Envoy? I think it would be a major disruption to back out from v3 now in Envoy, since:

  1. The ecosystem is in the process of moving to v3.
  2. Envoy has internally migrated to v3; any v2 config must first be converted to v3.
  3. We risk losing credibility when stating plans around APIs.
  4. We abandon the v3 migration, process and tooling put in place for major version bumps. We have to start this all again at some point in the future once these have rotted and the community has lost the ability to deal with major version changes.

This should be tempered with the real cost of the v3 migration, but I think this is the wrong point in an API major version shift to be debating this. Ideally this happens at the start.

@markdroth
Copy link
Contributor

markdroth commented Apr 21, 2020

I agree that there are a lot of implications here and that we need to resolve this quickly, and I'm not dead-set against continuing the migration to v3 if that's still the right course of action. We'll have some offline discussions and try to come to consensus on this.

htuch added a commit to htuch/envoy that referenced this issue Apr 27, 2020
After extended discussion in envoyproxy#10852, Slack and offline, this patch proposes a revision to the API
major versioning policy where we will:

* Not mechanically cut a new major version at EOY, instead wait for enough tech debt.

* Encourage the use of client feature capabilities as an alternative to manage client
  feature support.

Fixes envoyproxy#10852.

Signed-off-by: Harvey Tuch <[email protected]>
htuch added a commit that referenced this issue May 1, 2020
After extended discussion in #10852, Slack and offline, this patch proposes a revision to the API
major versioning policy where we will:

* Not mechanically cut a new major version at EOY, instead wait for enough tech debt.

* Point to future minor versioning and client capabilities to help deal with tech debt.

Fixes #10852.

Signed-off-by: Harvey Tuch <[email protected]>
4 participants