Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GEP-1619: Add API design for session persistence #2159

Merged
merged 10 commits into from
Sep 25, 2023

Conversation

gcs278
Copy link
Contributor

@gcs278 gcs278 commented Jun 30, 2023

Adds specific details for API design as well as the documentation that support the decision of the session persistence API design.

What type of PR is this?
/kind gep

What this PR does / why we need it:

Which issue(s) this PR fixes:

N/A

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. kind/gep PRs related to Gateway Enhancement Proposal(GEP) cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 30, 2023
@k8s-ci-robot k8s-ci-robot requested a review from howardjohn June 30, 2023 15:43
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 30, 2023
@k8s-ci-robot k8s-ci-robot requested a review from robscott June 30, 2023 15:43
@k8s-ci-robot
Copy link
Contributor

Hi @gcs278. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@gcs278 gcs278 force-pushed the gep1619-API-Design branch 5 times, most recently from 93b37e7 to b876c99 Compare July 3, 2023 19:34
@gcs278 gcs278 marked this pull request as ready for review July 3, 2023 19:35
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 3, 2023
@gcs278
Copy link
Contributor Author

gcs278 commented Jul 3, 2023

Ready for review if you happened to be interested in helping out again. This time, I have API design, and alternative API designs. Thanks
CC: @costinm @sanjaypujare

geps/gep-1619.md Outdated Show resolved Hide resolved
geps/gep-1619.md Outdated Show resolved Hide resolved
@sanjaypujare
Copy link
Contributor

sanjaypujare commented Jul 4, 2023

Ready for review if you happened to be interested in helping out again. This time, I have API design, and alternative API designs. Thanks CC: @costinm @sanjaypujare

Thanks. What do you think about expanding the scope to include meshes with GAMMA ?

Also CC @gnossen

geps/gep-1619.md Outdated

Only the Cookie Name and Cookie Value are required. All Cookie Attributes are optional. Attributes are simply key=value pairs with the value as optional for some attributes.
Session persistence augments a Kubernetes service object after the routing to the service has taken place. In
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be a common use-case but I would like to point out additional use-cases such as traffic splitting, canary deployment (https://istio.io/latest/docs/tasks/traffic-management/traffic-shifting/) where session persistence should still work. So the cookie should force a request to go to a particular Kubernetes Service disregarding the traffic splitting/shifting rules just like we expect the request to go to a particular backend disregarding load balancing within that Service. This is one of the use-cases we are interested in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To address my point, we should say that Session persistence may also augment an HTTPRoute (or GRPCRoute) where we indicate service stickyness based on the session cookie.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great point, I missed this use case. I agree it makes sense to have HTTPRoute and GRCPRoute be targets given this use case.

Now, I question whether there is still a good reason for Services to be a valid target. Seems like attaching to a HTTPRoute or Service should provide the same session persistence functionality, right? i.e. attaching to a Service should still have the HTTPRoute stick a session to the same service regardless of traffic shifting configuration?

Seems the only benefit here for Service, is, the practical matter that you can easily "reuse" the configuration among multiple HTTPRoutes.

If you can attach to an xRoute, I feel like it might confuse users why they should attach to a Service over a xRoute or visa-verse.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are other comments indicating 'reuse' is not the main reason for setting sessions on Service.

If we want to avoid confusing users - we should probably only attach to service and defer per route. But I don't think it's a huge confusion - some servers will support both, others will only support service.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, are we saying attaching to a Service vs. attaching to a HTTPRoute provides different functionality? i.e. would attaching to a Service still have the HTTPRoute stick a session to the same service regardless of traffic shifting configuration? Or would you have to attach to a HTTPRoute for persistence over traffic splitting?

geps/gep-1619.md Outdated
Session persistence augments a Kubernetes service object after the routing to the service has taken place. In
Kubernetes, a service is an abstract representation of a set of pods that work together to provide a specific
functionality. Services perform load balancing, routing traffic from a service endpoint to a pod. Session persistence
augments the load balancing functionality of the service by consistently directing traffic to the same pod.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"to the same pod" and to the same service containing that pod. Just to reiterate my point in https://github.com/kubernetes-sigs/gateway-api/pull/2159/files#r1252413412

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I updated this section to reflect that.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure this belong to the spec or is a requirement. For implementations that encode the IP - it's a must because of IP reuse. For implementations that encode a durable pod ID - this may be sufficient.

There are users who dynamically change label selector on a service, so a Pod may start selected by canary-service and then be selected by prod-service (without restarting the pod). This model is not compatible with current istio implementation of persistent sessions for example, but it's valid and used by some CI/CD tools.

I would leave this to implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@costinm what is "this" in the context of you comment? Are you referring to Sanjay's statement on clarifying "to the same pod" and to the same service containing that pod.?

Are you saying we shouldn't specify how traffic splitting affects session persistence and leave that up to implementations?

geps/gep-1619.md Outdated Show resolved Hide resolved
geps/gep-1619.md Outdated

Taking Istio as an example, it offers a clear attachment point of session persistence to a service through a [DestinationRule](https://istio.io/latest/docs/reference/config/networking/destination-rule/#DestinationRule).
`DestinationRule` defines policies that apply to traffic intended for a service after routing has occurred. Within
`DestinationRule`, Istio couples the selection of load balancing algorithms with session persistence via their [LoadBalancerSettings](https://istio.io/latest/docs/reference/config/networking/destination-rule/#LoadBalancerSettings).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to take into account traffic shifting/splitting (as mentioned in my previous comment), then the Istio object is VirtualService or HttpRoute in the new Gateway API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree, but here is talking quite literally. Istio's API attaches/augments a "service". Istio doesn't attach a destination rule configuration to a Virtual Service.

I think there is a difference here is in the literal vs. actual. Istio doesn't literally (i.e. API definitions) attach session persistence configuration to a Virtual Service, but I think your point is that, it actually (effectively, in practice) augments a Virtual Service in the use case of traffic splitting (note: i'm assuming that's what happens in practice, I've never tested this in Istio).

I think the argument for Gateway API design, is that we should literally attach our Session Persistence API to the objects it will actually augment. I.e. attach SessionPersistencePolicy to HTTPRoute.

This sounds better in my head and probably doesn't make much sense written out, but let me know if you can help me make sense of it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API semantics of attaching to route are just too complicated to understand and implement ( i.e. how different routes with or without persistence interact with each other ), and for a user who deploys a backend - persistence is a property of the implementation, not of how it is routed by different gateways ( a backend may be used in multiple gateways and in mesh ).

Usability and semantics are simply not practical (beyond the trivial case where you have a single route).

And I believe we have looked at all the existing systems - with only envoy doing things at route level - so for a common API I think we MUST support the higher granularity, with per-route optional for servers that can support it. I think that's consistent with the gateway extensibility model - many extensions can attach to different points.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@costinm I take it that you are okay with attaching to xRoutes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I interpret @costinm comment to mean: attaching to Service (higher granularity) is at core support level, but attaching to a route is extended support level, but we should support both in the API. Let me know if I misunderstood.

geps/gep-1619.md Outdated Show resolved Hide resolved
geps/gep-1619.md Outdated Show resolved Hide resolved
geps/gep-1619.md Outdated Show resolved Hide resolved
Copy link

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for leading this.
I've left some minor comments.

geps/gep-1619.md Outdated Show resolved Hide resolved
geps/gep-1619.md Outdated Show resolved Hide resolved
Note right of C: [cookie] indicates a request<br>or response with a cookie
C->>+G: Request Web Page<br>[cookie]
G->>G: Consistent lookup<br>of server using cookie value
G->>+B: Request<br>[cookie]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unclear if the gateway needs to "rewrite" the cookie back to something that the backend understands, otherwise the backend should not use the contents of the cookie for any internal use to identify the session.
IMHO allowing the modification of a cookie creates a high coupling between the backend and gateway implementations, and I'm not sure this is advisable in the general case.

I think that using an additional cookie is the safest way around this issue, but if modification of a prefix is allowed, then the backend should be aware of this (consider reordering the approaches above, or add a note clarifying that cookie modification/prefix should be made aware to the backend app).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unclear if the gateway needs to "rewrite" the cookie back to something that the backend understands, otherwise the backend should not use the contents of the cookie for any internal use to identify the session.
IMHO allowing the modification of a cookie creates a high coupling between the backend and gateway implementations, and I'm not sure this is advisable in the general case.

These are good points. As a disclaimer, the current state of this GEP is that it's not attempting to prescribe how implementations should handle backend initiated sessions as stated in the "Session Initiation Guidelines" section: We leave the decision of handling existing sessions with each specific implementation.. This section wasn't intending to prescribe anything, but to explore potential solutions, though I understand exploring examples can be suggestive in itself.

Given that, I reorganized the list, added a statement that there are nuances to inserting/rewriting, and mentioned that inserting a cookie is a safe option, while still allowing implementation to use their discretion. Let me know if this addresses your concerns.

geps/gep-1619.md Outdated Show resolved Hide resolved
Copy link
Contributor Author

@gcs278 gcs278 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @adisuissa

geps/gep-1619.md Outdated Show resolved Hide resolved
geps/gep-1619.md Outdated Show resolved Hide resolved
geps/gep-1619.md Outdated Show resolved Hide resolved
Note right of C: [cookie] indicates a request<br>or response with a cookie
C->>+G: Request Web Page<br>[cookie]
G->>G: Consistent lookup<br>of server using cookie value
G->>+B: Request<br>[cookie]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unclear if the gateway needs to "rewrite" the cookie back to something that the backend understands, otherwise the backend should not use the contents of the cookie for any internal use to identify the session.
IMHO allowing the modification of a cookie creates a high coupling between the backend and gateway implementations, and I'm not sure this is advisable in the general case.

These are good points. As a disclaimer, the current state of this GEP is that it's not attempting to prescribe how implementations should handle backend initiated sessions as stated in the "Session Initiation Guidelines" section: We leave the decision of handling existing sessions with each specific implementation.. This section wasn't intending to prescribe anything, but to explore potential solutions, though I understand exploring examples can be suggestive in itself.

Given that, I reorganized the list, added a statement that there are nuances to inserting/rewriting, and mentioned that inserting a cookie is a safe option, while still allowing implementation to use their discretion. Let me know if this addresses your concerns.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 21, 2023
Adds specific details for API design as well as the documentation that
support the decision of the session persistence API design.
- Updated TLDR to better reflect the current state of the GEP
- Updated Goals and Non-Goals to reflect recent additions to the GEP
- Removed the statement regarding client initiation of sessions
- Reworked the Session Initiation section for greater clarity,
  emphasizing that the examples provided are for illustrative purposes
- Eliminated the basic diagram illustrating the gateway's
  non-interference with backend session initiation
- Revised the wording to highlight that backend-initiated sessions
  can occur through methods beyond cookie rewriting
- Modified the Backend Session Initiation Diagram to indicate the
  potential use of multiple cookies
- Added an example illustrating global load balancer session initiation
- Moved the "When Does an Application Require Session Persistence"
  section closer to the session persistence topics
- Removed the redundant section discussing core vs. extended support
  levels in "API Attachment Point: Route vs. Service"
- Updated the "Cookie Rewriting" section to "Session Initiation Guidelines"
  and made the content less specific to cookie rewriting.
- Changed title to just "Session Persistence" to reflect the fact this
  isn't a session affinity GEP
- Updated TLDR with more details on session affinity not being
  addressed
- Added non-goal of session affinity API not being defined
- Cleaned up session affinity sections to make more brief since focus
  should be on session persistence
- Removed session affinity implementations table. Seemed more
  appropriate for another GEP
- Removed Open Questions regarding session affinity
- Added TODO question on session draining timeout
- Add non-goal of backend initiated sessions are not supported
- Add note of backend initiated sessions are not supported
- Adjust global load balancer example scenario based on comments
- Add statement about session persistence might break applications
- Add note about java inventing session IDs encoded in url parameters
- Make TargetRef of API less strict on what it can target
- SessionPersistenceType optional and extended
- Specified that clients can not expect specific cookie name
- Specified that service mesh might not use security-by-default cookie settings
- Fixed what-is-GAMMA statement
- Defined interaction between K8S Session Affinity and Session
  Persistence
- Add details about supporting multiple backends in traffic splitting
- Clarify "path" interpretation logic
- Add a open question about targeting the route matches section
- Remove SessionPersistencePolicyType as it was pointless as we don't
  have more than 1 types
Review updates from our Aug 29, 2023 meeting.

- Tweaked the TLDR to remove session affinity
- Updated `TargetRef` godoc to mention at least 1 target of service,
  HTTPRoute, or GRPCRoute must be supported to be in compliance
- Changed `DurationSeconds` to `AbsoluteTimeoutSeconds` due to my
  concern of duration being ambigious
- Added `IdleTimeoutSeconds` to configure idle timeout. Reading over
  comments, it seemed like this was more important than
  `AbsoluteTimeoutSeconds`.
- Added `SessionName` as a optional and extended field per meeting
  discussion.
- Updated `API Attachment Points` to talk about supporting 1 attachment
  point to be compliant and added paragraph on anticipating route
  section matching
- Updated `Cookie Attribute` section with name, and TTL and how they
  relate to the new fields and updated the path section to be more clear
  on idea of unique session between multiple routes pointing to the same
  service.
- Moved route section targeting from open question to TODO since we
  discussed and all agree we want to eventually support it.
- Update from matching route "matches" section to route "rule" section
  as the "rule" section has the matches and the backends
- Changed anticipated support for route rule section into full support
  since GEP-713 was updated.
- Various typos.
- Added sentence on implementations handling persistence on requests
  from the same browser page.
- Various typos
- Backend-initiated session rework: change order of approach list,
  mentioned that there are nuances to the approaches such as making it
  transparent, fixed a broken link, and mentioned that inserting a
  cookie is generally a safe approach for implementations.
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 23, 2023
@gcs278
Copy link
Contributor Author

gcs278 commented Sep 25, 2023

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Sep 25, 2023
@gcs278
Copy link
Contributor Author

gcs278 commented Sep 25, 2023

@sanjaypujare @costinm @shaneutt @adisuissa
Sounds like we are ready for a LGTM? Any last comments? I think @shaneutt is my GEP buddy here, mind providing LGTM given everyone's comments are addressed?

Copy link
Member

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the hard work on this @gcs278 (and everyone who participated). In particular, thanks for meticulously handling all the comments. As this remains provisional it seems entirely reasonable to merge this as such for now. Anyone who has any additional changes in mind, please note that this one PR has already gotten too big please consider opening a PR of your own after to make the changes you're interested in as that will enable us to focus more.

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 25, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: costinm, gcs278, sanjaypujare, shaneutt, va7ish

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 25, 2023
@shaneutt shaneutt added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 25, 2023
@k8s-ci-robot k8s-ci-robot merged commit c7b6b57 into kubernetes-sigs:main Sep 25, 2023
@youngnick
Copy link
Contributor

Updates #1619

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/gep PRs related to Gateway Enhancement Proposal(GEP) lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.