The operator and APIs may behave differently, or each may support multiple options that are separately configurable (e.g. the user could choose HTTP Basic Auth for the operator and API tokens for the APIs)
what does eksctl provide in terms of private security groups?
Add simple configuration to control network access to the Cortex cluster (e.g. open to the world, open to specific IPs / IAM principals, ...)?
In theory, auth could be implemented in the workload (e.g. how the IAM check in the operator is done now), in Istio, or in API Gateway. Drop Istio? Can traffic splitting be done in API Gateway or the NLB? Consider Ambassador, Kong, or Gloo?
If considering API Gateway, research its pricing (it may be expensive). AWS launched a stripped-down version (HTTP APIs) at re:Invent 2019, which may have the features we need and is cheaper.
Also consider how the CLI can verify that it's connected to the intended operator (to prevent operator impersonation); we should be able to turn off the "skip SSL verification" behavior (a sketch follows below).
To make the load balancer private, it might be better to use an NLB instead of an ELB if possible, and use a VPC Link to connect to it from API Gateway. Internal ELBs are also possible, but API Gateway may not be able to connect to them?
While we're at it, it's worth checking whether we need the ELB at all (especially if we use API Gateway), and if so, should we use an NLB or ALB instead?
Do we want separate ELBs/ALBs/NLBs for the operator and the APIs?
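As a rough illustration of the "verify the operator" point above, here is a minimal Python sketch (the /info endpoint and argument names are hypothetical, and the real CLI may not be Python) showing how certificate verification could be kept on by default and only relaxed explicitly:

```python
import requests
from typing import Optional

def check_operator(operator_url: str, ca_bundle: Optional[str] = None,
                   verify_tls: bool = True) -> dict:
    # With verify_tls=True, the operator's certificate is validated against
    # ca_bundle (or the system CAs), so the CLI fails loudly if it reaches an
    # impersonated operator; verify_tls=False reproduces "SSL no verify".
    verify = ca_bundle if (verify_tls and ca_bundle is not None) else verify_tls
    response = requests.get(f"{operator_url}/info", timeout=10, verify=verify)
    response.raise_for_status()
    return response.json()
```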
HTTPS
HTTPS out of the box using someone else's certs (e.g. API Gateway's)
API Gateway would make non-user-provided HTTPS easy, since it uses AWS's certs, and would make it easy to add IAM or API key auth, but it would reduce cloud agnosticism.
HTTPS using user-provided certs
Private cluster
For security, all node groups should be private, and the NLBs should also be in the VPC's private subnets. See the eksctl docs on private networking.
In eks.yaml, use privateNetworking: true in the node group configs (see the config sketch at the end of this section)
Should we make public vs. private configurable? If so, which would be the default?
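A minimal eks.yaml sketch of the privateNetworking setting mentioned above (cluster name, region, and node group details are placeholders):

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: cortex        # placeholder cluster name
  region: us-west-2   # placeholder region

nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 2
    privateNetworking: true   # place nodes in private subnets (no public IPs)
```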
Queueing
Ideally the queue should be "fair", in the sense that requests that arrive first always finish first. To achieve this, the on-replica queues (e.g. the waitress queue) won't be used.
Better/fancier queueing should not increase latency when the queue is empty
Horizontal pod autoscaling should be based on queue length and should support scale-to-zero (see the scaling sketch at the end of this section)
Explore whether we can/should use the vertical pod autoscaler if we move off the horizontal pod autoscaler (they can't be used at the same time)
One issue with using the NLB's queue is that, with multiple APIs, non-impacted APIs could end up waiting on other busy APIs. Also, if the operator is on the same NLB, busy APIs could starve the operator
We should avoid a single point of failure in the queue: it needs redundancy/mirroring, i.e. it must be highly available (e.g. RabbitMQ HA)
Consider an AWS managed queueing service
Consider targeting median queue length, avg queue length, median queue latency, avg queue latency, median total latency, avg total latency
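A minimal Python sketch of queue-length-based replica targeting with scale-to-zero (function and parameter names are assumptions, and the backlog metric could come from whichever queue we end up using):

```python
import math

def desired_replicas(queue_length: int, in_flight: int,
                     target_per_replica: int = 1, max_replicas: int = 100) -> int:
    """Replica count implied by the current backlog.

    queue_length: requests waiting in the shared queue
    in_flight: requests currently being processed across all replicas
    target_per_replica: desired concurrent requests per replica
    """
    outstanding = queue_length + in_flight
    if outstanding == 0:
        return 0  # scale to zero when there is no outstanding work
    return min(max_replicas, math.ceil(outstanding / target_per_replica))

# e.g. 30 queued + 4 in flight with a per-replica target of 2 -> 17 replicas
assert desired_replicas(30, 4, target_per_replica=2) == 17
```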
API parallelism
Users should be able to use asyncio in APIs
It would be cool if the user could configure the concurrency of their replicas, e.g. support n in-flight requests per replica (default: 1)
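A minimal asyncio sketch of capping in-flight requests per replica (handler and predictor names are hypothetical):

```python
import asyncio

MAX_IN_FLIGHT = 4  # hypothetical per-replica concurrency setting (default would be 1)

async def predict(payload: dict) -> dict:
    # Stand-in for the user's (possibly I/O-bound) prediction code.
    await asyncio.sleep(0.01)
    return {"result": "ok"}

async def handle_request(sem: asyncio.Semaphore, payload: dict) -> dict:
    # The semaphore ensures a replica never has more than MAX_IN_FLIGHT
    # requests inside predict() at once; additional requests wait here.
    async with sem:
        return await predict(payload)

async def main() -> None:
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)
    # Fire 10 requests at once; only MAX_IN_FLIGHT run concurrently.
    results = await asyncio.gather(*(handle_request(sem, {"i": i}) for i in range(10)))
    print(len(results), "responses")

if __name__ == "__main__":
    asyncio.run(main())
```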
Efficient load balancing/routing
eliminate extra hops if possible
should APIs and operator share the same NLB?
Allow deployment to existing VPC / security groups?