Revisit networking / auth #254

Closed
deliahu opened this issue Jul 23, 2019 · 1 comment
Labels
enhancement (New feature or request), research (Determine technical constraints)
Comments

@deliahu
Member

deliahu commented Jul 23, 2019

Context

  • Access control / authorization / authentication
    • The operator and APIs may need different behavior, or each could support multiple auth options that are configured separately (e.g. the user could choose HTTP Basic Auth for the operator and an API token for the APIs)
    • What does eksctl provide in terms of private security groups?
    • Add simple configuration to control network access to the Cortex cluster (e.g. open to the world, open to specific IPs / IAM identities, ...)?
    • In theory, auth could be implemented in the workload (e.g. how the IAM check in the operator is done now), in Istio, or in API Gateway. Drop Istio? Can traffic splitting be done in API Gateway or the NLB? Consider Ambassador, Kong, Gloo?
    • If considering API Gateway, research its pricing (it may be expensive). AWS launched a stripped-down version at re:Invent 2019, which may have the features we need and is cheaper.
    • Also consider how the CLI can verify that it's connected to the intended operator (to prevent operator impersonation). We should be able to turn off "SSL no verify".
    • To make the ELB private, it might be better to use an NLB instead of an ELB if possible, and use VPC Link to connect to it from API Gateway. Internal ELBs are also possible, but API Gateway may not be able to connect to them?
    • While we're at it, it's worth checking whether we need the ELB at all (especially if we use API Gateway), and if so, whether we should consider an NLB or ALB instead
    • Do we want separate ELBs/ALBs/NLBs for the operator and the APIs?
  • HTTPS
    • HTTPS out of the box using someone else's certs (e.g. API Gateway)
      • API Gateway would make non-user-provided HTTPS easy, since it uses AWS's certs, and would make it easy to add IAM or API key auth, but would make Cortex less cloud-agnostic.
    • HTTPS using user-provided certs
  • Private cluster
    • For security, all nodegroups should be private, and the NLBs should also be in the private subnets of the VPC. See the eksctl docs
      • In eks.yaml, use privateNetworking: true in the node group configs
    • Should public vs. private be configurable? If so, which should be the default?
  • NAT Gateway
  • Queuing
    • Ideally the queue should be "fair", in the sense that requests that arrive first are always served first. To achieve this, the on-replica queues (e.g. the waitress queue) won't be used.
    • Better/fancier queueing should not increase latency when the queue is empty
    • Horizontal pod autoscaling should be based on queue length, and support scale-to-zero
    • Explore whether we can / should use the vertical pod autoscaler if we move off of the horizontal pod autoscaler (they can't be used at the same time)
    • An issue with using the NLB's queue is that with multiple APIs, requests for idle APIs could be stuck waiting behind requests for busy APIs. Also, if the operator is on the same NLB, API traffic could starve the operator
    • Should avoid a single point of failure in the queue: it needs redundancy/mirroring, i.e. it should be highly available (e.g. RabbitMQ HA)
    • Consider an AWS managed queueing service
    • Consider targeting median queue length, avg queue length, median queue latency, avg queue latency, median total latency, avg total latency
  • API parallelism
    • Users should be able to use asyncio in APIs
    • It would be cool if the user could configure concurrency of their replicas, e.g. support n requests in-flight per replica (default: 1)
  • Efficient load balancing/routing
    • eliminate extra hops if possible
    • should APIs and operator share the same NLB?
  • Allow deployment to existing VPC / security groups?
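
A rough sketch of the queue-length-based scaling idea from the Queuing items above, assuming a central queue whose length can be polled. The function name `desired_replicas` and the parameters `target_per_replica` and `max_replicas` are hypothetical and not part of Cortex or Kubernetes; this is not the HPA's own formula, just one way to express "scale on queue length, down to zero":

```python
import math


def desired_replicas(queue_length: int, in_flight: int,
                     target_per_replica: int, max_replicas: int) -> int:
    """Pick a replica count so each replica handles ~target_per_replica requests.

    All names here are illustrative assumptions, not an existing API.
    """
    if target_per_replica < 1:
        raise ValueError("target_per_replica must be at least 1")

    # Count both waiting and currently processing requests as pending work.
    pending = queue_length + in_flight

    # Scale to zero only when there is no work at all.
    if pending == 0:
        return 0

    # Round up so a partially loaded replica still gets provisioned,
    # and never exceed the configured ceiling.
    return min(max_replicas, math.ceil(pending / target_per_replica))
```

For example, with 5 queued requests, none in flight, and a target of 2 per replica, this asks for 3 replicas. A real controller would likely also smooth the signal over time (e.g. using one of the median/avg metrics listed above) to avoid flapping.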

Research
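
As a starting point for the "API parallelism" item in the Context list, here is a minimal sketch of capping the number of in-flight requests per replica with asyncio, assuming request handlers are coroutines. `ReplicaLimiter`, `predict`, and `demo` are hypothetical names for illustration, not existing Cortex code:

```python
import asyncio


class ReplicaLimiter:
    """Allow at most max_in_flight concurrent requests (default: 1)."""

    def __init__(self, max_in_flight: int = 1):
        self._sem = asyncio.Semaphore(max_in_flight)
        self.in_flight = 0
        self.peak = 0  # highest concurrency observed, for inspection

    async def run(self, handler, *args):
        # Requests beyond the limit wait here instead of piling onto the handler.
        async with self._sem:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await handler(*args)
            finally:
                self.in_flight -= 1


async def predict(x):
    # Stand-in for a user's async API handler.
    await asyncio.sleep(0.01)
    return x * 2


async def demo(n_requests: int = 5, max_in_flight: int = 2):
    # Create the limiter inside the running event loop.
    limiter = ReplicaLimiter(max_in_flight)
    results = await asyncio.gather(
        *(limiter.run(predict, i) for i in range(n_requests))
    )
    return results, limiter.peak
```

Calling `asyncio.run(demo())` runs five requests with no more than two in flight at once. With the default of 1, the replica processes requests strictly one at a time, matching the proposed default.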

deliahu added the enhancement label Jul 23, 2019
deliahu added the research and v0.11 labels Nov 5, 2019
deliahu removed the v0.11 label Nov 20, 2019
deliahu changed the title from "Revisit networking / auth" to "Revisit networking / auth [3]" Nov 25, 2019
deliahu self-assigned this Nov 25, 2019
deliahu added the v0.12 label Nov 25, 2019
deliahu removed their assignment Dec 13, 2019
deliahu removed the v0.12 label Dec 13, 2019
deliahu changed the title from "Revisit networking / auth [3]" to "Revisit networking / auth" Dec 20, 2019
deliahu added the v0.14 label and removed the v0.13 label Jan 6, 2020
deliahu removed the v0.14 label Feb 12, 2020
deliahu added the v0.15 label Feb 12, 2020
deliahu added the v0.16 label and removed the v0.15 label Mar 9, 2020
deliahu added the v0.17 label and removed the v0.16 label Mar 24, 2020
deliahu removed the v0.17 label Apr 14, 2020
RobertLucian added this to the v0.33 milestone Apr 2, 2021
@deliahu
Member Author

deliahu commented Apr 7, 2021

Closing due to vagueness (will create separate tickets for any items on this list which are still relevant)

@deliahu deliahu closed this as completed Apr 7, 2021

3 participants