The operator and APIs may behave differently, or each may support multiple options that are separately configurable (e.g. the user could choose HTTP Basic Auth for the operator and API tokens for the APIs)
what does eksctl provide in terms of private security groups?
Add simple configuration to control network access to the Cortex cluster (e.g. open to the world, open to specific IPs / IAM principals, ...)?
In theory, auth could be implemented in the workload (e.g. how the IAM check in the operator is done now), in Istio, or in API Gateway. Drop Istio? Can traffic splitting be done in API Gateway or the NLB? Consider Ambassador, Kong, or Gloo?
If considering API Gateway, research its pricing (it may be expensive). AWS launched a stripped-down version (HTTP APIs) at re:Invent 2019, which may have the features we need and is cheaper.
Also consider how the CLI can verify that it's connected to the intended operator (to prevent operator impersonation); we should be able to turn off the "skip SSL verification" behavior (a sketch follows below).
To make the load balancer private, it might be better to use an NLB instead of an ELB if possible, and use a VPC Link to connect to it from API Gateway. Internal ELBs are also possible, but API Gateway may not be able to connect to them?
While we're at it, it's worth checking whether we need the ELB at all (especially if we use API Gateway), and if so, should we use an NLB or ALB instead?
Do we want separate ELBs/ALBs/NLBs for the operator and the APIs?
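As a rough illustration of the "verify the operator" point above, here is a minimal Python sketch (the /info endpoint and argument names are hypothetical, and the real CLI may not be Python) showing how certificate verification could be kept on by default and only relaxed explicitly:

```python
import requests
from typing import Optional

def check_operator(operator_url: str, ca_bundle: Optional[str] = None,
                   verify_tls: bool = True) -> dict:
    # With verify_tls=True, the operator's certificate is validated against
    # ca_bundle (or the system CAs), so the CLI fails loudly if it reaches an
    # impersonated operator; verify_tls=False reproduces "SSL no verify".
    verify = ca_bundle if (verify_tls and ca_bundle is not None) else verify_tls
    response = requests.get(f"{operator_url}/info", timeout=10, verify=verify)
    response.raise_for_status()
    return response.json()
```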
HTTPS
HTTPS out of the box using someone else's certs (e.g. API Gateway's)
API Gateway would make non-user-provided HTTPS easy, since it uses AWS's certs, and would make it easy to add IAM or API key auth, but it would reduce cloud agnosticism.
HTTPS using user-provided certs
Private cluster
For security, all node groups should be private, and the NLBs should also be in the VPC's private subnets. See the eksctl docs on private networking.
In eks.yaml, use privateNetworking: true in the node group configs (see the config sketch at the end of this section)
Should we make public vs. private configurable? If so, which would be the default?
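A minimal eks.yaml sketch of the privateNetworking setting mentioned above (cluster name, region, and node group details are placeholders):

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: cortex        # placeholder cluster name
  region: us-west-2   # placeholder region

nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 2
    privateNetworking: true   # place nodes in private subnets (no public IPs)
```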
Queueing
Ideally the queue should be "fair", in the sense that requests that arrive first always finish first. To achieve this, the on-replica queues (e.g. the waitress queue) won't be used.
Better/fancier queueing should not increase latency when the queue is empty
Horizontal pod autoscaling should be based on queue length and should support scale-to-zero (see the scaling sketch at the end of this section)
Explore whether we can/should use the vertical pod autoscaler if we move off the horizontal pod autoscaler (they can't be used at the same time)
One issue with using the NLB's queue is that, with multiple APIs, non-impacted APIs could end up waiting on other busy APIs. Also, if the operator is on the same NLB, busy APIs could starve the operator
We should avoid a single point of failure in the queue: it needs redundancy/mirroring, i.e. it must be highly available (e.g. RabbitMQ HA)
Consider an AWS managed queueing service
Consider targeting median queue length, avg queue length, median queue latency, avg queue latency, median total latency, avg total latency
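A minimal Python sketch of queue-length-based replica targeting with scale-to-zero (function and parameter names are assumptions, and the backlog metric could come from whichever queue we end up using):

```python
import math

def desired_replicas(queue_length: int, in_flight: int,
                     target_per_replica: int = 1, max_replicas: int = 100) -> int:
    """Replica count implied by the current backlog.

    queue_length: requests waiting in the shared queue
    in_flight: requests currently being processed across all replicas
    target_per_replica: desired concurrent requests per replica
    """
    outstanding = queue_length + in_flight
    if outstanding == 0:
        return 0  # scale to zero when there is no outstanding work
    return min(max_replicas, math.ceil(outstanding / target_per_replica))

# e.g. 30 queued + 4 in flight with a per-replica target of 2 -> 17 replicas
assert desired_replicas(30, 4, target_per_replica=2) == 17
```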
API parallelism
Users should be able to use asyncio in APIs
It would be cool if the user could configure the concurrency of their replicas, e.g. support n in-flight requests per replica (default: 1)
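A minimal asyncio sketch of capping in-flight requests per replica (handler and predictor names are hypothetical):

```python
import asyncio

MAX_IN_FLIGHT = 4  # hypothetical per-replica concurrency setting (default would be 1)

async def predict(payload: dict) -> dict:
    # Stand-in for the user's (possibly I/O-bound) prediction code.
    await asyncio.sleep(0.01)
    return {"result": "ok"}

async def handle_request(sem: asyncio.Semaphore, payload: dict) -> dict:
    # The semaphore ensures a replica never has more than MAX_IN_FLIGHT
    # requests inside predict() at once; additional requests wait here.
    async with sem:
        return await predict(payload)

async def main() -> None:
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)
    # Fire 10 requests at once; only MAX_IN_FLIGHT run concurrently.
    results = await asyncio.gather(*(handle_request(sem, {"i": i}) for i in range(10)))
    print(len(results), "responses")

if __name__ == "__main__":
    asyncio.run(main())
```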
Efficient load balancing/routing
eliminate extra hops if possible
should APIs and operator share the same NLB?
Allow deployment to existing VPC / security groups?