Question about running 2 or multiple remote cache instances #483
While S3 has been strongly consistent since late 2020[*], using two bazel-remote instances with the S3 proxy backend does not provide strong cache consistency if a client switches between the two bazel-remote instances within a single build, because bazel-remote stores blobs locally and uploads them to S3 asynchronously.

[*] https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/

Why do you need two bazel-remote instances?
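For concreteness, a minimal sketch of such a two-instance setup, one config file per instance (the bucket name and paths are hypothetical, and field names should be checked against the bazel-remote README for your version):

```yaml
# Hypothetical config for each of the two bazel-remote instances:
# each keeps its own local disk cache and proxies to one shared
# S3 bucket. Uploads to S3 happen asynchronously, which is why a
# client that switches instances mid-build can miss blobs.
dir: /data/bazel-remote   # per-instance local cache directory
max_size: 100             # local cache size limit, in GiB
port: 8080
s3_proxy:
  endpoint: s3.us-east-1.amazonaws.com
  bucket: my-shared-bazel-cache   # hypothetical bucket name
  prefix: cache
```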
That would be possible with some refactoring, but I suspect there would be a performance hit. Also, I don't know how to implement LRU-like cache size limiting with S3, so I would feel bad recommending this configuration.
Thanks for the reply. The reason for 2 instances is to have higher availability; we could manage it with k8s etc.
I don't use or know much about S3. But in general, I think it would be easier to achieve simple and reliable failover to a secondary cache instance if the bazel client were better at retrying after issues. More specifically:
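As a rough sketch of the retry-related knobs the client already exposes, a hypothetical .bazelrc (the endpoint is made up, and flag defaults vary between bazel versions):

```
# Hypothetical cache endpoint; bazel-remote serves gRPC on 9092 by default.
build --remote_cache=grpc://bazel-cache.internal:9092
# Retry transient remote cache errors a few times before giving up.
build --remote_retries=5
# Time out slow cache requests (in seconds) instead of stalling the build.
build --remote_timeout=60
```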
I run something which looks similar to what you want: we have multiple bazel-remote instances (currently 6) with identical config inside a kubernetes cluster. Each has its own local disk and a shared s3 bucket. These are fronted by a couple of nginx instances managed by ingress-nginx, using hash-based load balancing so that requests for the same blob are routed to the same bazel-remote instance.

Depending on how long items stored in the cache are relevant to your clients, the extra s3 backend might not be needed for your use case.
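A hedged sketch of the routing piece, using ingress-nginx's upstream-hash-by annotation so that each request URI, and hence each blob, maps to one backend (all names are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: bazel-remote   # hypothetical
  annotations:
    # Hash on the request URI so the same /cas/<sha256> path always
    # lands on the same bazel-remote instance while it is healthy.
    nginx.ingress.kubernetes.io/upstream-hash-by: "$request_uri"
spec:
  rules:
    - host: bazel-cache.internal   # hypothetical
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: bazel-remote
                port:
                  number: 8080
```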
@kragniz: if you'd be willing to share some example configurations, I'd love to add them to an examples doc or directory here.
Thanks @kragniz for sharing your idea! I'm also interested in your example! So if the same blob always hits the same remote-cache instance and that instance is down, we still won't have HA in this case. And when you increase/decrease the number of instances, what do you do?
@mostynb sure, I'll create a PR at some point
@BoyangTian-Robinhood requests will always get sent to a ready instance (there's a readiness probe configured to make sure instances are correctly returning the empty CAS blob), so requests will get redirected to one of the other instances in that case. It is likely that most of the objects have already been uploaded to s3, so the second instance will look them up and get a cache hit (with some extra latency). This generally means all requests will get a response, but the cache hit rate and latency will both be slightly worse while restarting/scaling instances.
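A minimal sketch of such a readiness probe, assuming bazel-remote's HTTP interface serving CAS blobs under /cas/<sha256> (the hash below is the SHA-256 of the empty blob):

```yaml
# Excerpt from a hypothetical bazel-remote pod spec.
readinessProbe:
  httpGet:
    # SHA-256 of the empty blob; a healthy bazel-remote instance
    # should always be able to serve this.
    path: /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```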
Hi @kragniz, thanks for your detailed explanation! Sorry, one part is still not clear to me. For example, the first AC request goes to bazel-remote-instance-1, which has both the AC and CAS entries stored on its local disk, so it replies to the client that the cache entry exists. Then bazel-remote-instance-1 dies. When the following CAS request arrives, the readiness probe will return something like 404 and the request gets redirected; in this case, since it doesn't answer with a missing CAS blob, there is no bazel error, right? Another question: the local disk is mounted, right? The k8s-managed docker disk size has an upper limit, so it is not a …
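On the disk question, one hedged way to give each replica its own persistent cache directory, rather than the size-limited container writable layer, is a StatefulSet volume claim (all names and sizes are hypothetical):

```yaml
# Excerpt from a hypothetical StatefulSet for bazel-remote.
spec:
  template:
    spec:
      containers:
        - name: bazel-remote
          volumeMounts:
            - name: cache
              mountPath: /data/bazel-remote  # the `dir` from the config above
  volumeClaimTemplates:
    - metadata:
        name: cache
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 200Gi  # should exceed bazel-remote's max_size
```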
@BoyangTian-Robinhood: are you still looking for help with this?
Hi, this is a question rather than a bug report.

I know that for a remote-cache request, the client will first ask the AC and then the CAS. But if we only have local disk storage and we have 2 bazel-remote instances, it could cause consistency issues. Therefore I have 2 questions.
Thanks!