Change prod example to use pause_minority strategy #540

Merged: 1 commit, Jan 5, 2021

@coro (Contributor) commented Jan 4, 2021

As part of [TGIR S01E09](https://www.youtube.com/watch?v=y2HAJBiXsw0),
we found that the ignore strategy is not ideal for production use cases:
after a network partition, the cluster needs manual intervention to
recover.

In addition, the readiness probe does not account for whether a pod is
partitioned. In a 3-node quorum queue cluster with a 2-1 network
partition, the Service will still route 1/3 of the traffic to the pod on
the minority side, which cannot serve it because it cannot establish quorum.

By contrast, the pause_minority strategy stops the RabbitMQ app on the
minority side, so the Service stops routing traffic to that pod
until it regains quorum.
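
To make the 1/3 figure concrete, here is a rough sketch of the arithmetic. It is a toy model rather than anything taken from the operator or from RabbitMQ: the function name `misrouted_fraction` is hypothetical, and it assumes the Service spreads traffic evenly over ready pods, that a pod under ignore stays ready while partitioned, and that a pod paused by pause_minority fails its readiness probe.

```python
from fractions import Fraction


def misrouted_fraction(side_sizes, strategy):
    """Fraction of Service traffic sent to pods that cannot reach quorum.

    Assumptions (illustrative only): traffic is spread evenly over ready
    pods; under "ignore" a partitioned pod stays ready; under
    "pause_minority" a minority-side pod stops its RabbitMQ app and so
    drops out of the Service.
    """
    total = sum(side_sizes)
    ready = 0
    ready_without_quorum = 0
    for size in side_sizes:
        # A side has quorum only with a strict majority of all nodes.
        has_quorum = size > total // 2
        if strategy == "pause_minority" and not has_quorum:
            continue  # paused pods receive no traffic
        ready += size
        if not has_quorum:
            ready_without_quorum += size
    return Fraction(ready_without_quorum, ready) if ready else Fraction(0)


# 3-node cluster split 2-1, as in the description above.
print(misrouted_fraction([2, 1], "ignore"))          # 1/3
print(misrouted_fraction([2, 1], "pause_minority"))  # 0
```

Under these assumptions, ignore leaves one of the three ready pods without quorum, while pause_minority removes that pod from rotation so no traffic reaches it.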

@ansd (Member) left a comment

I really like the video 🙂

@coro coro merged commit a7d9c2b into main Jan 5, 2021
@coro coro deleted the production-partition-handling branch January 5, 2021 10:28