Support sharding large workloads across multiple Kibana instances #93029
Comments
Pinging @elastic/kibana-core (Team:Core)
Knowing the liveness of other Kibana instances is only required for intelligent clustering, where we automatically balance the workload when the cluster size changes (due to instances having downtime or being removed permanently). It provides a nice out-of-the-box experience for users, but an alternative is to implement dumb clustering, where users configure each Kibana instance with the values needed to participate in a "cluster". Each instance would need to have configured, at a minimum, which shard it is responsible for and how many shards there are in total (see the sketch below).
As long as users configure at most half as many shards as there are instances, so that every shard is covered by at least two instances, we'll still have high availability if one instance goes down.
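A hypothetical sketch of what that static configuration might look like, expressed as a TypeScript shape; the names and values are assumptions for illustration only and are not existing Kibana settings:

```ts
// Hypothetical static ("dumb") clustering configuration: not real Kibana
// settings, just an illustration of what each instance would need to know.
interface StaticShardingConfig {
  /** The shard (0-based) this instance is responsible for. */
  shardNumber: number;
  /** The total number of shards the workload is split into. */
  totalShards: number;
}

// With 2 shards and 4 instances, every shard is covered by two instances,
// so losing any single instance still leaves each shard with an owner.
const instance1: StaticShardingConfig = { shardNumber: 0, totalShards: 2 };
const instance2: StaticShardingConfig = { shardNumber: 0, totalShards: 2 };
const instance3: StaticShardingConfig = { shardNumber: 1, totalShards: 2 };
const instance4: StaticShardingConfig = { shardNumber: 1, totalShards: 2 };
```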
Cluster state algorithm (draft, definitely needs more scrutiny)
To implement sharding for queues like task manager, each instance peeks at the queue and pulls e.g. 50 results. From the queue results, it only processes the items that belong to its shard, i.e. the documents whose shard assignment matches this instance's shard (see the sketch below). When a Kibana node goes offline it will not be able to process any items on the queue, but after the `expiry_timeout` the remaining instances will stop considering it part of the cluster and will pick up its share of the work. At any point in time, all nodes might not agree about which nodes are available / offline, which could lead to multiple instances trying to acquire a lock on the same queue item. This will lead to some optimistic concurrency conflicts, but only for the items affected while the instances' views of the cluster temporarily disagree.
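A minimal sketch of the shard-filtering step described above, assuming (this is an assumption, not part of the proposal) that ownership is decided by hashing the queue document's id modulo the total shard count:

```ts
import { createHash } from 'crypto';

interface QueueItem {
  id: string;
  // ...task payload
}

// Assumption: shard ownership is derived from a hash of the document id
// modulo the total number of shards; the real assignment rule may differ.
function shardOf(docId: string, totalShards: number): number {
  const digest = createHash('sha1').update(docId).digest();
  return digest.readUInt32BE(0) % totalShards;
}

// Each instance peeks at the queue, pulls a batch (e.g. 50 items), and only
// attempts to claim the items that belong to its own shard.
function itemsForMyShard(
  batch: QueueItem[],
  myShard: number,
  totalShards: number,
): QueueItem[] {
  return batch.filter((item) => shardOf(item.id, totalShards) === myShard);
}
```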
I feel like this is not going to be the only use case where we're going to need to use ES as a Kibana 'internal system' storage. The first example coming to mind would (potentially, nothing has been acted on yet) be to store a per-type index version if we were to try to implement #104081. For that reason, and even if … (just my 2cts, for the rest of the algorithm I need to think more in depth about it)
Closing in favour of newer discussion in #187696
(although this is on the core team's long-term radar, we're not actively working towards adding this at the moment)
Kibana instances are currently completely isolated and don't know anything about other instances connected to the same Elasticsearch cluster. This means that synchronizing workloads between instances has to be done with optimistic concurrency control; however, every conflict requires an additional roundtrip to identify available tasks, which limits the scalability of this approach.
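For context, a rough sketch of what claiming a task with optimistic concurrency control looks like, using the Elasticsearch JS client's `if_seq_no` / `if_primary_term` parameters; the index and field names are only illustrative, and the exact request shape depends on the client version:

```ts
import { Client, errors } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Attempt to claim a task document; the seqNo/primaryTerm come from the
// search hit that listed the task as available. Index and field names are
// illustrative of how task claiming roughly works, not exact.
async function tryClaimTask(
  taskId: string,
  instanceId: string,
  seqNo: number,
  primaryTerm: number,
): Promise<boolean> {
  try {
    await client.update({
      index: '.kibana_task_manager',
      id: taskId,
      if_seq_no: seqNo,
      if_primary_term: primaryTerm,
      doc: { task: { ownerId: instanceId, status: 'claiming' } },
    });
    return true; // we won the claim
  } catch (err) {
    if (err instanceof errors.ResponseError && err.statusCode === 409) {
      // Another instance updated the document first: version conflict.
      // Finding a replacement task costs an additional roundtrip.
      return false;
    }
    throw err;
  }
}
```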
Sharding seems like the most promising solution to this problem and could increase the scalability of e.g. task manager and saved object migrations.
To support sharding, Kibana will need a discovery mechanism that can detect and check the liveness of other Kibana instances connected to the same Elasticsearch cluster, so that work can be sharded over the available number of instances.
Since all Kibana instances in a "cluster" already share the same Elasticsearch cluster, we can use Elasticsearch as our transport mechanism by keeping a "cluster state" document that all Kibana instances write to.
Instances broadcast their participation by bumping the `lastSeen` value associated with their instance id to the value of their system's monotonically increasing clock (`process.hrtime()`). Other instances poll the cluster state document to detect the presence of new nodes; only once an instance's `lastSeen` value has increased from the previous value is that instance considered alive. If the `lastSeen` value doesn't increase for `expiry_timeout` (as measured by this instance's monotonically increasing clock), that instance is no longer considered as participating in this cluster.
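A rough sketch, in TypeScript, of the heartbeat and liveness bookkeeping described above; the document shape, timeout constant, and function names are assumptions made for illustration:

```ts
// Assumed shape of the shared "cluster state" document in Elasticsearch:
// every instance id maps to the lastSeen value that instance last reported.
interface ClusterState {
  instances: Record<string, { lastSeen: number }>;
}

// Illustrative timeout: if an instance's lastSeen doesn't increase for this
// long (measured on OUR monotonic clock), it drops out of the cluster.
const EXPIRY_TIMEOUT_NS = 30_000_000_000n; // 30 seconds in nanoseconds

// Broadcast participation: bump our own lastSeen to our monotonic clock.
function heartbeat(state: ClusterState, myInstanceId: string): void {
  state.instances[myInstanceId] = { lastSeen: Number(process.hrtime.bigint()) };
}

interface ObservedInstance {
  lastSeen: number;  // last lastSeen value we read for the remote instance
  changedAt: bigint; // our clock when we last saw that value increase
  alive: boolean;    // true only once we've observed lastSeen increase
}

// Called after each poll of the cluster state document; returns the ids of
// the instances currently considered alive from this instance's perspective.
function updateLiveness(
  state: ClusterState,
  observed: Map<string, ObservedInstance>,
): string[] {
  const now = process.hrtime.bigint();
  for (const [id, { lastSeen }] of Object.entries(state.instances)) {
    const prev = observed.get(id);
    if (!prev) {
      // First sighting: not alive yet, wait until its lastSeen increases.
      observed.set(id, { lastSeen, changedAt: now, alive: false });
    } else if (lastSeen > prev.lastSeen) {
      observed.set(id, { lastSeen, changedAt: now, alive: true });
    } else if (now - prev.changedAt > EXPIRY_TIMEOUT_NS) {
      // No increase within expiry_timeout: no longer part of the cluster.
      prev.alive = false;
    }
  }
  return [...observed.entries()].filter(([, o]) => o.alive).map(([id]) => id);
}
```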