-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/30904 cluster formation part2 #32877
Fix/30904 cluster formation part2 #32877
Conversation
- This allows to move all all .java files from .groovy. - Will prevent eclipse from tangling up in this setup - make it possible to use Version from Java
``` > Task :plugins:ingest-user-agent:listElasticSearchClusters Starting cluster: myTestCluster * myTestCluster: /home/alpar/work/elastic/elasticsearch/plugins/ingest-user-agent/foo Asked to unClaimAndStop myTestCluster, since cluster still has 1 claim it will not be stopped > Task :plugins:ingest-user-agent:testme UP-TO-DATE Stopping myTestCluster, since no of claims is 0 ``` - Meant to auto manage the clusters lifecycle - Add integration test for cluster formation
@rjernst I changed the DSL but kept the task extension. The static method to get it is now moved to it's proper class. There's a lot of boilerplate in java to access it in all, but it I think the task extension mechanism is a natural way to achieve this in Gradle rather than store this in a separate map. The extensions points we use work with tasks, so we just get the extensions from the task to know about all the clusters it claims. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes thus far! One more minor naming change, that I believe we discussed, is to name the plugin and dsl such as “clusters” or “testClusters” (eg TestClustersPlugin, elasticsearch.test-clusters, etc). Cluster formation is sort of a legacy name that I don’t think we should continue with. Additionally, I don’t see why we need an interface to abstract cluster configuration vs node configuration? If we need it in the future, that is fine, but let’s wait until it is needed rather than abstracting prematurely.
My larger concern is still the extension. I think it makes this more complicated (ie harder to follow what the code is doing) than it needs to be. I believe all of the logic can be within the single plugin class, which makes it much easier to see the entire picture. Let me try to describe the solution I’ve been advocating for in more concrete terms.
To start, we need a way to track the number of current uses of each cluster. This can be a simple map by cluster name:
Map<String, Integer> refCounts = new HashMap<>();
Next we need to keep track of which clusters are associated with tasks, and which tasks are associated with clusters. We can do this as local variables within the plugin’s apply method:
Map<String, List<Task>> clusterToTasks = new HashMap<>();
Map<Task, List<ElasticsearchCluster>> taskToCluster = new HashMap<>();
Then the useCluster
method on tasks can be a simple extension property that is a closure, just like you have, but the doCall method would be:
clusterToTasks.computeIfAbsent(conf.getName(), k -> new ArrayList()).add(task);
taskToCluster.computIfAbsent(task, k -> new ArrayList()).add(conf);
Now, instead of needing to iterate over all tasks in the task graph, we only need to iterate the uses map, to setup the ref counts:
project.getGradle().getTaskGraph().whenReady(taskExecutionGraph ->
clusterToTasks.forEach((name, tasks) -> {
refCounts.put(name, tasks.stream().filter(t -> taskExecutionGraph.hasTask(t)).count());
for (Task task : tasks) {
// use a doFirst so that if the task is skipped, it will not trigger the cluster to start
task.doFirst(t -> taskToCluster.get(task).forEach(cluster -> cluster.start()));
}
});
taskExecutionGraph.afterTask(task -> {
ElasticsearchCluster cluster = taskToCluster.get(task);
int uses = refCounts.get(cluster.getName())
refCounts.put(cluster.getName(), uses - 1);
if (uses <= 1) {
// last use, so we can stop the cluster
cluster.stop();
}
});
);
Having all of these hooks registered next to each other makes the logic much easier to follow.
I removed the Task extensions, but kept the listener. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update. I don't see the advantage, though, to using a listener vs attaching to the task graph. They should be functionally equivalent? And if a task is disabled, we should not be starting the cluster for it; this is why I like using doFirst. Additionally, I think the cluster configuration should not contain mutable state, which is why the design I proposed had all tracking logic about use inside the plugin itself.
project.getGradle().getTaskGraph().whenReady(taskExecutionGraph -> | ||
taskExecutionGraph.getAllTasks() | ||
.forEach(task -> | ||
taskToCluster.getOrDefault(task, new ArrayList<>()).forEach(ElasticsearchConfiguration::claim) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would cause taskToCluster
to fill itself with an empty list for every other task in the execution graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will create an empty list, but not add it to the map should have used emptyList()
i'm fixing that.
|
||
// create the listener to start the clusters on-demand and terminate when no longer claimed. | ||
project.getGradle().addListener( | ||
new ClusterFormationTaskExecutionListener(Collections.unmodifiableMap(taskToCluster)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This suffers from the same concern I had in my previous review: by using a separate class, it increases the complexity of what must be understood by someone reading this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree that the listener class is more complex than hooking on the task graph and doFirst, I would even argue that the listener interface is simpler because it follows a well understood design pattern. To me this fits better than a plugin implemented in java.
It does have all the advantages, if a task is skipped, no actions will be run for that task, so beforeActions
is never called and thus the cluster is not started.
The real tie breaker for me is that it also allows for tasks to have their own doFirst that interacts with the cluster, e.x. a testing task that does some setup before running the test, like indexing ( I think docs tests do this ). Keep in mind that there won't be a separate start cluster task. If we start the cluster in doFrist
there will be no way for build scripts to add any actions that run after the cluster is started but before the regular task actions.
W.r.t immutability, the listener gets an immutable map and the configurations themselves lock-down once the cluster is running. This will guarantee that we don't change configuration in the execution phase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The real tie breaker for me is that it also allows for tasks to have their own doFirst
My proposed solution still allows this. The doFirst is not added until the task graph is resolved, after configuration is complete. So the doFirst added is actually the very first doFirst, before any configured doFirst blocks.
the configurations themselves lock-down once the cluster is running
But they don't actually, because the "claimed" tracking is done there, which is always mutable.
I would even argue that the listener interface is simpler because it follows a well understood design pattern
All of these are hooks that allow executing a block of code before/after a task is run. They have different nuances, but they are all the same design pattern. Additionally, you could still use the listener, but I am suggesting it can be created as either an inner class, or an anonymous class, inside the plugin, so that the logic and state are all together.
package org.elasticsearch.gradle.clusterformation; | ||
|
||
@FunctionalInterface | ||
public interface UseClusterAction { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to have been created but not used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
@rjernst thanks for the quick thoughts! |
Ok thanks. I'm fine with that being collapsed in a followup. |
@rjernst does this look good now ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atorok I don't think all my concerns were addressed? This still has the configuration object as mutable with state inside it.
I think I misunderstood that part. Do you mean |
I mean the use of numberOfClaims inside the configuration object instead of in the plugin. Additionally, I'm not sure the started flag is necessary. All of this can be tracked inside the plugin, and the elasticsearch configuration can be immutable once the gradle configuration phase is complete. |
@rjernst I do see your point now. I propose to discuss this in the next PR which adds the implementation to start the cluster. I'm waiting to merge this before I open that PR to avoid this one getting too big. The implementation does not separate the configuration objects from the logic of what it takes to configure and start and adds additional logic to deal with failing tasks, adding more context to this discussion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, let's discuss in the next part.
Gradle integration for the Cluster formation plugin with ref counting
* 6.x: Mute test watcher usage stats output [Rollup] Fix FullClusterRestart test TEST: Disable soft-deletes in ParentChildTestCase TEST: Disable randomized soft-deletes settings Integrates soft-deletes into Elasticsearch (#33222) drop `index.shard.check_on_startup: fix` (#32279) Fix AwaitsFix issue number Mute SmokeTestWatcherWithSecurityIT testsi [DOCS] Moves ml folder from x-pack/docs to docs (#33248) TEST: mute more SmokeTestWatcherWithSecurityIT tests [DOCS] Move rollup APIs to docs (#31450) [DOCS] Rename X-Pack Commands section (#33005) Fixes SecurityIntegTestCase so it always adds at least one alias (#33296) TESTS: Fix Random Fail in MockTcpTransportTests (#33061) (#33307) MINOR: Remove Dead Code from PathTrie (#33280) (#33306) Fix pom for build-tools (#33300) Lazy evaluate java9home (#33301) SQL: test coverage for JdbcResultSet (#32813) Work around to be able to generate eclipse projects (#33295) Different handling for security specific errors in the CLI. Fix for #33230 (#33255) [ML] Refactor delimited file structure detection (#33233) SQL: Support multi-index format as table identifier (#33278) Enable forbiddenapis server java9 (#33245) [MUTE] SmokeTestWatcherWithSecurityIT flaky tests Add region ISO code to GeoIP Ingest plugin (#31669) (#33276) Don't be strict for 6.x Update serialization versions for custom IndexMetaData backport Replace IndexMetaData.Custom with Map-based custom metadata (#32749) Painless: Fix Bindings Bug (#33274) SQL: prevent duplicate generation for repeated aggs (#33252) TEST: Mute testMonitorClusterHealth Fix serialization of empty field capabilities response (#33263) Fix nested _source retrieval with includes/excludes (#33180) [DOCS] TLS file resources are reloadable (#33258) Watcher: Ensure TriggerEngine start replaces existing watches (#33157) Ignore module-info in jar hell checks (#33011) Fix docs build after #33241 [DOC] Repository GCS ADC not supported (#33238) Upgrade to latest Gradle 4.10 (#32801) Fix/30904 cluster formation part2 (#32877) Move file-based discovery to core (#33241) HLRC: add client side RefreshPolicy (#33209) [Kerberos] Add unsupported languages for tests (#33253) Watcher: Reload properly on remote shard change (#33167) Fix classpath security checks for external tests. (#33066) [Rollup] Only allow aggregating on multiples of configured interval (#32052) Added deprecation warning for rescore in scroll queries (#33070) Apply settings filter to get cluster settings API (#33247) [Rollup] Re-factor Rollup Indexer into a generic indexer for re-usability (#32743) HLRC: create base timed request class (#33216) HLRC: Use Optional in validation logic (#33104) Painless: Add Bindings (#33042)
This is a continuation of #32028 which I accidentally merged, then reverted right away due to lack of caffeine.