-
Notifications
You must be signed in to change notification settings - Fork 313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Manage Elasticsearch nodes with dedicated subcommands #830
Manage Elasticsearch nodes with dedicated subcommands #830
Conversation
As we pickle the node, in-memory state of telemetry devices is persisted anyway and there is no need to store it in an additional file.
To help with the review, here are some test commands:
To test a Docker container (see
All other commands ( It is also possible to use user-tags. Just specify them when running the benchmark:
This should also show up on system metrics. To benchmark a source build, you need to modify the install command (use
For a multi-node cluster, install multiple nodes but reference the other node(s) as seed hosts:
Note that we are always using the default http port (which is 39200). To customize it, specify e.g. |
random_build_type build_type | ||
|
||
# for Docker we force the most recent distribution as we don't have Docker images for all versions that are tested | ||
if [[ "$build_type" == "docker" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we kill Rally processes in kill_related_es_processes
we don't do the same for any running Docker images. I think that we might need to stop Docker containers as well but I wonder whether we can find a robust approach of finding the correct image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is possible. What we need to do is:
- Change
docker-compose.yml.j2
to further specify a label like so:
diff --git a/esrally/resources/docker-compose.yml.j2 b/esrally/resources/docker-compose.yml.j2
index c5bc10a..19e72e9 100644
--- a/esrally/resources/docker-compose.yml.j2
+++ b/esrally/resources/docker-compose.yml.j2
@@ -4,6 +4,8 @@ services:
cap_add:
- IPC_LOCK
image: "{{docker_image}}:{{es_version}}"
+ labels:
+ io.rally.description: "Label to help identify Elasticsearch containers launched by Rally"
{%- if docker_cpu_count is defined %}
cpu_count: {{docker_cpu_count}}
{%- endif %}
- Use the label to retrieve the ID of container(s) launched by Rally using a filter like
docker ps --filter "label=io.rally.description" --format "{{.ID}}"
(if >1 they'll be newline separated).
This will match all nodes matching the specific label key. To more targeted, we can also match the value which could be optionally specified via a j2 variable in the compose file, but this functionality would require an optional argument for the start subcommand (e.g.--docker-label="..."
) to override the j2 variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I'll implement this in the integration test. I think it's fine to use the less targeted approach and rely on a static label in docker-compose.yml.j2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 9ef5660.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a huge amount of work at a very high level of quality! Thank you.
I am leaving an initial batch of comments; I've reviewed up until the rally.py
file.
|
||
esrally stop --installation-id="69ffcfee-6378-4090-9e93-87c9f8ee59a7" --race-id="${RACE_ID}" | ||
|
||
If you only want to shutdown the node but don't want to delete the node and the data, pass ``--preserve-install`` additionally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"pass additionally --preserve-install
" instead?
esrally/mechanic/launcher.py
Outdated
def stop(self, nodes, metrics_store): | ||
self.logger.info("Shutting down [%d] nodes running in Docker on this host.", len(nodes)) | ||
for node in nodes: | ||
# readd meta-data - we already did this on startup but in case dedicated subcommands are used for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/readd/read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really meant this as: "add the meta-data again". But this was only valid in an earlier commit so I'll remove the comment entirely.
With this commit we introduce a new `put-settings` operation that can be used to update cluster settings via the REST API. We also deprecate the track property `cluster-settings` which had a similar purpose but the cluster settings ended up in `elasticsearch.yml` instead of being updated via an API. This is now tricky as we will move away from an integrated cluster management (see also elastic#830) and we should instead add settings that need to be persistent in `elasticsearch.yml` via `--car-params` and settings that are per track via the cluster settings API. Relates elastic/rally-tracks#93
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finished the review, left a few more comments.
random_build_type build_type | ||
|
||
# for Docker we force the most recent distribution as we don't have Docker images for all versions that are tested | ||
if [[ "$build_type" == "docker" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is possible. What we need to do is:
- Change
docker-compose.yml.j2
to further specify a label like so:
diff --git a/esrally/resources/docker-compose.yml.j2 b/esrally/resources/docker-compose.yml.j2
index c5bc10a..19e72e9 100644
--- a/esrally/resources/docker-compose.yml.j2
+++ b/esrally/resources/docker-compose.yml.j2
@@ -4,6 +4,8 @@ services:
cap_add:
- IPC_LOCK
image: "{{docker_image}}:{{es_version}}"
+ labels:
+ io.rally.description: "Label to help identify Elasticsearch containers launched by Rally"
{%- if docker_cpu_count is defined %}
cpu_count: {{docker_cpu_count}}
{%- endif %}
- Use the label to retrieve the ID of container(s) launched by Rally using a filter like
docker ps --filter "label=io.rally.description" --format "{{.ID}}"
(if >1 they'll be newline separated).
This will match all nodes matching the specific label key. To more targeted, we can also match the value which could be optionally specified via a j2 variable in the compose file, but this functionality would require an optional argument for the start subcommand (e.g.--docker-label="..."
) to override the j2 variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Some minor comments on arguments for now.
esrally/rally.py
Outdated
stop_parser.add_argument( | ||
"--installation-id", | ||
required=True, | ||
help="The id of the installation to start", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/start/stop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
esrally/rally.py
Outdated
"--race-id", | ||
required=True, | ||
help="Define a unique id for this race.", | ||
default="") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably, the "stop" command shouldn't be caring about races as it is just stopping nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon stopping a node, Rally needs to store metrics and aggregate them. To have consistent meta-data across all nodes involved in a benchmark, we need the race id to tie them together and that's why this command line parameter is needed. We could probably store it when a node is started but I've opted for this simpler approach here. In fact, we could even skip it upon node startup because we only store metrics when a node is stopped but this felt like leaking implementation details into the command line interface which I wanted to avoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the approach of storing the active race ID. Any chance we could include it in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, it makes the interface definitely simpler and easier to use. I've implemented this in dbbbdb4.
data-params-hot.json
Outdated
@@ -0,0 +1,9 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file was added by accident?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed in 947c110
data-params-warm.json
Outdated
@@ -0,0 +1,9 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file was added by accident?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed in 947c110
master-params.json
Outdated
@@ -0,0 +1,8 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file was added by accident?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed in 947c110
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good to me!
Thanks to both of you for the review! :) |
With this commit we introduce a new `put-settings` operation that can be used to update cluster settings via the REST API. We also deprecate the track property `cluster-settings` which had a similar purpose but the cluster settings ended up in `elasticsearch.yml` instead of being updated via an API. This is now tricky as we will move away from an integrated cluster management (see also #830) and we should instead add settings that need to be persistent in `elasticsearch.yml` via `--car-params` and settings that are per track via the cluster settings API. Relates elastic/rally-tracks#93 Relates #831
With this commit we introduce three new subcommands to Rally:
install
: To install a single Elasticsearch node locallystart
: To start an Elasticsearch node that has been previously installedstop
: To stop a running Elasticsearch nodeTo run a benchmark, users first issue
install
, followed bystart
on allnodes. Afterwards, the benchmark is run using the
benchmark-only
pipeline.Finally, the
stop
command is invoked on all nodes to shutdown the cluster.To ensure that system metrics are stored consistently (i.e. they contain the
same metadata like race id and race timestamp), we expose the race id as a
command line parameter and defer writing any system metrics until the
stop
command is invoked. We attempt to read race metadata from the Elasticsearch
metrics store for that race id which have been written earlier by the benchmark
and merge the metadata when we write the system metrics.
The current implementation is considered a new experimental addition to the
existing mechanism to manage clusters with the intention to eventually replace
it. The command line interface is specific to Zen discovery and subject to
change as we learn more about its use.
Closes #697