
Manage Elasticsearch nodes with dedicated subcommands #830

Merged
merged 22 commits into elastic:master from node-mgmt
Dec 4, 2019

Conversation

@danielmitterdorfer danielmitterdorfer commented Nov 27, 2019

With this commit we introduce three new subcommands to Rally:

  • install: To install a single Elasticsearch node locally
  • start: To start an Elasticsearch node that has been previously installed
  • stop: To stop a running Elasticsearch node

To run a benchmark, users first issue install, followed by start, on all
nodes. Afterwards, the benchmark is run using the benchmark-only pipeline.
Finally, the stop command is invoked on all nodes to shut down the cluster.

To ensure that system metrics are stored consistently (i.e. they contain the
same metadata, such as the race id and race timestamp), we expose the race id as
a command line parameter and defer writing any system metrics until the stop
command is invoked. We attempt to read the race metadata for that race id from
the Elasticsearch metrics store, where it has been written earlier by the
benchmark, and merge that metadata when we write the system metrics.

The current implementation is considered an experimental addition to the
existing cluster management mechanism, with the intention of eventually
replacing it. The command line interface is specific to Zen discovery and
subject to change as we learn more about its use.

Closes #697

@danielmitterdorfer added the enhancement (Improves the status quo), :Telemetry (Telemetry Devices that gather additional metrics) and :Benchmark Candidate Management (Anything affecting how Rally sets up Elasticsearch) labels on Nov 27, 2019
@danielmitterdorfer danielmitterdorfer added this to the 1.4.0 milestone Nov 27, 2019
@danielmitterdorfer danielmitterdorfer self-assigned this Nov 27, 2019
@danielmitterdorfer (Member, Author) commented:

To help with the review, here are some test commands:

# benchmark a single-node cluster
esrally install --quiet --distribution-version=7.4.2 --build-type=tar --node-name="rally-node-0" --master-nodes="rally-node-0" --network-host="127.0.0.1" --seed-hosts="127.0.0.1:39300"

# TODO: Capture the installation id from the previous command
esrally start --installation-id=c2e6d5fb-0405-4f39-9213-cf8799a760cc --race-id=b1228394-998f-413d-b454-adef7e0f2a7b

esrally --pipeline=benchmark-only --target-host=127.0.0.1:39200 --track=geonames --challenge=append-no-conflicts-index-only --on-error=abort --race-id=b1228394-998f-413d-b454-adef7e0f2a7b

esrally stop --installation-id=c2e6d5fb-0405-4f39-9213-cf8799a760cc --race-id=b1228394-998f-413d-b454-adef7e0f2a7b
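The TODO above notes that the installation id must be captured from the install command's output. A minimal sketch of how that might be scripted, assuming the id is printed to stdout as a UUID (the exact output format is an assumption here, not confirmed by this PR):

```shell
# Hypothetical helper: pull the installation id out of `esrally install` output.
# Assumes the id appears as a standard UUID somewhere on stdout; adjust the
# pattern once the actual output format is known.
extract_installation_id() {
  printf '%s\n' "$1" \
    | grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' \
    | head -n 1
}

# Usage sketch (assumed output format):
# INSTALLATION_ID=$(extract_installation_id "$(esrally install --quiet ...)")
```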

To test a Docker container (see --build-type=docker):

esrally install --quiet --distribution-version=7.4.2 --build-type=docker --node-name="rally-node-0" --master-nodes="rally-node-0" --network-host="127.0.0.1" --seed-hosts="127.0.0.1:39300"

All other commands (start, stop) are identical to the scenario above. Rally's Docker support has always had several restrictions; for example, you can only spin up a single node. This has not changed with this PR.

It is also possible to use user-tags. Just specify them when running the benchmark:


esrally --pipeline=benchmark-only --target-host=127.0.0.1:39200 --track=geonames --challenge=append-no-conflicts-index-only --on-error=abort --race-id=de604d0d-926c-4764-bbef-d8bc564755ae --user-tag="with-user-tags:true"

These user tags should also show up in the system metrics.

To benchmark a source build, you need to modify the install command (use --skip-build to skip the build):

esrally install --revision=latest --node-name="rally-node-0" --network-host="127.0.0.1" --master-nodes="rally-node-0" --seed-hosts="127.0.0.1:39300"

For a multi-node cluster, install multiple nodes but reference the other node(s) as seed hosts:

esrally install --distribution-version=7.4.2 --node-name="rally-node-0" --network-host="127.0.0.1" --master-nodes="rally-node-0,rally-node-1" --seed-hosts="127.0.0.1:39300,127.0.0.1:39301"

esrally install --distribution-version=7.4.2 --node-name="rally-node-1" --network-host="127.0.0.1" --master-nodes="rally-node-0,rally-node-1" --seed-hosts="127.0.0.1:39300,127.0.0.1:39301"

Note that we always use the default HTTP port (39200) in these examples. To customize it, specify e.g. --http-port=19200; the transport port will always be 100 ports above the HTTP port.
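As a quick illustration of the port convention just described (transport port = HTTP port + 100):

```shell
# Transport port derivation per the note above: the defaults 39200/39300
# follow the same +100 rule as a custom --http-port=19200.
http_port=19200
transport_port=$((http_port + 100))
echo "HTTP: ${http_port}, transport: ${transport_port}"
```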

random_build_type build_type

# for Docker we force the most recent distribution as we don't have Docker images for all versions that are tested
if [[ "$build_type" == "docker" ]]; then
@danielmitterdorfer (Member, Author) commented:

While we kill Rally processes in kill_related_es_processes we don't do the same for any running Docker containers. I think that we might need to stop Docker containers as well but I wonder whether we can find a robust approach for identifying the correct container.

A Contributor replied:

This is possible. What we need to do is:

  1. Change docker-compose.yml.j2 to further specify a label like so:
diff --git a/esrally/resources/docker-compose.yml.j2 b/esrally/resources/docker-compose.yml.j2
index c5bc10a..19e72e9 100644
--- a/esrally/resources/docker-compose.yml.j2
+++ b/esrally/resources/docker-compose.yml.j2
@@ -4,6 +4,8 @@ services:
     cap_add:
       - IPC_LOCK
     image: "{{docker_image}}:{{es_version}}"
+    labels:
+      io.rally.description: "Label to help identify Elasticsearch containers launched by Rally"
     {%- if docker_cpu_count is defined %}
     cpu_count: {{docker_cpu_count}}
     {%- endif %}
  2. Use the label to retrieve the ID of container(s) launched by Rally using a filter like docker ps --filter "label=io.rally.description" --format "{{.ID}}" (if >1 they'll be newline separated).
    This will match all nodes carrying the specific label key. To be more targeted, we can also match the label value, which could optionally be specified via a j2 variable in the compose file, but this would require an optional argument for the start subcommand (e.g. --docker-label="...") to override the j2 variable.
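The suggested filter could be wired into a cleanup step roughly like this (a sketch only: the label name io.rally.description follows the diff above, and a loop is used instead of xargs so an empty result is a clean no-op):

```shell
# Sketch: stop every container carrying the Rally label proposed in the diff
# above. The label name is taken from the suggested docker-compose.yml.j2 change.
stop_rally_containers() {
  ids=$(docker ps --filter "label=io.rally.description" --format "{{.ID}}")
  for id in $ids; do
    docker stop "$id"
  done
}
```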

@danielmitterdorfer (Member, Author) replied:

Thanks! I'll implement this in the integration test. I think it's fine to use the less targeted approach and rely on a static label in docker-compose.yml.j2.

The Contributor replied:

I agree.

@danielmitterdorfer (Member, Author) replied:

Added in 9ef5660.

@danielmitterdorfer added the highlight (A substantial improvement that is worth mentioning separately in release notes) label on Nov 28, 2019
@dliappis (Contributor) left a review comment:

This is a huge amount of work at a very high level of quality! Thank you.

I am leaving an initial batch of comments; I've reviewed up until the rally.py file.

Resolved review comments on docs/cluster_management.rst (3 threads, now outdated).

esrally stop --installation-id="69ffcfee-6378-4090-9e93-87c9f8ee59a7" --race-id="${RACE_ID}"

If you only want to shutdown the node but don't want to delete the node and the data, pass ``--preserve-install`` additionally.
@dliappis (Contributor) commented on Nov 28, 2019:

"pass additionally --preserve-install" instead?

def stop(self, nodes, metrics_store):
self.logger.info("Shutting down [%d] nodes running in Docker on this host.", len(nodes))
for node in nodes:
# readd meta-data - we already did this on startup but in case dedicated subcommands are used for
A Contributor commented:

s/readd/read

@danielmitterdorfer (Member, Author) replied:

I really meant this as: "add the meta-data again". But this was only valid in an earlier commit so I'll remove the comment entirely.

Resolved review comments on esrally/mechanic/launcher.py and esrally/mechanic/mechanic.py (now outdated).
danielmitterdorfer added a commit to danielmitterdorfer/rally that referenced this pull request Nov 29, 2019
With this commit we introduce a new `put-settings` operation that can be
used to update cluster settings via the REST API. We also deprecate the
track property `cluster-settings` which had a similar purpose but the
cluster settings ended up in `elasticsearch.yml` instead of being
updated via an API. This is now tricky as we will move away from an
integrated cluster management (see also elastic#830) and we should instead add
settings that need to be persistent in `elasticsearch.yml` via
`--car-params` and settings that are per track via the cluster settings
API.

Relates elastic/rally-tracks#93
@dliappis (Contributor) left a review comment:

Finished the review, left a few more comments.

Resolved review comments on esrally/rally.py (now outdated).
Resolved review comment on integration-test.sh.
@drawlerr (Contributor) left a review comment:

Looks good! Some minor comments on arguments for now.

esrally/rally.py Outdated
stop_parser.add_argument(
"--installation-id",
required=True,
help="The id of the installation to start",
A Contributor commented:

s/start/stop

@danielmitterdorfer (Member, Author) replied:

Good catch!

esrally/rally.py Outdated
"--race-id",
required=True,
help="Define a unique id for this race.",
default="")
A Contributor commented:

Presumably, the "stop" command shouldn't be caring about races as it is just stopping nodes?

@danielmitterdorfer (Member, Author) replied:

Upon stopping a node, Rally needs to store metrics and aggregate them. To have consistent meta-data across all nodes involved in a benchmark, we need the race id to tie them together and that's why this command line parameter is needed. We could probably store it when a node is started but I've opted for this simpler approach here. In fact, we could even skip it upon node startup because we only store metrics when a node is stopped but this felt like leaking implementation details into the command line interface which I wanted to avoid.

The Contributor replied:

I like the approach of storing the active race ID. Any chance we could include it in this PR?

@danielmitterdorfer (Member, Author) replied:

Good idea, it makes the interface definitely simpler and easier to use. I've implemented this in dbbbdb4.

@danielmitterdorfer (Member, Author) commented:

@dliappis, @drawlerr thanks for your feedback. I've addressed it now. Can you please have another look?

@@ -0,0 +1,9 @@
{
A Contributor commented:

This file was added by accident?

@danielmitterdorfer (Member, Author) replied:

Indeed! :)

@danielmitterdorfer (Member, Author):

removed in 947c110

@@ -0,0 +1,9 @@
{
A Contributor commented:

This file was added by accident?

@danielmitterdorfer (Member, Author) replied:

removed in 947c110

@@ -0,0 +1,8 @@
{
A Contributor commented:

This file was added by accident?

@danielmitterdorfer (Member, Author) replied:

removed in 947c110

@dliappis dliappis self-requested a review December 3, 2019 09:00
@dliappis (Contributor) left a review comment:

LGTM

@drawlerr (Contributor) left a review comment:

Thanks, looks good to me!

@danielmitterdorfer (Member, Author) commented:

Thanks to both of you for the review! :)

@danielmitterdorfer danielmitterdorfer merged commit 2df73e9 into elastic:master Dec 4, 2019
@danielmitterdorfer danielmitterdorfer deleted the node-mgmt branch December 4, 2019 11:07
danielmitterdorfer added a commit that referenced this pull request Dec 4, 2019
With this commit we introduce a new `put-settings` operation that can be
used to update cluster settings via the REST API. We also deprecate the
track property `cluster-settings` which had a similar purpose but the
cluster settings ended up in `elasticsearch.yml` instead of being
updated via an API. This is now tricky as we will move away from an
integrated cluster management (see also #830) and we should instead add
settings that need to be persistent in `elasticsearch.yml` via
`--car-params` and settings that are per track via the cluster settings
API.

Relates elastic/rally-tracks#93
Relates #831

Successfully merging this pull request may close these issues.

Allow to manage Elasticsearch nodes separately from benchmarking
3 participants