Make 'race' self-contained

With this commit, we establish 'race' as a self-contained entity. As this touches several areas of the metrics store implementation, we use this opportunity to implement a couple of breaking changes at once. Namely:

* The minimum supported metrics store version is now Elasticsearch 5.0.
* Metrics and races are now stored in individual indices (rally-metrics-* and rally-races-* respectively).
* New indices are created once per month instead of once per year.

Furthermore, the `list facts` subcommand is deprecated and will be removed soon, as Rally now gathers all relevant information itself and stores it in the race.

We also implement a file-based version of the race store, so users can now use the tournament feature without configuring a dedicated Elasticsearch metrics store. This also simplifies sharing of race results, as users just need to share the race results file.

Closes #138
Closes #220
Closes #258
Closes #279
Relates #282
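
The switch to one index per month can be illustrated with a minimal sketch. The helper `monthly_index_name` and the exact `<prefix>-YYYY-MM` layout are assumptions made for illustration; only the prefixes `rally-metrics` and `rally-races` come from the commit message.

```python
from datetime import datetime, timezone


def monthly_index_name(prefix, timestamp):
    """Hypothetical helper: derive one index name per calendar month.

    The "<prefix>-YYYY-MM" layout is an assumption, not taken from the commit.
    """
    return "%s-%04d-%02d" % (prefix, timestamp.year, timestamp.month)


# Example timestamp: the commit date, used here only as sample input.
race_start = datetime(2017, 5, 9, tzinfo=timezone.utc)
metrics_index = monthly_index_name("rally-metrics", race_start)
races_index = monthly_index_name("rally-races", race_start)
print(metrics_index)
print(races_index)
```

With a scheme like this, all races started in the same month share an index, and a wildcard such as `rally-metrics-*` still matches every monthly index.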
elastic · May 9, 2017 · d1a7eb3
1 parent 6ed1fea

Showing 28 changed files with 1,218 additions and 530 deletions.
diff --git a/docs/command_line_reference.rst b/docs/command_line_reference.rst
@@ -24,7 +24,7 @@ The ``list`` subcommand is used to list different configuration options:
 * telemetry: Will show all :doc:`telemetry devices </telemetry>` that are supported by Rally.
 * tracks: Will show all tracks that are supported by Rally. As this *may* depend on the Elasticsearch version that you want to benchmark, you can specify ``--distribution-version`` and also ``--distribution-repository`` as additional options.
 * pipelines: Will show all :doc:`pipelines </pipelines>` that are supported by Rally.
-* races: Will show all races that are currently stored. This is only needed for the :doc:`tournament mode </tournament>` and it will also only work if you have setup Rally so it supports tournaments.
+* races: Will show all races that are currently stored. This is needed for the :doc:`tournament mode </tournament>`.
 * cars: Will show all cars that are supported by Rally (i.e. Elasticsearch configurations).
 * facts: Will show facts about the hardware and software configuration. This helps you sharing the results with others via https://github.com/elastic/rally-results.
 
@@ -41,6 +41,10 @@ The ``facts`` subcommand requires one target host. If you have started a Rally d
 
 This will gather facts about the target host with the IP ``10.17.20.3`` but requires that you have started the Rally daemon first (see :doc:`recipes </recipes>` on how to do that).
 
+.. warning::
+
+    Rally stores all machine facts automatically on the ``race``. Hence, the ``facts`` command is deprecated and will be removed in the next release of Rally.
+
 
 ``compare``
 ~~~~~~~~~~~

diff --git a/docs/configuration.rst b/docs/configuration.rst
@@ -100,7 +100,7 @@ Congratulations! Time to run your first benchmark.
 Advanced Configuration
 ----------------------
 
-If you need more control over a few variables or want to use advanced features like :doc:`tournaments </tournament>`, then you should run the advanced configuration routine. You can invoke it at any time with ``esrally configure --advanced-config``.
+If you need more control over a few variables or want to store your metrics in a dedicated Elasticsearch metrics store, then you should run the advanced configuration routine. You can invoke it at any time with ``esrally configure --advanced-config``.
 
 Prerequisites
 ~~~~~~~~~~~~~
@@ -113,7 +113,7 @@ When using the advanced configuration, Rally stores its metrics not in-memory bu
 Preparation
 ~~~~~~~~~~~
 
-First `install Elasticsearch <https://www.elastic.co/downloads/elasticsearch>`_ 2.3 or higher. A simple out-of-the-box installation with a single node will suffice. Rally uses this instance to store metrics data. It will setup the necessary indices by itself. The configuration procedure of Rally will you ask for host and port of this cluster.
+First `install Elasticsearch <https://www.elastic.co/downloads/elasticsearch>`_ 5.0 or higher. A simple out-of-the-box installation with a single node will suffice. Rally uses this instance to store metrics data. It will setup the necessary indices by itself. The configuration procedure of Rally will you ask for host and port of this cluster.
 
 .. note::
 

diff --git a/docs/metrics.rst b/docs/metrics.rst
@@ -4,7 +4,7 @@ Metrics
 Metrics Records
 ---------------
 
-At the end of a race, Rally stores all metrics records in its metrics store, which is a dedicated Elasticsearch cluster.
+At the end of a race, Rally stores all metrics records in its metrics store, which is a dedicated Elasticsearch cluster. Rally store the metrics in the indices ``rally-metrics-*`` and it will create a new index for each month.
 
 Here is a typical metrics record::
 

diff --git a/docs/tournament.rst b/docs/tournament.rst
@@ -1,10 +1,6 @@
 Tournaments
 ===========
 
-.. warning::
-
-   If you want to use tournaments, then Rally requires a dedicated metrics store as it needs to store data across multiple races. So ensure to run ``esrally configure --advanced-config`` first. For details please see the :doc:`configuration help page </configuration>`.
-
 Suppose, we want to analyze the impact of a performance improvement. First, we need a baseline measurement. We can use the command line parameter ``--user-tag`` to provide a key-value pair to document the intent of a race. After we've run both races, we want to know about the performance impact of a change. With Rally we can analyze differences of two given races easily. First of all, we need to find two races to compare by issuing ``esrally list races``::
 
     dm@io:~ $ esrally list races
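
The commit message mentions a file-based race store that makes `esrally list races` work without a dedicated metrics store. A rough sketch of that idea follows. The class `FileRaceStore`, its methods, the one-JSON-file-per-race layout and the field names are all hypothetical and not taken from Rally's implementation.

```python
import json
import os
import tempfile


class FileRaceStore:
    """Hypothetical file-based race store: one JSON document per race."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def store_race(self, race):
        # Assumption: the race timestamp is unique and safe to use as a file name.
        path = os.path.join(self.root, "%s.json" % race["trial-timestamp"])
        with open(path, "w") as f:
            json.dump(race, f, indent=2)
        return path

    def list(self):
        races = []
        # Newest first, relying on lexicographically sortable timestamps.
        for name in sorted(os.listdir(self.root), reverse=True):
            if name.endswith(".json"):
                with open(os.path.join(self.root, name)) as f:
                    races.append(json.load(f))
        return races


store = FileRaceStore(tempfile.mkdtemp())
store.store_race({"trial-timestamp": "20170508T090000Z", "user-tag": "intention:baseline"})
store.store_race({"trial-timestamp": "20170509T090000Z", "user-tag": "intention:improvement"})
for race in store.list():
    print(race["trial-timestamp"], race["user-tag"])
```

Because each race is a plain file in this sketch, sharing results amounts to copying that file, which is the benefit the commit message describes.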

diff --git a/esrally/facts.py b/esrally/facts.py
@@ -6,7 +6,7 @@
 
 
 def list_facts(cfg):
-    console.info("This is an experimental command and subject to change.")
+    console.warn("This command is deprecated and will be removed with the next release of Rally.", overline="!", underline="!")
     # provide a custom error message
     target_hosts = cfg.opts("facts", "hosts", mandatory=False)
     if not target_hosts:

diff --git a/esrally/mechanic/cluster.py b/esrally/mechanic/cluster.py
@@ -20,19 +20,28 @@ def __init__(self, process, host_name, node_name, telemetry):
         self.process = process
         self.host_name = host_name
         self.node_name = node_name
+        self.ip = None
         self.telemetry = telemetry
+        # populated by telemetry
+        self.os = {}
+        self.jvm = {}
+        self.cpu = {}
+        self.memory = {}
+        self.fs = []
 
     def on_benchmark_start(self):
         """
         Callback method when a benchmark is about to start.
         """
-        self.telemetry.on_benchmark_start()
+        if self.telemetry:
+            self.telemetry.on_benchmark_start()
 
     def on_benchmark_stop(self):
         """
         Callback method when a benchmark is about to stop.
         """
-        self.telemetry.on_benchmark_stop()
+        if self.telemetry:
+            self.telemetry.on_benchmark_stop()
 
 
 class Cluster:
@@ -53,6 +62,30 @@ def __init__(self, hosts, nodes, telemetry):
         self.distribution_version = None
         self.source_revision = None
 
+    def node(self, name):
+        """
+        Finds a cluster node by name.
+
+        :param name: The node's name.
+        :return: The corresponding node or ``None`` if the cluster does not contain a node with this name.
+        """
+        for n in self.nodes:
+            if n.node_name == name:
+                return n
+        return None
+
+    def has_node(self, name):
+        """
+        :param name: The node's name.
+        :return: True iff the cluster contains such a node.
+        """
+        return self.node(name) is not None
+
+    def add_node(self, host_name, node_name):
+        new_node = Node(process=None, host_name=host_name, node_name=node_name, telemetry=None)
+        self.nodes.append(new_node)
+        return new_node
+
     def on_benchmark_start(self):
         """
         Callback method when a benchmark is about to start.
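
To illustrate how the new `node`, `has_node` and `add_node` methods could work together, here is a reduced, self-contained sketch. The `Node` and `Cluster` classes are trimmed copies of the ones in this diff. The function `register_nodes` and its input tuples are hypothetical and only stand in for whatever component discovers nodes at runtime.

```python
class Node:
    def __init__(self, process, host_name, node_name, telemetry):
        self.process = process
        self.host_name = host_name
        self.node_name = node_name
        self.ip = None
        self.telemetry = telemetry

    def on_benchmark_start(self):
        # Nodes created via add_node() carry no telemetry, hence the guard.
        if self.telemetry:
            self.telemetry.on_benchmark_start()


class Cluster:
    def __init__(self, nodes):
        self.nodes = nodes

    def node(self, name):
        for n in self.nodes:
            if n.node_name == name:
                return n
        return None

    def has_node(self, name):
        return self.node(name) is not None

    def add_node(self, host_name, node_name):
        new_node = Node(process=None, host_name=host_name, node_name=node_name, telemetry=None)
        self.nodes.append(new_node)
        return new_node


def register_nodes(cluster, discovered):
    """Hypothetical: add each discovered (host_name, node_name, ip) at most once."""
    for host_name, node_name, ip in discovered:
        if cluster.has_node(node_name):
            node = cluster.node(node_name)
        else:
            node = cluster.add_node(host_name, node_name)
        node.ip = ip


c = Cluster([])
register_nodes(c, [("10.17.20.3", "rally0", "10.17.20.3")])
register_nodes(c, [("10.17.20.3", "rally0", "10.17.20.3")])  # second call adds nothing
c.node("rally0").on_benchmark_start()  # no telemetry attached, so this is a no-op
print(len(c.nodes))
```

The `if self.telemetry` guard matters here: without it, a node added with `telemetry=None` would raise an `AttributeError` as soon as the benchmark callbacks fire.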

diff --git a/esrally/mechanic/launcher.py b/esrally/mechanic/launcher.py
@@ -53,15 +53,14 @@ def start(self, car, binary, data_paths):
         t = telemetry.Telemetry(devices=[
             # Be aware that some the meta-data are taken from the host system, not the container (e.g. number of CPU cores) so if the
             # Docker container constrains these, the metrics are actually wrong.
+            telemetry.ClusterMetaDataInfo(es),
             telemetry.EnvironmentInfo(es, self.metrics_store),
             telemetry.NodeStats(es, self.metrics_store),
-            telemetry.IndexStats(es, self.metrics_store),
-            telemetry.DiskIo(self.metrics_store),
-            telemetry.CpuUsage(self.metrics_store)
+            telemetry.IndexStats(es, self.metrics_store)
         ])
 
-        c = cluster.Cluster(hosts, [], t)
-        self._start_process(cmd="docker-compose -f %s up" % self.binary_path, node_name="rally0")
+        c = cluster.Cluster(hosts, [self._start_node(hosts[0], 0, es)], t)
+
         logger.info("Docker container has successfully started. Checking if REST API is available.")
         if wait_for_rest_layer(es):
             logger.info("REST API is available. Attaching telemetry devices to cluster.")
@@ -73,6 +72,18 @@ def start(self, car, binary, data_paths):
             raise exceptions.LaunchError("Elasticsearch REST API layer is not available. Forcefully terminated cluster.")
         return c
 
+    def _start_node(self, host, node, es):
+        node_name = self._node_name(node)
+        p = self._start_process(cmd="docker-compose -f %s up" % self.binary_path, node_name=node_name)
+        # only support a subset of telemetry for Docker hosts (specifically, we do not allow users to enable any devices)
+        node_telemetry = [
+            telemetry.DiskIo(self.metrics_store),
+            telemetry.CpuUsage(self.metrics_store),
+            telemetry.EnvironmentInfo(es, self.metrics_store)
+        ]
+        t = telemetry.Telemetry(devices=node_telemetry)
+        return cluster.Node(p, host["host"], node_name, t)
+
     def _start_process(self, cmd, node_name):
         startup_event = threading.Event()
         p = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.STDOUT, stdin=subprocess.DEVNULL)
@@ -152,10 +163,12 @@ def start(self, car=None, binary=None, data_paths=None):
 
         # cannot enable custom telemetry devices here
         t = telemetry.Telemetry(devices=[
+            telemetry.ClusterMetaDataInfo(es),
             telemetry.ExternalEnvironmentInfo(es, self.metrics_store),
             telemetry.NodeStats(es, self.metrics_store),
             telemetry.IndexStats(es, self.metrics_store)
         ])
+        # cluster nodes will be populated by the external environment info telemetry device. We cannot know this upfront.
         c = cluster.Cluster(hosts, [], t)
         user_defined_version = self.cfg.opts("mechanic", "distribution.version", mandatory=False)
         distribution_version = es.info()["version"]["number"]
@@ -170,7 +183,7 @@ def start(self, car=None, binary=None, data_paths=None):
         return c
 
     def stop(self, cluster):
-        pass
+        cluster.telemetry.detach_from_cluster(cluster)
 
 
 class InProcessLauncher:
@@ -225,10 +238,11 @@ def start(self, car, binary, data_paths):
 
         logger.info("Starting a cluster based on car [%s] with [%d] nodes." % (car, car.nodes))
 
-        # TODO dm: Get rid of these...
+        # TODO dm: Get rid of this config setting and replace it with a proper list
         enabled_devices = self.cfg.opts("mechanic", "telemetry.devices")
 
         cluster_telemetry = [
+            telemetry.ClusterMetaDataInfo(es),
             # TODO dm: Once we do distributed launching, this needs to be done per node not per cluster
             telemetry.MergeParts(self.metrics_store, self.node_log_dir),
             telemetry.EnvironmentInfo(es, self.metrics_store),

diff --git a/esrally/mechanic/mechanic.py b/esrally/mechanic/mechanic.py
@@ -15,11 +15,33 @@
 ##########
 
 class ClusterMetaInfo:
-    def __init__(self, hosts, revision, distribution_version):
-        self.hosts = hosts
+    def __init__(self, nodes, revision, distribution_version):
+        self.nodes = nodes
         self.revision = revision
         self.distribution_version = distribution_version
 
+    def as_dict(self):
+        return {
+            "nodes": [n.as_dict() for n in self.nodes],
+            "revision": self.revision,
+            "distribution-version": self.distribution_version
+        }
+
+
+class NodeMetaInfo:
+    def __init__(self, n):
+        self.host_name = n.host_name
+        self.node_name = n.node_name
+        self.ip = n.ip
+        self.os = n.os
+        self.jvm = n.jvm
+        self.cpu = n.cpu
+        self.memory = n.memory
+        self.fs = n.fs
+
+    def as_dict(self):
+        return self.__dict__
+
 
 class StartEngine:
     def __init__(self, cfg, open_metrics_context, cluster_settings, sources, build, distribution, external, docker, port=None):
@@ -245,7 +267,7 @@ def receiveMessage(self, msg, sender):
                                        msg.distribution, msg.external, msg.docker)
                 cluster = self.mechanic.start_engine()
                 self.send(sender, EngineStarted(
-                    ClusterMetaInfo(cluster.hosts, cluster.source_revision, cluster.distribution_version),
+                    ClusterMetaInfo([NodeMetaInfo(node) for node in cluster.nodes], cluster.source_revision, cluster.distribution_version),
                     self.metrics_store.meta_info))
             elif isinstance(msg, OnBenchmarkStart):
                 self.metrics_store.lap = msg.lap
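
The following sketch shows what the `as_dict` methods added above could produce when serialized. `ClusterMetaInfo` and `NodeMetaInfo` are copied from this diff. The sample node built with `SimpleNamespace` and all of its values (host, revision string, version) are made up for illustration.

```python
import json
from types import SimpleNamespace


class ClusterMetaInfo:
    def __init__(self, nodes, revision, distribution_version):
        self.nodes = nodes
        self.revision = revision
        self.distribution_version = distribution_version

    def as_dict(self):
        return {
            "nodes": [n.as_dict() for n in self.nodes],
            "revision": self.revision,
            "distribution-version": self.distribution_version
        }


class NodeMetaInfo:
    def __init__(self, n):
        self.host_name = n.host_name
        self.node_name = n.node_name
        self.ip = n.ip
        self.os = n.os
        self.jvm = n.jvm
        self.cpu = n.cpu
        self.memory = n.memory
        self.fs = n.fs

    def as_dict(self):
        return self.__dict__


# Stand-in for a cluster.Node; every value below is illustrative only.
sample_node = SimpleNamespace(host_name="localhost", node_name="rally0", ip="127.0.0.1",
                              os={"name": "Linux"}, jvm={"vendor": "example"},
                              cpu={"available_processors": 4}, memory={}, fs=[])

meta = ClusterMetaInfo([NodeMetaInfo(sample_node)], "abc123", "5.0.0")
print(json.dumps(meta.as_dict(), indent=2))
```

Note that `NodeMetaInfo.as_dict` returns the live attribute dictionary rather than a copy, so a caller that mutates the result also mutates the object. Returning `dict(self.__dict__)` would avoid that if it ever becomes a concern.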

diff --git a/esrally/mechanic/provisioner.py b/esrally/mechanic/provisioner.py
@@ -95,8 +95,7 @@ def _es_log_config(self):
         elif os.path.isfile(log4j2_properties_path):
             distribution_version = self._config.opts("mechanic", "distribution.version", mandatory=False)
             if versions.is_version_identifier(distribution_version):
-                major, _, _, _ = versions.components(distribution_version)
-                if major == 5:
+                if versions.major_version(distribution_version) == 5:
                     return "log4j2.properties.5", log4j2_properties_path
             else:
                 return "log4j2.properties", log4j2_properties_path
@@ -134,7 +133,7 @@ def number_of_nodes(self):
         distribution_version = self._config.opts("mechanic", "distribution.version", mandatory=False)
         configure = False
         if versions.is_version_identifier(distribution_version):
-            major, _, _, _ = versions.components(distribution_version)
+            major = versions.major_version(distribution_version)
             if major >= 2:
                 configure = True
         else:
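
The hunks above replace the four-way tuple unpacking with a call to `versions.major_version`. Its implementation is not part of this excerpt. A minimal sketch of how such a helper might sit on top of `components` is shown below; the regular expression and the error handling are assumptions.

```python
import re

# Assumption: versions look like "5.0.0" or "5.0.0-alpha1".
VERSION_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-(.+))?$")


def components(version):
    """Split a version string into (major, minor, patch, suffix)."""
    match = VERSION_PATTERN.match(version)
    if match is None:
        raise ValueError("[%s] is not a version identifier" % version)
    return int(match.group(1)), int(match.group(2)), int(match.group(3)), match.group(4)


def major_version(version):
    """Return only the major version as an integer."""
    major, _, _, _ = components(version)
    return major


print(major_version("5.0.0"))
print(major_version("2.3.1") >= 2)
```

Callers such as `_es_log_config` then no longer need to know how many components a version has, which is the readability gain of the change.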